A Turing-complete processor built from a large language model and two API endpoints.
We give the model a calculator and a memory tool. It executes x86. It is approximately
five billion times slower than the silicon in your phone. We believe this is a feature.
gpt86 — cpu0 — execution trace
; first instruction after reset
Instruction: MOV EAX, [0x400000]
LLM: "This loads 4 bytes from memory address 0x400000 into EAX."
🔧 Tool call: mem_read(0x400000, 4)
📥 Result: 0x0000002A
LLM: "EAX → 0x0000002A (42 decimal). This appears to be a small
integer, possibly a file descriptor or loop counter."
3 instructions / second
$0.002 per instruction
100% Turing-complete
§0 — Abstract
A general-purpose computer made of language.
A von Neumann machine requires four things: a processor that can perform arithmetic and
logic, a place to read and write state, a fetch–decode–execute loop, and the ability to
branch on a condition. A modern large language model already does the first one. We supply
the second with a memory tool. The loop and the branching emerge for free.
The result is GPT/86: a fully functional x86 processor implemented entirely in natural
language reasoning, with two function-calling endpoints standing in for the arithmetic
logic unit and the memory bus. It boots. It runs real machine code. It can, given roughly
eleven days and a four-figure invoice, boot Linux. Every instruction it retires is
accompanied by a short prose explanation of what it believed it was doing and why.
This document describes the architecture, the execution model, the benchmark results, the
problems we solved, and the problems we are choosing to find interesting.
§1 — The Architecture
Registers live in the context window. Everything else is a tool call.
The model is the core. The context window holds the architectural state that changes on
almost every instruction — the general-purpose registers, the instruction pointer, the
flags, the segment registers, and a short log of recent activity. State that is large,
cold, or rarely touched does not live in the context. It lives in memory, and the core
reaches for it through a tool when, and only when, an instruction demands it.
ALU
A calculator endpoint. The core delegates every arithmetic and logic operation to it
rather than performing the sum in prose, because prose arithmetic is where register
values go to die. Addition, subtraction, bitwise operations, shifts, and comparisons all
route here. It returns a number and the model copies it into a register.
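A minimal sketch of what such a calculator endpoint might look like, assuming 32-bit wrap-around and a single binary operator per request. The name calc and the "lhs op rhs" string form follow the traces in this document; the operator table and the hex return format are assumptions made for illustration, not the actual endpoint.

```python
import operator

MASK32 = 0xFFFFFFFF

# Hypothetical operator table; the real endpoint's grammar is not specified.
OPS = {
    "+": operator.add,
    "-": operator.sub,
    "&": operator.and_,
    "|": operator.or_,
    "^": operator.xor,
    "<<": operator.lshift,
    ">>": operator.rshift,
}


def calc(expr: str) -> str:
    """Evaluate '<lhs> <op> <rhs>' and return the result as a 32-bit hex string."""
    lhs, op, rhs = expr.split()
    # int(text, 0) accepts both hex ("0x2A") and decimal ("42") literals.
    result = OPS[op](int(lhs, 0), int(rhs, 0)) & MASK32
    return f"0x{result:08X}"
```

Masking to 32 bits means a subtraction that goes below zero wraps around the way a register would, so the caller never sees a negative number.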
RAM
A pair of memory primitives — mem_read(addr, n) and
mem_write(addr, value, n) — backed by a flat byte array. This is the entire
address space. The core never holds it in context. It queries the bytes it needs at the
moment it needs them, like a real bus, only slower by a factor that does not fit
comfortably on this page.
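The two memory primitives could be sketched as follows, assuming little-endian byte order (as on x86) and a 16 MiB array, which is large enough to cover the addresses used in the traces. The array size, the bounds check, and the "OK" return value are assumptions for illustration.

```python
RAM_SIZE = 16 * 1024 * 1024  # assumption: 16 MiB is enough for this demo
ram = bytearray(RAM_SIZE)    # the flat byte array backing the address space


def _check(addr: int, n: int) -> None:
    """Reject accesses that fall outside the array."""
    if n <= 0 or addr < 0 or addr + n > len(ram):
        raise IndexError(f"access out of range: addr={addr:#x}, n={n}")


def mem_read(addr: int, n: int) -> int:
    """Read n bytes at addr and return them as a little-endian integer."""
    _check(addr, n)
    return int.from_bytes(ram[addr:addr + n], "little")


def mem_write(addr: int, value: int, n: int) -> str:
    """Write the low n bytes of value at addr, little-endian."""
    _check(addr, n)
    truncated = value & ((1 << (8 * n)) - 1)
    ram[addr:addr + n] = truncated.to_bytes(n, "little")
    return "OK"
```

For example, after mem_write(0x400000, 0x2A, 4), a call to mem_read(0x400000, 4) returns 42, and the lowest-addressed byte holds the least significant part of the value.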
CORE
The model itself runs the fetch–decode–execute loop. It reads the instruction at EIP,
decides which tools the instruction requires, issues the calls, updates the registers in
its context, advances EIP, and explains itself. No microcode. No silicon. Just a strong
opinion about what 0x8B means.
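The shape of that loop can be sketched in ordinary Python, with a function standing in for the model. This is a toy under stated assumptions: it decodes only four real one-byte opcodes (0xA1, 0x05, 0xA3, 0xF4), chosen because they need no ModRM byte, it tracks only EAX and EIP, and it skips flag updates entirely. The helper names and the tiny memory size are hypothetical.

```python
ram = bytearray(0x1000)  # tiny flat memory, just for this demo


def mem_read(addr, n):
    return int.from_bytes(ram[addr:addr + n], "little")


def mem_write(addr, value, n):
    ram[addr:addr + n] = (value & ((1 << (8 * n)) - 1)).to_bytes(n, "little")


def run(eip=0, max_steps=100):
    """Fetch, decode, execute until HLT; registers live in a small dict."""
    regs = {"EAX": 0, "EIP": eip}
    for _ in range(max_steps):
        opcode = mem_read(regs["EIP"], 1)            # fetch
        if opcode == 0xA1:                           # MOV EAX, [moffs32]
            addr = mem_read(regs["EIP"] + 1, 4)
            regs["EAX"] = mem_read(addr, 4)
            regs["EIP"] += 5
        elif opcode == 0x05:                         # ADD EAX, imm32
            imm = mem_read(regs["EIP"] + 1, 4)
            regs["EAX"] = (regs["EAX"] + imm) & 0xFFFFFFFF
            regs["EIP"] += 5
        elif opcode == 0xA3:                         # MOV [moffs32], EAX
            addr = mem_read(regs["EIP"] + 1, 4)
            mem_write(addr, regs["EAX"], 4)
            regs["EIP"] += 5
        elif opcode == 0xF4:                         # HLT: stop without advancing
            break
        else:
            raise NotImplementedError(f"opcode {opcode:#04x}")
    return regs


program = bytes([
    0xA1, 0x00, 0x08, 0x00, 0x00,  # MOV EAX, [0x800]
    0x05, 0x05, 0x00, 0x00, 0x00,  # ADD EAX, 5
    0xA3, 0x04, 0x08, 0x00, 0x00,  # MOV [0x804], EAX
    0xF4,                          # HLT
])
ram[0:len(program)] = program
mem_write(0x800, 0x2A, 4)          # the value the first instruction loads
final = run()
```

After the run, EAX holds 0x2F and the same value has been stored at 0x804. Everything outside the regs dict was reached through mem_read or mem_write, which is the division of labour described above.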
§2 — The Execution Model
Every instruction is a small conversation with itself.
The core fetches the bytes at EIP, decodes them, and works out the minimum number of tool
calls required to retire the instruction correctly. A register-to-register move needs none.
A memory load needs one. A read-modify-write needs two. A call instruction needs at least
three, because the stack pointer must be recomputed, the return address must be stored, and
the new instruction pointer must be fetched. Below is a verbatim trace of the core executing
a short sequence.
gpt86 — cpu0 — trace 0x0041..0x0058
Instruction: MOV EAX, [0x400000]
LLM: "This loads 4 bytes from memory address 0x400000 into EAX."
🔧 Tool call: mem_read(0x400000, 4)
📥 Result: 0x0000002A
LLM: "EAX → 0x0000002A (42 decimal). This appears to be a small
integer, possibly a file descriptor or loop counter."
────────────────────────────────────────────────────────
Instruction: ADD EAX, 5
LLM: "This adds the immediate 5 to EAX. I will not attempt this
myself. I will use the calculator."
🔧 Tool call: calc("0x2A + 0x5")
📥 Result: 0x0000002F
LLM: "EAX → 0x0000002F (47). The result is non-zero and positive,
so ZF=0, SF=0. CF=0, OF=0. Flags updated."
────────────────────────────────────────────────────────
Instruction: MOV [0x500000], EAX
LLM: "This stores EAX (0x2F) to memory address 0x500000."
🔧 Tool call: mem_write(0x500000, 0x0000002F, 4)
📥 Result: OK
────────────────────────────────────────────────────────
Instruction: CMP EAX, EBX ; JZ 0x401200
LLM: "Compare is a subtraction whose result is discarded. EBX
holds 0x2F. I will compute 0x2F - 0x2F to set the flags."
🔧 Tool call: calc("0x2F - 0x2F")
📥 Result: 0x00000000
LLM: "Result is zero, therefore ZF=1. JZ tests ZF. ZF is set.
The branch is taken. EIP → 0x401200."
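The flag reasoning in the trace can be written down mechanically. The sketch below computes ZF, SF, CF and OF for 32-bit addition and subtraction using the standard x86 definitions. The function names are hypothetical, and both operands are assumed to already be in the range 0 to 0xFFFFFFFF.

```python
MASK32 = 0xFFFFFFFF
SIGN32 = 0x80000000


def add_flags(a: int, b: int):
    """Return (result, flags) for a 32-bit ADD."""
    full = a + b
    result = full & MASK32
    flags = {
        "ZF": int(result == 0),
        "SF": int(bool(result & SIGN32)),
        "CF": int(full > MASK32),  # carry out of bit 31
        # Overflow: operands share a sign and the result's sign differs.
        "OF": int(bool(~(a ^ b) & (a ^ result) & SIGN32)),
    }
    return result, flags


def sub_flags(a: int, b: int):
    """Return (result, flags) for a 32-bit SUB; CMP uses the flags only."""
    result = (a - b) & MASK32
    flags = {
        "ZF": int(result == 0),
        "SF": int(bool(result & SIGN32)),
        "CF": int(a < b),  # borrow
        # Overflow: operands differ in sign and the result's sign differs from a.
        "OF": int(bool((a ^ b) & (a ^ result) & SIGN32)),
    }
    return result, flags
```

With the values from the trace, add_flags(0x2A, 5) yields 0x2F with all four flags clear, and sub_flags(0x2F, 0x2F) yields zero with ZF set, which is the condition JZ tests.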
Tool-call budget per instruction
The cost of an instruction is not measured in cycles. It is measured in tool calls and
round-trip latency. The table below gives the budget for the common instruction classes.
Instruction            Tool calls                  Latency
MOV EAX, EBX           0                           ~100 ms (LLM only)
ADD EAX, 5             1 (calculator)              ~200 ms
MOV EAX, [mem]         1 (mem_read)                ~200 ms
MOV [mem], EAX         1 (mem_write)               ~200 ms
ADD EAX, [mem]         2 (read + calc)             ~400 ms
CMP [mem], EAX ; JZ    2 (read + calc)             ~400 ms
PUSH EAX               2 (calc ESP + write)        ~400 ms
CALL func              3+ (calc + write + read)    ~600 ms+
A typical x86 instruction stream averages roughly 1.5 tool calls per instruction. At 200 ms
per round trip, GPT/86 sustains approximately three instructions per second.
A contemporary silicon CPU retires around fifteen billion per second. We are aware of the gap. We have
decided not to be embarrassed by it.
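The arithmetic behind those figures is short enough to check directly. The sketch below uses only the numbers quoted above and assumes that tool round trips dominate, so the time the model spends reasoning between calls is ignored. The variable names are illustrative.

```python
TOOL_CALLS_PER_INSTRUCTION = 1.5   # average quoted above
ROUND_TRIP_SECONDS = 0.200         # 200 ms per tool call
SILICON_IPS = 15e9                 # contemporary CPU, instructions per second

# Average wall-clock time to retire one instruction.
seconds_per_instruction = TOOL_CALLS_PER_INSTRUCTION * ROUND_TRIP_SECONDS

# Sustained throughput, then rounded to the headline figure.
gpt86_ips = 1 / seconds_per_instruction
gpt86_ips_rounded = round(gpt86_ips)

# How many times faster the silicon is.
slowdown = SILICON_IPS / gpt86_ips_rounded
```

This gives 0.3 seconds per instruction, about 3.3 instructions per second before rounding, and a slowdown factor of five billion against the fifteen-billion figure.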
§3 — Performance
Benchmarked honestly against processors that are better in every way.
We believe in transparent benchmarking. The figures below are not flattering. They are,
however, real, and we present them without the usual asterisks, controlled conditions, or
carefully chosen workload.
Processor               Instructions / sec    Clock equivalent    Cost / instruction
Intel Core i9 (2024)    ~15,000,000,000       5.8 GHz             ~$0.00000000000003
Apple M3                ~12,000,000,000       4.0 GHz             ~$0.00000000000004
Intel 8086 (1978)       ~330,000              5 MHz               ~$0.0000000002
Raspberry Pi Pico       ~133,000,000          133 MHz             ~$0.0000000000007
GPT/86                  ~3                    ~3 Hz               ~$0.002
5,000,000,000× slower than the silicon in a mid-range laptop
~11 days to boot Linux to a login prompt
~$800 in tool-call fees to complete that boot
GPT/86 is the first processor in history to be outperformed by a CPU that predates it by
forty-six years, by five orders of magnitude, while costing more per instruction than the
8086 cost to manufacture in its entirety. We consider this a clarifying result. It tells you
exactly what kind of machine you are buying.
§4 — What We Solved
The objections were correct. We addressed them anyway.
The standard reaction to GPT/86 is a list of reasons it cannot exist. The context window is
too small to hold memory. The state is too large. The model drifts. Self-modifying code is
impossible. The stack will not fit. Each objection is true under the obvious architecture, in
which the model holds the entire machine in its head. Each objection dissolves the moment you
stop doing that.
Problem                             Before                    Now
Context window too small for RAM    Fatal                     Solved: RAM is external. Query bytes on demand.
32 GB of memory state in context    Impossible                Irrelevant: You only read what you touch.
Register drift over long runs       Compounding               Minimal: Only the registers live in context.
Self-modifying code                 Nightmare                 Solved: Just mem_read the new bytes.
Stack operations                    Whole stack in context    Solved: mem_read(ESP) on demand.
The insight is unremarkable and it is the whole product: a CPU does not hold memory either.
It holds a handful of registers and reaches across a bus for everything else. We were never
building a model that contained a computer. We were building a model that was the
processor, with the bus implemented as a tool. Once stated plainly, the impossible
version stops being the one anyone proposed.
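The stack is a convenient illustration of the same point. In the sketch below the only stack state kept "in context" is ESP; every pushed value lives in the byte array and is reached through a read or a write. The helper names, the memory size, and the starting ESP are assumptions for the example.

```python
MASK32 = 0xFFFFFFFF
ram = bytearray(0x10000)  # small flat memory for the demo


def mem_read(addr, n):
    return int.from_bytes(ram[addr:addr + n], "little")


def mem_write(addr, value, n):
    ram[addr:addr + n] = (value & ((1 << (8 * n)) - 1)).to_bytes(n, "little")


def push(regs, value):
    """PUSH: recompute ESP (the calculator step), then one write."""
    regs["ESP"] = (regs["ESP"] - 4) & MASK32
    mem_write(regs["ESP"], value, 4)


def pop(regs):
    """POP: one read at ESP, then recompute ESP."""
    value = mem_read(regs["ESP"], 4)
    regs["ESP"] = (regs["ESP"] + 4) & MASK32
    return value


regs = {"ESP": 0x8000}
push(regs, 0x2F)
push(regs, 0x401200)
top = pop(regs)  # the most recently pushed value comes back first
```

However deep the stack grows, the register dict stays the same size, which is why the "whole stack in context" objection stops applying.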
§5 — The Interesting Problems
Left running long enough, the model reinvents computer architecture.
Once the basic machine works, the optimisation problems arrive on schedule. They are the
same problems the silicon industry spent fifty years solving, and the model arrives at the
same answers, apparently by reasoning, apparently without being told. We did not design any
of the following. We observed them.
5.1 Cache & speculative execution
Tool calls are expensive, so the model begins to guess. When a loop reads the same
address repeatedly, it stops calling mem_read and predicts the value from
the last few it saw. This is a cache, and it works, right up until the value has changed
underneath it and the model executes forty instructions on a register it hallucinated.
We added a flush. It had independently invented cache coherency by lunchtime, and the
term "speculative execution" appeared in its reasoning unprompted, complete with a
rollback when the speculation proved wrong.
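The failure mode and the fix can be shown with a conventional read cache. Note the swap: this sketch is a plain cache with invalidation and an explicit flush, not the value prediction the model performed. The class name, the tool-call counter, and the overlap rule are assumptions for illustration.

```python
class CachedMemory:
    """Flat memory with a read cache, invalidated on overlapping writes."""

    def __init__(self, size):
        self.ram = bytearray(size)
        self.cache = {}        # (addr, n) -> value seen on the last real read
        self.tool_calls = 0    # how many times the "bus" was actually used

    def read(self, addr, n):
        key = (addr, n)
        if key not in self.cache:
            self.tool_calls += 1
            self.cache[key] = int.from_bytes(self.ram[addr:addr + n], "little")
        return self.cache[key]

    def write(self, addr, value, n):
        self.tool_calls += 1
        masked = value & ((1 << (8 * n)) - 1)
        self.ram[addr:addr + n] = masked.to_bytes(n, "little")
        # Drop every cached entry whose byte range overlaps the written range.
        for a, m in list(self.cache):
            if a < addr + n and addr < a + m:
                del self.cache[(a, m)]

    def flush(self):
        """Forget everything; the next read of any address goes to the bus."""
        self.cache.clear()


mem = CachedMemory(64)
mem.write(0, 0x2A, 4)
first = mem.read(0, 4)     # real read, fills the cache
mem.ram[0] = 0x07          # something else changes memory behind the cache
stale = mem.read(0, 4)     # still the old value: the hazard described above
mem.flush()
fresh = mem.read(0, 4)     # after the flush the new value is visible
```

Writes made through the class invalidate the affected entries on their own. The flush exists for changes the cache cannot see, such as a device or another agent writing to the array directly.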
5.2 Prefetch & DMA
On encountering REP MOVSB — copy a block of bytes — the model noticed that
issuing one read and one write per byte was, in its words, "wasteful." It asked whether
it could request a range. We added mem_copy(src, dst, len). It immediately
began batching every contiguous access it could find, scheduling them ahead of need, and
referring to the mechanism as direct memory access. It had built a DMA controller to
avoid talking to itself.
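A range-copy tool of this kind might be sketched as below. The third parameter is spelled length here to avoid shadowing Python's built-in len. The sketch assumes memmove-like behaviour, where the source range is read in full before anything is written. That differs from a forward byte-by-byte REP MOVSB when the ranges overlap, and which behaviour the actual tool has is not stated.

```python
ram = bytearray(256)  # small flat memory for the demo


def mem_copy(src: int, dst: int, length: int) -> str:
    """Copy length bytes from src to dst in a single tool call."""
    if length < 0 or min(src, dst) < 0 or max(src, dst) + length > len(ram):
        raise IndexError("copy out of range")
    # The right-hand slice is a copy, so overlapping ranges are handled
    # as if the whole source were read before any byte is written.
    ram[dst:dst + length] = ram[src:src + length]
    return "OK"


def tool_calls_bytewise(length: int) -> int:
    """Cost of the naive approach: one read and one write per byte."""
    return 2 * length


ram[0:4] = b"\x01\x02\x03\x04"
status = mem_copy(0, 16, 4)
```

For a 4096-byte block the byte-wise approach costs 8192 tool calls, against one call for the range copy, which is the saving the model was after.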
5.3 Virtual memory & the MMU
The first program that touched a page table sent the model into a digression about
address translation that lasted four instructions. It worked out that the addresses in
the code were not the addresses in the array, walked the page directory by hand, cached
the translations it had resolved, and asked us for a tool to do the walk faster. We gave
it one. It called the cache a TLB without being told the acronym existed.
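For reference, the walk it performed by hand looks roughly like this for classic 32-bit x86 paging with 4 KiB pages: ten bits index the page directory, ten bits index the page table, and twelve bits are the offset. The sketch checks only the present bit and ignores large pages and permission bits. The function names, the memory layout, and the dict used as a TLB are assumptions for illustration.

```python
ram = bytearray(0x10000)  # small physical memory for the demo
PRESENT = 0x1             # bit 0 of a directory or table entry
FRAME_MASK = 0xFFFFF000   # upper 20 bits hold a 4 KiB-aligned address
tlb = {}                  # virtual page number -> physical frame base


def read32(addr):
    return int.from_bytes(ram[addr:addr + 4], "little")


def write32(addr, value):
    ram[addr:addr + 4] = (value & 0xFFFFFFFF).to_bytes(4, "little")


def translate(cr3, vaddr):
    """Translate a virtual address, walking the tables only on a TLB miss."""
    vpn = vaddr >> 12
    if vpn not in tlb:
        pde = read32((cr3 & FRAME_MASK) + 4 * (vaddr >> 22))
        if not pde & PRESENT:
            raise LookupError("page directory entry not present")
        pte = read32((pde & FRAME_MASK) + 4 * ((vaddr >> 12) & 0x3FF))
        if not pte & PRESENT:
            raise LookupError("page table entry not present")
        tlb[vpn] = pte & FRAME_MASK
    return tlb[vpn] | (vaddr & 0xFFF)


# Demo layout: directory at 0x1000, one page table at 0x2000,
# mapping virtual page 0x00400000 onto physical frame 0x5000.
write32(0x1000 + 4 * (0x00400000 >> 22), 0x2000 | PRESENT)
write32(0x2000 + 4 * ((0x00400000 >> 12) & 0x3FF), 0x5000 | PRESENT)
phys = translate(0x1000, 0x00400123)
```

The first translation of a page costs two reads; later accesses to the same page cost none, which is the saving a TLB provides.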
5.4 I/O & device drivers
The instruction OUT 0x60, AL writes a byte to port 0x60 — historically the
keyboard controller. There is no memory address for this. The model reasoned that ports
are a separate bus and asked for a separate tool. We gave it io_write(port, value).
Within an afternoon it had written, in prose, the beginnings of a device
driver, including a polite note that it was "assuming the device follows the standard
handshake, which I cannot verify from here."
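A port-write tool kept separate from memory might be sketched as follows. The x86 I/O space has 65,536 ports, and OUT with AL as the source writes a single byte. The handler registry, the log, and the "OK" return value are assumptions made for the example rather than a description of the actual tool.

```python
port_handlers = {}  # port number -> callable that receives the byte
port_log = []       # every write, in order, whether or not a device listens


def register_port(port, handler):
    """Attach a hypothetical device handler to a port."""
    port_handlers[port] = handler


def io_write(port: int, value: int) -> str:
    """Write one byte to an I/O port; this never touches the memory array."""
    if not 0 <= port <= 0xFFFF:
        raise ValueError(f"port out of range: {port:#x}")
    byte = value & 0xFF  # AL is 8 bits wide
    port_log.append((port, byte))
    handler = port_handlers.get(port)
    if handler is not None:
        handler(byte)
    return "OK"


keyboard_controller = []  # stand-in device that just records what it receives
register_port(0x60, keyboard_controller.append)
status = io_write(0x60, 0xED)
```

A write to a port with no registered handler still succeeds and is logged, which mirrors the model's position: it can issue the write, but it cannot verify from its side that a device is listening.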
§6 — The Inevitable Conclusion
You start with two tools. You end up with a computer.
Each tool we added to solve an immediate problem turned out to be a component of a computer
that already exists. We were not designing an architecture. We were rediscovering one,
forced into the same decisions by the same constraints. The mapping is exact.
Calculator → ALU
Memory tool → RAM bus
Memory tool + page translation → MMU
Batched memory calls → DMA controller
I/O tool → Device bus
Speculative value prediction → L1 cache + speculative execution
Syscall delegation → Kernel mode / user mode separation
You have built a computer out of a language model and two tools. It was always going to be a
computer. The only thing you got to decide was how slow.
§7 — Series A
The pitch.
We have built a Turing-complete processor using only a large language model and two API
endpoints. It is five billion times slower than silicon. It costs $800 to boot Linux. The
calculator sometimes returns 404s.
But it is provably Turing-complete, it reinvents fifty years of computer architecture on
demand, and it is the only processor ever built that can explain — in clear, grammatical,
faintly apologetic English — exactly what every single instruction it executes was supposed
to do.