The Agentic CPU: Arm’s First Silicon Shot at x86

March 25, 2026


By Ben Pouladian, BEP Research. Pictured: Rene Haas, CEO of Arm.

Arm invited me to its infrastructure event in San Francisco on Tuesday. One slide stopped the room.

On the left: “Arm AGI CPU.” High-performance cores. World-class efficiency. Low-latency design. “Performance scales. Power stays predictable.”

On the right: “x86 CPU.” Execution overhead. Legacy feature support. Modularity over latency. “Performance throttled. Technical debt.”

That’s not a product comparison. That’s Arm telling the industry it’s done deferring to x86.

And Arm’s weapon in this fight is the most consequential server CPU move it has ever made, one that connects directly to the Memory Wars thesis I’ve been building since December. Arm just designed and shipped its first-ever production CPU. Not an IP license. Not a reference design. A finished 3nm chip, co-developed with Meta as lead partner and launched with commitments from OpenAI and a broad ecosystem, built for agentic AI: the workload the AI infrastructure stack is increasingly being designed around.

To understand why this matters — and what it means for your portfolio — you need to understand three things: why CPUs suddenly matter again, why neither Intel nor AMD was ready, and where Arm goes from here.

Why the CPU Suddenly Matters Again

For the past three years, the entire AI infrastructure narrative has been about GPUs. More FLOPS. More HBM. More NVLink. The CPU seemed like a supporting actor — the thing that boots the server and gets out of the way.

Jensen Huang has been telling us otherwise since GTC 2024, when he first started talking about the agentic inflection. At GTC 2026, two weeks ago, he made it explicit with a slide that should have stopped every infrastructure investor: 12,000 GPUs require 400,000 CPU cores for agentic AI and reinforcement learning. That’s a 33-to-1 CPU-core-to-GPU ratio.

Why? Because of what happens between GPU calls.

Think about how a chatbot works. You type a prompt. The GPU generates tokens. You get a response. The CPU barely participates.

Now think about how an AI agent works. Claude Code is the clearest example most people have experienced. You give it a coding task. The agent doesn’t just generate text — it reads your codebase, identifies relevant files, writes code, runs it, reads the error message, debugs, rewrites, runs tests, checks the output, and iterates. A single task might involve dozens of steps: file I/O, process spawning, shell commands, network requests, database queries. Each step requires the AI model to generate tokens (GPU work), but between each generation, the agent is executing real code on real infrastructure. That execution happens on the CPU.
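To make the GPU/CPU split concrete, here is a minimal sketch of an agent loop. It assumes a hypothetical `scripted_model` that stands in for GPU token generation (it simply replays a fixed plan) and a hypothetical `run_tool` helper that executes each step as a Python subprocess. This is not how Claude Code or any particular framework is implemented; it only shows where the CPU work lands: every tool call spawns a process, waits on it, and parses the result before the model is called again.

```python
import subprocess
import sys
import time
from dataclasses import dataclass


@dataclass
class Action:
    kind: str     # "tool" (run code on the CPU) or "final" (stop and answer)
    payload: str  # code to execute, or the final answer text


# Hypothetical scripted "model": in a real agent this call is GPU token
# generation; here it replays a fixed plan so the loop is deterministic.
PLAN = ["print(2 + 2)", "print(sum(range(10)))"]


def scripted_model(history):
    if len(history) < len(PLAN):
        return Action("tool", PLAN[len(history)])
    observations = ", ".join(obs for _, obs in history)
    return Action("final", f"observations: {observations}")


def run_tool(code: str) -> str:
    """CPU-side step: spawn a process, execute code, capture its output."""
    result = subprocess.run(
        [sys.executable, "-c", code], capture_output=True, text=True, timeout=30
    )
    return (result.stdout or result.stderr).strip()


def agent_loop(model, max_steps: int = 10):
    history = []                          # (code, observation) pairs
    seconds = {"model": 0.0, "tools": 0.0}
    for _ in range(max_steps):
        start = time.perf_counter()
        action = model(history)           # GPU work in a real deployment
        seconds["model"] += time.perf_counter() - start
        if action.kind == "final":
            return action.payload, history, seconds
        start = time.perf_counter()
        observation = run_tool(action.payload)  # CPU work between generations
        seconds["tools"] += time.perf_counter() - start
        history.append((action.payload, observation))
    raise RuntimeError("step budget exhausted")


if __name__ == "__main__":
    answer, history, seconds = agent_loop(scripted_model)
    print(answer)   # observations: 4, 45
    print(seconds)
```

Even in this toy, each iteration pays for process creation and I/O on the host before the next generation can start; a real agent adds file reads, network calls, and test runs to that same slot.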

OpenClaw and other open-source agent frameworks are accelerating this pattern at data center scale. Jensen called it the “ChatGPT moment for agents” at GTC. These frameworks decompose tasks, invoke tools autonomously, query multiple systems, and run continuously. Every orchestration step lands on the CPU.

Stuart Pitts, whom I interviewed at GTC, framed the demand signal directly: “Agentic systems use up to 15 times more tokens than single-purpose AI agents. And they’ve also gotta be fast, so that agents can deliver on the economic promise that they represent.” If orchestration work scales with token volume, fifteen times more tokens means roughly fifteen times more orchestration overhead.

NVIDIA’s Adel El Hallak described a deep research agent deployed internally at NVIDIA — a swarm of specialist sub-agents, each consuming tokens independently. As Adel put it: “Agents talking to agents is a lot more compute. But compute translates to tokens. Tokens translate to value. That’s why they’re calling them AI factories.”

The orchestration complexity compounds. The KV cache grows continuously during multi-turn agent interactions. A chatbot query spikes and releases in seconds. An agentic coding session accumulates context across hours. The CPU manages all of that state: scheduling, memory coordination, I/O orchestration, security guardrails, and routing between local and cloud models.
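A rough way to see why context accumulation matters is the standard KV-cache sizing formula: one key tensor and one value tensor per layer, per KV head, per token. The sketch below assumes a hypothetical 70B-class model shape (80 layers, 8 KV heads with grouped-query attention, head dimension 128, 2-byte fp16/bf16 values); real models differ. In most serving stacks the cache itself sits in GPU memory, while the host CPU handles scheduling and any spill to DRAM.

```python
def kv_cache_bytes(tokens: int, layers: int = 80, kv_heads: int = 8,
                   head_dim: int = 128, bytes_per_value: int = 2) -> int:
    """Approximate KV-cache size for one sequence, in bytes.

    The factor of 2 covers one key vector and one value vector per layer,
    per KV head, per token. Default dimensions are a hypothetical
    70B-class configuration, not any vendor's published numbers.
    """
    return 2 * layers * kv_heads * head_dim * bytes_per_value * tokens


GIB = 2**30

if __name__ == "__main__":
    # A short chatbot exchange vs. an agent session that keeps accumulating context.
    for tokens in (2_000, 32_000, 128_000):
        print(f"{tokens:>7} tokens -> {kv_cache_bytes(tokens) / GIB:5.1f} GiB")
```

Because the size is linear in tokens, a session that accumulates context for hours carries a cache that a seconds-long chatbot query never builds.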

Arm’s Mohamed Awad, EVP of Cloud AI, put it simply when I spoke with him after the keynote: “The GPUs are gonna generate the tokens, the CPUs are gonna figure out what to do with them. As more and more tokens get generated, you need more and more CPUs to handle it.”

Rene Haas made the same argument at the Arm event: a 1 GW data center under traditional cloud architecture needs approximately 30 million CPU cores. Under agentic AI workloads, that number rises to approximately 120 million — a 4x increase. The number deserves scrutiny, but the direction is right. Agentic AI doesn’t reduce CPU demand. It explodes it.
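Both demand claims reduce to simple ratios. The sketch below re-derives them from the figures quoted in this article; the conversion into 136-core sockets at the end is a hypothetical illustration, not a forecast from Arm or NVIDIA.

```python
# Figures as quoted in this article (Jensen Huang at GTC 2026; Rene Haas at Arm's event).
GPUS, CPU_CORES_FOR_GPUS = 12_000, 400_000
FACILITY_MW = 1_000                     # 1 GW data center
CORES_TRADITIONAL = 30_000_000
CORES_AGENTIC = 120_000_000
CORES_PER_AGI_CPU = 136                 # core count of the Arm AGI CPU

cores_per_gpu = CPU_CORES_FOR_GPUS / GPUS
agentic_multiplier = CORES_AGENTIC / CORES_TRADITIONAL
cores_per_mw = {
    "traditional": CORES_TRADITIONAL / FACILITY_MW,
    "agentic": CORES_AGENTIC / FACILITY_MW,
}
# Illustrative only: sockets needed if every agentic core were a 136-core AGI CPU.
agi_cpu_sockets = -(-CORES_AGENTIC // CORES_PER_AGI_CPU)   # ceiling division

if __name__ == "__main__":
    print(f"CPU cores per GPU: {cores_per_gpu:.1f}")
    print(f"Agentic multiplier: {agentic_multiplier:.0f}x")
    print(f"Cores per MW: {cores_per_mw}")
    print(f"AGI CPU sockets (illustrative): {agi_cpu_sockets:,}")
```

The 33-to-1 and 4x figures check out arithmetically; the open question is the assumptions behind the 30 million and 120 million core estimates, not the division.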

The Vacuum

So who was building a CPU explicitly optimized for this workload? Until this week, no vendor had publicly framed the opportunity this directly.

Intel entered this moment weakened. Multiple CEO transitions, execution pressure across its foundry and product divisions, and the organizational weight of restructuring under CHIPS Act commitments left little bandwidth for purpose-built CPU design targeting agentic workloads. Xeon remains optimized for traditional enterprise workloads, prioritizing broad instruction-set and backward compatibility. Those traits remain valuable in legacy deployments, but they are not the primary optimization target for agentic infrastructure.

AMD prioritized accelerator momentum — a strategically rational choice that drove MI300 and the MI400 roadmap. But that focus meant EPYC, the product line that captured meaningful server share from Intel over the past five years, stopped feeling like the center of gravity. Turin is a strong chip. But it’s still an x86 chip iterating on a general-purpose architecture, not a server CPU explicitly optimized around agentic orchestration.

And the supply picture is tightening. Nikkei Asia and other outlets reported this week that both Intel and AMD have notified customers of CPU price increases in the range of 10-15%, with delivery lead times reportedly stretching from one to two weeks to an average of eight to twelve weeks, and in some cases longer. Intel is working to expand wafer capacity but faces constraints. AMD competes for TSMC allocation against NVIDIA and other AI silicon customers. The x86 CPU franchise faces a dual challenge: strategic pressure from new architectures and physical supply constraints at a moment when AI infrastructure is demanding more CPU cores than the market anticipated.

While Intel was navigating restructuring and AMD was chasing accelerator share, Arm did something it has never done in 35 years. It built its first production silicon product around a new set of agentic infrastructure requirements, fabbed it on TSMC 3nm, and co-developed it with Meta — with OpenAI, Cerebras, and others signing on as launch partners.

The co-design thesis in action. The companies closest to the workload shape the silicon.

What Arm Actually Built

The Arm AGI CPU is a 136-core, dual-chiplet processor built on Neoverse V3 cores.

When I spoke with Awad after the keynote, he described the design philosophy directly: “We listened to our customers and we built what they asked us to build. They asked us to focus on performance, scalability, efficiency. They weren’t interested in a lot of the legacy, esoteric use cases.” That’s the co-design thesis stated plainly — and it explains every architectural decision below.

Memory-first design. 12 channels of DDR5-8800 delivering over 800 GB/s of aggregate memory bandwidth — 6 GB/s per core at sub-100 nanosecond latency. Memory and I/O sit on the same die. As I argued in The Memory Wars, AI inference is fundamentally memory-bound, not compute-bound. This is the first production CPU I’ve seen designed from scratch around that constraint.
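The headline bandwidth follows from the standard peak-bandwidth arithmetic for DDR5. A minimal sketch, assuming 8 data bytes per channel per transfer and ignoring ECC bits and real-world efficiency (sustained bandwidth is always below this theoretical peak):

```python
def ddr5_peak_gb_per_s(mt_per_s: int, channels: int, bus_bytes: int = 8) -> float:
    """Theoretical peak memory bandwidth in GB/s (10^9 bytes per second).

    A DDR5 channel carries 64 data bits (8 bytes) per transfer, split into
    two 32-bit subchannels; ECC bits are excluded.
    """
    # MT/s * bytes per transfer = MB/s; divide by 1,000 for GB/s.
    return mt_per_s * bus_bytes * channels / 1_000


if __name__ == "__main__":
    total = ddr5_peak_gb_per_s(8_800, 12)
    per_core = total / 136
    print(f"aggregate: {total:.1f} GB/s, per core: {per_core:.2f} GB/s")
```

Twelve channels at 8,800 MT/s come to about 845 GB/s of peak bandwidth, matching “over 800 GB/s,” and dividing by 136 cores gives roughly 6.2 GB/s per core.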

CXL 3.0 native. 96 lanes of PCIe Gen6 with CXL 3.0 Type 3 support for memory expansion and pooling — the composable memory architecture that agentic workloads require.
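For a sense of scale, the raw link budget of 96 PCIe Gen6 lanes can be sketched the same way. This assumes the PCIe 6.0 signaling rate of 64 GT/s per lane per direction (roughly 64 Gb/s) and ignores FLIT, FEC, and CXL protocol overhead, so bandwidth actually delivered to CXL-attached memory would be lower:

```python
PCIE_GEN6_GT_PER_S = 64   # per lane, per direction (PAM4 signaling)


def pcie_raw_gb_per_s(lanes: int, gt_per_s: int = PCIE_GEN6_GT_PER_S) -> float:
    """Raw one-direction bandwidth in GB/s, before FLIT/FEC and protocol overhead."""
    # Treat each transfer as one bit per lane, then divide by 8 for bytes.
    return lanes * gt_per_s / 8


if __name__ == "__main__":
    print(pcie_raw_gb_per_s(16))  # 128.0
    print(pcie_raw_gb_per_s(96))  # 768.0
```

On paper that is about 768 GB/s per direction across all 96 lanes, the same order of magnitude as the local DDR5 bandwidth, though in practice only a share of those lanes would be dedicated to CXL memory rather than NICs, storage, or accelerators.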

Deterministic performance under load. One thread per core. No simultaneous multithreading. Awad said Arm chose not to include SMT because agentic workloads need predictable, sustained performance, not bursty peak throughput that degrades under contention. That is a deliberate break from mainstream x86 server CPUs, which run two threads per core.

2x+ performance per rack at the same power. Same 36 kW rack power envelope as x86. But Arm fits 30 1U servers with 8,160 cores versus 17 2U servers with 4,352 cores: about 1.9x the cores, which Arm puts at more than 2x performance per rack at the same power draw. When power is the binding constraint, and it always is, that density advantage is the entire argument.

The liquid-cooled configuration: 200 kW, 42 eight-node 1U servers, 45,696 cores, and over 1 petabyte of low-latency memory. In a single rack.
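The rack figures can be checked with a few divisions. A minimal sketch using only the numbers quoted above; the per-server and per-kW breakdowns are my own derivations, not Arm-published figures, and the 36 kW and 200 kW values are treated as the power actually drawn, which is an assumption:

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class RackConfig:
    name: str
    rack_kw: float
    servers: int
    cores: int

    @property
    def cores_per_server(self) -> float:
        return self.cores / self.servers

    @property
    def cores_per_kw(self) -> float:
        return self.cores / self.rack_kw


# Figures as quoted from Arm's presentation.
ARM_AIR = RackConfig("Arm AGI CPU, 30 x 1U", 36, 30, 8_160)
X86_AIR = RackConfig("x86, 17 x 2U", 36, 17, 4_352)
ARM_LIQUID = RackConfig("Arm AGI CPU, liquid-cooled", 200, 42, 45_696)

if __name__ == "__main__":
    for rack in (ARM_AIR, X86_AIR, ARM_LIQUID):
        print(f"{rack.name}: {rack.cores_per_server:.0f} cores/server, "
              f"{rack.cores_per_kw:.0f} cores/kW")
    print(f"Arm vs x86 cores at 36 kW: {ARM_AIR.cores / X86_AIR.cores:.3f}x")
    # 42 servers x 8 nodes x 136 cores reproduces the liquid-cooled core count.
    assert 42 * 8 * 136 == ARM_LIQUID.cores
```

The liquid-cooled rack checks out: 42 servers of eight 136-core nodes is exactly 45,696 cores. It lands at roughly 228 cores per kW versus roughly 227 for the air-cooled Arm rack, so liquid cooling buys density per rack rather than per watt.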

Here’s what most coverage is missing: NVIDIA is already all-in on Arm for CPUs. The Grace CPU in Grace Hopper and the Vera CPU in Vera Rubin are both built on Arm architecture. Jensen’s quote from Arm’s own press release: “Together we’re creating one seamless platform, from cloud to edge to AI factories.”

Arm winning the data center CPU war isn’t a threat to NVIDIA. It’s a tailwind. Every rack that displaces x86 in favor of Arm — whether Arm’s AGI CPU, NVIDIA’s Vera, AWS Graviton, Google Axion, or Microsoft Cobalt — reinforces the ecosystem NVIDIA’s entire platform is built on. That puts Intel and AMD under pressure from both ecosystem momentum and workload specialization.

The Memory Interface Angle

In Raja Was Right, I laid out the picks-and-shovels thesis for memory infrastructure: the smarter play isn’t the commodity memory producers, but the companies that benefit from memory volume regardless of price.

Arm hasn’t disclosed who supplies the memory interface IP for the AGI CPU. But the specs reinforce the thesis. DDR5-8800 across 12 channels pushes signal integrity requirements higher — more value accruing to the interface layer. CXL 3.0 memory pooling at production scale makes CXL controller IP a chokepoint. And a petabyte of DDR5 per rack is an enormous amount of DIMM real estate where memory interface chipsets — RCDs, PMICs, SPD hubs — ship on every module.
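To put that DIMM real estate in numbers, here is a back-of-envelope sketch for the liquid-cooled rack. Arm has not disclosed the DIMM configuration, so one 136-core CPU per node (inferred from 45,696 / 336 = 136), one DIMM per channel, and 256 GB RDIMMs are all hypothetical assumptions:

```python
# Back-of-envelope DIMM count for the liquid-cooled AGI CPU rack.
# ASSUMPTIONS (hypothetical, not disclosed by Arm): one 136-core CPU per node,
# one DIMM per channel (1DPC), and 256 GB RDIMMs.
SERVERS, NODES_PER_SERVER = 42, 8   # from the quoted rack configuration
CHANNELS_PER_CPU = 12               # DDR5 channels per AGI CPU (quoted spec)
DIMMS_PER_CHANNEL = 1               # assumption
DIMM_GB = 256                       # assumption

# A standard DDR5 RDIMM carries one RCD, one PMIC, and one SPD hub.
CHIPS_PER_RDIMM = {"RCD": 1, "PMIC": 1, "SPD hub": 1}

nodes = SERVERS * NODES_PER_SERVER
dimms = nodes * CHANNELS_PER_CPU * DIMMS_PER_CHANNEL
capacity_pb = dimms * DIMM_GB / 1_000_000          # decimal GB -> PB
interface_chips = {chip: count * dimms for chip, count in CHIPS_PER_RDIMM.items()}

if __name__ == "__main__":
    print(f"{nodes} nodes, {dimms:,} RDIMMs, {capacity_pb:.2f} PB")
    print(interface_chips)
```

Under those assumptions the rack holds about 4,000 RDIMMs totaling roughly 1.03 PB, consistent with the “over 1 petabyte” figure, and each module carries its own RCD, PMIC, and SPD hub regardless of CPU architecture.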

Rambus remains the clearest picks-and-shovels beneficiary in this build-out. Its DDR5 chipsets sit on server DIMMs regardless of whether the CPU is x86 or Arm. The x86 lock-in angle from Raja Was Right still applies for enterprise, but now there’s a second growth vector: Arm-based racks scaling DDR5 volume from the hyperscaler side.

Coming Next for Paid Subscribers

Positron AI: The Inference Chip Nobody’s Covering. I sat down with the CEO of Positron AI — one of the launch partners for Arm’s AGI CPU and a company building a new class of inference accelerator. If you’re following the Memory Wars thesis and the inference silicon race, this is a name you should know. Deep dive coming for paid subscribers.

Super Micro: Is There Hope? I had a conversation with the head of corporate development at Super Micro. Given the accounting controversies, auditor changes, and the questions that have followed the company for the past year, I want to see for myself whether the underlying business and infrastructure story still holds. I’m considering a company visit. If it happens, paid subscribers will get the full field report, no sugarcoating.

Below: the $25 billion revenue path, Haas’s $1 trillion TAM tease, the inference chip trajectory, NVIDIA’s co-design moat, RISC-V as the wildcard, the Qualcomm/Nuvia cautionary tale, and the full investment map. This is where the actionable analysis lives.

The $25 Billion Question

Originally published on BEP Research on Substack. Subscribe for more.
