Jensen Huang just did it again.
At CES 2026, NVIDIA unveiled Vera Rubin—not as a roadmap slide, not as a “coming soon” tease, but as a platform in full production. Six new chips. A complete system architecture. And a number that changes everything: 10x more tokens at 1/10th the cost.
Named after the astronomer whose galaxy rotation measurements provided the first compelling evidence for dark matter, Vera Rubin represents the most aggressive co-design effort in NVIDIA’s history. The Vera CPU and Rubin GPU weren’t designed separately and integrated later—they were architected as a single system from the silicon up. As Jensen put it from the stage: “Moore’s Law can’t keep up with 10x and exponential use of AI. We need to embrace extreme co-design.”
If you’ve been following my analysis of the memory wall and NVIDIA’s stack depth strategy, this announcement validates the thesis: the winners in AI infrastructure won’t be those with the best chips—they’ll be those who design complete systems where the boundaries between silicon, memory, network, and software dissolve entirely.
This isn’t incremental improvement. This is NVIDIA betting that the future of AI infrastructure requires rethinking everything—from transistor architecture to cooling systems to how we measure value creation.
The Demand Problem: Why Moore’s Law Isn’t Enough
Before diving into the hardware, Jensen grounded the audience in the math that’s driving NVIDIA’s roadmap:
Model size: 10x per year
Test-time scaling (reasoning tokens): 5x per year
Token cost reduction required: 10x cheaper per year
The reasoning models—the ones that “think” before responding—consume dramatically more tokens per query. When a model generates 50,000 internal reasoning tokens to produce a 500-token answer, inference costs explode. The only way to make these models economically viable is to collapse cost per token faster than usage grows.
Moore’s Law delivers roughly 2x improvement every two years. NVIDIA needs 10x every year. The gap can only be closed through architectural innovation—not process node shrinks. As I wrote in “The Packaging Paradox,” the real constraint isn’t transistor density anymore—it’s advanced packaging capacity and memory bandwidth. Vera Rubin addresses both.
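To make that gap concrete, here’s the back-of-the-envelope version of the math. The token price is my illustrative assumption, not an NVIDIA figure:

```python
# Back-of-the-envelope math behind the demand argument.
# The $/token price is an illustrative assumption, not an NVIDIA number.

reasoning_tokens = 50_000    # internal "thinking" tokens per query
answer_tokens = 500          # tokens the user actually sees
price_per_mtok = 2.00        # assumed $ per million generated tokens

cost = (reasoning_tokens + answer_tokens) / 1e6 * price_per_mtok
naive = answer_tokens / 1e6 * price_per_mtok
print(f"cost per reasoning query: ${cost:.3f} ({cost / naive:.0f}x a plain answer)")

# Moore's Law: ~2x every two years, i.e. ~1.41x per year.
# The roadmap demands 10x per year; the rest must come from architecture.
moores_per_year = 2 ** 0.5
print(f"architecture must supply ~{10 / moores_per_year:.1f}x per year")
```

A reasoning query costs roughly 100x a plain answer, and process scaling covers barely 1.4x of the required 10x per year. The other ~7x has to come from somewhere else.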
Six New Chips: The Complete Vera Rubin Platform
NVIDIA didn’t announce one chip. They announced six—designed together as a unified system:
Vera CPU
88 Olympus cores running 176 threads via Spatial Multi-Threading, a new threading architecture that extracts more parallelism from each core. 1.8 TB/s of NVLink-C2C connects the CPU directly to the Rubin GPU with bidirectional, lower-latency communication than any previous generation. 1.5 TB of system memory (3x Grace), 1.2 TB/s of LPDDR5X bandwidth, and 227 billion transistors.
The key insight: Vera isn’t a standalone CPU—it’s the other half of a compute module. The CPU-GPU boundary that has constrained system design for decades is dissolving.
Rubin GPU
50 PFLOPS inference (5x Blackwell) via NVFP4 TensorCores—a new silicon invention that delivers 5x performance from just 1.6x more transistors. As Jensen emphasized: “This can’t be done in software. This is a new TensorCore architecture.”
35 PFLOPS training (3.5x Blackwell), 22 TB/s HBM4 bandwidth (2.8x), 3.6 TB/s NVLink (2x), and 336 billion transistors (1.6x). The efficiency gains come from co-design with Vera CPU and the new memory architecture, not just raw transistor count.
Remember the memory wall problem I outlined in “The Memory Wars”? The HBM4 bandwidth jump to 22 TB/s directly attacks that constraint. But bandwidth alone isn’t enough—you need the entire system designed around feeding data to compute units without stalls.
NVLink 6 Switch
3.6 TB/s per-GPU all-to-all bandwidth with 400G SerDes—the fastest serializer/deserializer in the world. Jensen’s framing: “This carries 2x the amount of world internet data.” A single NVL72 rack moves more data internally than flows across the entire global internet.
In-network SHARP collectives let operations like allreduce execute in the switch fabric itself, offloading the endpoints and cutting both data movement and latency. 108 billion transistors.
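To see why pushing collectives into the switch matters, here’s a toy per-GPU traffic comparison. The ring figure is the textbook allreduce cost; the in-network line is the idealized case, not a measured NVIDIA number:

```python
# Toy per-GPU traffic for an allreduce across an NVL72 rack.
# Textbook costs only; not NVIDIA measurements.

n, data_gb = 72, 1.0  # 72 GPUs, 1 GB of gradients (size assumed)

ring_send = 2 * (n - 1) / n * data_gb   # classic ring allreduce
sharp_send = 1.0 * data_gb              # in-network: send once, receive the sum

print(f"ring allreduce: {ring_send:.2f} GB sent per GPU over {2 * (n - 1)} steps")
print(f"in-network:     {sharp_send:.2f} GB sent per GPU in one fabric round trip")
```

The bandwidth saving is roughly 2x, but the bigger win is latency: one fabric round trip instead of 142 sequential ring steps.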
Spectrum-X CPO: World’s First Photonic Ethernet Switch
This is the slide that made me sit up. TSMC COUPE silicon photonics—co-packaged optics integrated directly into the switch ASIC.
102.4 Tb/s scale-out bandwidth. 128 ports of 800 Gb/s. 512 ports of 200 Gb/s. 352 billion transistors.
Jensen held up the physical module—a teal PCB with integrated photonics visible on the edge. This is the first production silicon photonics switch at this scale. My work in Professor Fainman’s ultrafast nanoscale optics lab at UC San Diego focused on exactly this kind of integration—micro-ring resonators and silicon waveguides. Seeing it ship in a production networking product is extraordinary. (I covered the broader implications for interconnect companies in “The AI Datacenter Optical Interconnect Boom.”)
The implications for interconnect density and power efficiency are massive. Electrical signaling at these bandwidths requires enormous power for SerDes. Photonics fundamentally changes the equation.
ConnectX-9 SuperNIC
800 Gb/s Ethernet with 200G PAM4 SerDes, line-speed encryption, and CNSA/FIPS certification. 23 billion transistors. This is the network interface that connects Vera Rubin to the Spectrum-X fabric.
The encryption detail matters: every bus and NVLink connection is encrypted. This enables confidential computing in multi-tenant environments—critical for enterprise and government deployments.
BlueField-4 DPU: The KV Cache Platform
Here’s where the memory wall solution becomes concrete. BlueField-4 isn’t just a SmartNIC—it’s a distributed context memory platform.
800G networking + storage processor. 64-core Grace CPU. 2x networking, 6x compute, 3x memory bandwidth vs BlueField-3. 126 billion transistors.
The key capability: BlueField-4 runs NVIDIA Dynamo for distributed KV cache management. This adds 16 TB of context memory per rack on the east-west fabric—memory that can hold context for the entire cluster, not just within a single GPU’s HBM.
For long-context inference and agentic workloads where KV cache sizes explode, this architectural change matters more than raw GPU FLOPS. It’s the practical implementation of what I discussed in the Groq analysis—solving the memory bottleneck through system architecture, not just faster chips.
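To see how quickly KV cache explodes, here’s the standard transformer sizing formula applied to an illustrative 70B-class configuration with grouped-query attention (the model dimensions are my assumptions, not tied to any specific deployment):

```python
# Standard transformer KV-cache sizing. The model config is an
# illustrative 70B-class setup with grouped-query attention; all
# dimensions are assumed for illustration.

def kv_cache_gb(seq_len, n_layers=80, n_kv_heads=8, head_dim=128,
                bytes_per_elem=2, batch=1):
    # 2x for keys and values, stored per layer, per KV head, per token.
    return (2 * n_layers * n_kv_heads * head_dim * bytes_per_elem
            * seq_len * batch / 1e9)

for ctx in (8_000, 128_000, 1_000_000):
    print(f"{ctx:>9,} tokens -> {kv_cache_gb(ctx):7.1f} GB per user")
# ->     8,000 tokens ->     2.6 GB per user
# ->   128,000 tokens ->    41.9 GB per user
# -> 1,000,000 tokens ->   327.7 GB per user
```

At a million tokens, a single user’s cache is already in the ballpark of one GPU’s entire HBM allocation (20.7 TB across 72 GPUs works out to roughly 288 GB each), which is exactly why a 16 TB rack-level tier changes the calculus.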
The Complete System: Vera Rubin NVL72
Jensen’s framing of the compute tray was striking: “No cables. No hoses. No fans.”
The Vera Rubin compute tray is a fully liquid-cooled module where the Vera CPU and Rubin GPU connect via NVLink-C2C on a single board. The chassis simplification from Blackwell to Vera Rubin mirrors what SpaceX did with Raptor engines—dramatic reduction in part count and failure modes while increasing performance.
Vera Rubin NVL72 Full Specs:
3.6 EFLOPS inference (5x Blackwell NVL72)
2.5 EFLOPS training (3.5x)
54 TB LPDDR5X system memory (3x)
20.7 TB HBM4 (1.5x)
1.6 PB/s HBM bandwidth (2.8x)
260 TB/s scale-up bandwidth (2x)
220 trillion transistors (1.7x)
A single rack. 72 GPUs. 220 trillion transistors. 2 tons of weight. 2 miles of copper cable in the NVLink spine. Hot water cooling at 45-50°C inlet—no chillers needed, PUE approaching 1.0.
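The rack numbers fall straight out of the chip numbers. A quick consistency check (the 36-CPU count isn’t stated above; it’s implied by 54 TB of system memory at 1.5 TB per Vera):

```python
# Sanity-checking the NVL72 slide against the per-chip specs above.
gpus, cpus = 72, 36  # 36 Vera CPUs implied by 54 TB / 1.5 TB per CPU

print(f"inference:     {gpus * 50 / 1000:.1f} EFLOPS")  # 50 PFLOPS per Rubin
print(f"training:      {gpus * 35 / 1000:.1f} EFLOPS")  # 35 PFLOPS per Rubin
print(f"HBM bandwidth: {gpus * 22 / 1000:.1f} PB/s")    # 22 TB/s per GPU
print(f"system memory: {cpus * 1.5:.0f} TB")            # 1.5 TB per Vera
print(f"scale-up:      {gpus * 3.6:.0f} TB/s")          # 3.6 TB/s NVLink per GPU
# -> 3.6 EFLOPS, 2.5 EFLOPS, 1.6 PB/s, 54 TB, ~260 TB/s: consistent with the slide.
```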
Context Is the New Bottleneck
Jensen introduced a slide that crystallized the memory wall problem: “Context is the New Bottleneck.”
The memory hierarchy for AI inference now spans: HBM → System Memory → Rack SSD → Network SSD. And as Jensen said: “Storage must be rearchitected.”
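A toy illustration of why the shape of that hierarchy matters: the time to reload a million-token context from each tier, using the bandwidth figures quoted above where available and rough assumptions for the rest:

```python
# Time to reload a ~330 GB, million-token KV cache from each tier.
# HBM4 and LPDDR5X bandwidths come from the specs above; the lower
# two tiers are rough assumptions for illustration only.

context_gb = 330
tiers = [
    ("HBM4 (per GPU)",      22_000),  # GB/s, from the Rubin spec
    ("LPDDR5X (per Vera)",   1_200),  # GB/s, from the Vera spec
    ("Rack context memory",    100),  # ~800 Gb/s BlueField-4 link, assumed
    ("Network SSD",             10),  # assumed
]
for name, bw_gbs in tiers:
    print(f"{name:<22} {context_gb / bw_gbs:8.2f} s to reload")
```

Each tier you fall down costs roughly an order of magnitude in reload time. That, in four lines of output, is why storage must be rearchitected.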
The NVIDIA Context Memory Storage Platform addresses this directly:
POD-level context memory via BlueField-4 storage processors and Spectrum-X Ethernet RDMA for fabric-attached storage, delivering 5x higher tokens per second and 5x better power efficiency on context-heavy workloads.
This is the infrastructure layer that makes million-token context windows and persistent agent memory economically viable.
The Number: 10x Throughput, 1/10th Cost
This is the slide that matters most for investors and infrastructure planners.
“Six New Chips — One Giant Leap to the Next Frontier”
Time to Train: 1/4 the GPUs (DeepSeek++ benchmark: 100T tokens in 1 month with ~32K GPUs vs 128K on Blackwell)
Factory Throughput: Up to 10x more tokens (Kimi K2-Thinking benchmark)
Token Cost: 1/10th lower cost
These aren’t projections. These are benchmarked numbers on production workloads. When Jensen says “factory throughput,” he’s explicitly positioning data centers as AI factories—facilities that convert electricity into tokens, measured by throughput and yield just like semiconductor fabs.
And then he delivered the line that should reshape how investors value AI infrastructure: “Throughput per watt translates to revenue.”
Power is the binding constraint for data centers. You can’t buy more power—utilities have limits, grid capacity is finite, permits take years. The question isn’t “how many FLOPS can I buy?” It’s “given my fixed power budget, how much revenue can I generate?”
10x throughput at 2x power = 5x revenue per watt.
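Here’s that arithmetic as a sketch. Only the 10x-throughput, 2x-power ratio comes from the keynote framing; the site size, token price, and baseline efficiency are my assumptions:

```python
# Power-constrained revenue math. Only the 10x-throughput / 2x-power
# ratio echoes the keynote; everything else is an illustrative assumption.

site_mw = 100              # fixed by the utility interconnect
price_per_mtok = 0.10      # assumed blended $ per million tokens
base_tok_per_joule = 10.0  # assumed Blackwell-era efficiency

def annual_revenue_musd(tok_per_joule):
    joules_per_year = site_mw * 1e6 * 86_400 * 365   # energy through the site
    return joules_per_year * tok_per_joule / 1e6 * price_per_mtok / 1e6

blackwell = annual_revenue_musd(base_tok_per_joule)
rubin = annual_revenue_musd(base_tok_per_joule * 10 / 2)  # 10x tokens, 2x power
print(f"Blackwell site: ${blackwell:,.0f}M/yr")
print(f"Rubin site:     ${rubin:,.0f}M/yr ({rubin / blackwell:.0f}x per watt)")
```

Same grid hookup, same megawatts, five times the revenue. That is the number a capacity planner actually optimizes.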
NVIDIA isn’t selling chips. They’re selling revenue generation capacity.
Physical AI: The Real Endgame
While the hardware captured headlines, Jensen spent significant time on what NVIDIA is actually building toward: Physical AI—intelligence that interacts with the real world.
The full-stack platform: GB300 Training → Cosmos/GROOT/Alpamayo/Omniverse → Thor Inference → RTX Pro Simulation
Cosmos: The World Foundation Model
Cosmos is NVIDIA’s world foundation model—trained to understand physics, spatial relationships, and temporal dynamics. The key insight Jensen shared: “Cosmos turns compute into data.”
Rather than collecting billions of miles of real-world driving data, Cosmos can generate physically accurate synthetic scenarios. The model takes wireframe inputs and generates photorealistic driving scenes with correct physics. NVIDIA is using Cosmos internally for their own development.
Alpamayo: The Reasoning VLA
NVIDIA Alpamayo is the first thinking, reasoning, action model for autonomous vehicles—a Vision-Language-Action model that doesn’t just perceive and react, but reasons about driving decisions.
The architecture: Multi-camera input + ego-motion → Driving decisions with causal reasoning → Trajectory output. This is the model that will power next-generation autonomous systems.
Mercedes-Benz CLA: Physical AI Ships Q1 2026
The Mercedes-Benz CLA ships with NVIDIA’s full autonomous stack: Alpamayo + Policy/Safety + Classical Stack + Halos Safety OS, all running on NVIDIA Thor.
Jensen emphasized: “Every line of code is safety certified.” He called the CLA the world’s safest car: NVIDIA software running on NVIDIA silicon.
The Ecosystem Play: Open Models at the Frontier
Jensen opened with a framework for where AI is heading:
1. Compute is Data — Synthetic data generation via world models
2. AI Becomes Agentic — Multi-step reasoning and action
3. Physical AI Takes Leap — Intelligence in the real world
4. AI Learns Laws of Nature — Physics-aware models
5. Open Models Reach Frontier — Democratization of capability
The open model ecosystem is catching up to the frontier. 80% of AI startups now build on open models, and one in four tokens served on OpenRouter comes from an open model. NVIDIA’s bet: by leading the open model ecosystem, they ensure their hardware remains the substrate regardless of which model wins.
NVIDIA Leads Open Model Ecosystem: Clara (healthcare), Earth-2 (climate), Nemotron (language), Cosmos (world), GROOT (robotics), Alpamayo (autonomous). All open source or open weights. (I covered what Nemotron means for enterprise AI economics in detail.)
Jensen’s framing: “We build the entire stack, but it’s open to the ecosystem.”
Enterprise and Ecosystem Partners
Enterprise platforms adopting NVIDIA AI: Palantir, ServiceNow, Snowflake—the infrastructure layer of enterprise software is going NVIDIA-native.
Global L4 and Robotaxi Ecosystem: Aurora, Wayve, BYD, Mercedes, Uber, and dozens more. Physical AI isn’t a research project—it’s a deployment reality with a massive partner ecosystem.
EDA and Industrial Partners: Cadence, Synopsys, and Siemens are integrating NVIDIA’s full platform—from chip design to factory operations. Siemens is integrating CUDA and CUDA-X across their entire industrial software stack.
The robot parade at CES wasn’t a gimmick—it was a demonstration of Physical AI shipping across humanoids, industrial equipment, and autonomous systems.
So What? Investment Implications
Let me translate what this means for capital allocation:
For NVIDIA: The 10x-throughput, 1/10th-cost economics extend the runway dramatically. Any hyperscaler or enterprise that was planning infrastructure around Blackwell pricing now needs to recalculate—Vera Rubin delivers either the same output at 1/10th the cost, or 10x the output at the same cost. This isn’t a refresh cycle; it’s a step-function that resets competitive dynamics.
For hyperscaler custom silicon (Google TPU, Amazon Trainium, Microsoft Maia): The window is narrowing. When NVIDIA addresses inference efficiency through system-level co-design—not just faster chips—the ROI on maintaining separate hardware stacks becomes questionable. Custom silicon makes sense for captive workloads, but the general-purpose market is consolidating around whoever solves the full-stack problem first.
For AMD: MI300X’s 192GB HBM3 was impressive. But if Vera Rubin combines equivalent memory capacity with dramatically higher system bandwidth and the Context Memory Platform, AMD needs a packaging and system response—not just a process node catch-up. ROCm improvements help, but NVIDIA is selling stack depth, not chips.
For memory companies (SK Hynix, Samsung, Micron): The 16-Hi HBM4 race I discussed in “The Memory Wars” just got more urgent. NVIDIA’s demand for 22 TB/s bandwidth per GPU means HBM4 capacity is the binding constraint. SK Hynix’s lead position becomes more valuable; Samsung and Micron face pressure to close the gap.
For interconnect companies: Spectrum-X CPO with TSMC COUPE changes the game. Companies without a silicon photonics roadmap—particularly traditional transceiver suppliers—face in-sourcing risk as NVIDIA vertically integrates. Lumentum’s CPO laser supply position becomes strategic; pure-play electrical interconnect faces existential questions. (More on this in “The AI Datacenter Optical Interconnect Boom.”)
For packaging (TSMC, ASE): The CoWoS constraint I outlined in “The Packaging Paradox” intensifies. Vera Rubin’s complexity—six chips co-designed, 220 trillion transistors per rack—requires advanced packaging at scale. TSMC’s AP6/AP8 expansion becomes critical infrastructure. This is the real bottleneck, not transistor density.
For enterprise software (ServiceNow, Palantir, Snowflake): The platform partnerships announced today signal where enterprise AI is heading. Companies building on NVIDIA-native infrastructure will have structural advantages in inference economics. The “AI wrapper” premium I discussed in the Nemotron analysis faces compression as inference costs collapse.
The Bottom Line
Vera Rubin isn’t a GPU upgrade. It’s a complete rearchitecting of AI infrastructure around a single insight: the boundaries between CPU, GPU, memory, network, and storage must dissolve.
The six chips announced today weren’t designed independently. They were co-designed as a system—each silicon decision informed by how it would interact with every other component. The Vera CPU exists to feed the Rubin GPU. The NVLink 6 switch exists to connect them at bandwidth that matches their appetite. BlueField-4 exists to extend memory beyond what HBM alone can provide. Spectrum-X CPO exists to scale beyond what electrical interconnects allow.
This is what Jensen means by “extreme co-design.” And it’s why competitors face a challenge that goes beyond matching specs—they need to match system-level integration that took years to develop.
The era of evaluating AI chips in isolation is over. The winners will be those who design complete systems. NVIDIA just showed what that looks like.
10x throughput. 1/10th cost. In production now.
What’s Next: The Co-Design Series
Vera Rubin validates what I’ve been building toward in my research: co-design is the new moat. Over the next few weeks, I’ll be publishing a three-part series that goes deeper into why integrated system design—not individual chip specs—determines who wins in AI infrastructure:
Part 1: The Memory Wall — Why Groq and Jamba had to find each other, and what the memory bandwidth constraint means for architecture choices.
Part 2: NVIDIA’s Inference Stack Depth Strategy — Mapping the $30B+ in acquisitions to infrastructure layers, and why Israel is NVIDIA’s secret weapon.
Part 3: The Verification Gap — The unsolved problem nobody’s talking about: who audits the agent swarm? Why verification infrastructure may matter more than silicon.
If you found this analysis valuable, please share it—it helps more than you know. And if you haven’t subscribed yet, now’s the time. BEP Research will be moving to paid soon, and early subscribers will be grandfathered in. I’m committed to delivering institutional-quality analysis on AI infrastructure that you won’t find anywhere else.
Resources
- The Memory Wars: Why NVIDIA’s 2028 Architecture Ends the AI Chip Competition
- The Packaging Paradox: Why CoWoS—Not 2nm—Is the Real AI Bottleneck
- Twas the Night Before Groq: NVIDIA’s Surprise Licensing Deal
About the Author
Ben Pouladian is a Los Angeles-based tech investor and entrepreneur focused on AI infrastructure, semiconductors, and the power systems enabling the next generation of compute. He was co-founder of Deco Lighting (2005–2019), where he helped build one of the leading commercial LED lighting manufacturers in North America. Ben holds an electrical engineering degree from UC San Diego, where he worked in Professor Fainman’s ultrafast nanoscale optics lab on silicon photonics and micro-ring resonators, and interned at Cymer, the company that manufactures the EUV light sources for ASML’s lithography systems.
He currently serves as Chairman of the Leadership Board at Terasaki Institute for Biomedical Innovation and is a YPO member. His investment research focuses on AI datacenter infrastructure, GPU computing, and the semiconductor supply chain. Long-term NVIDIA investor since 2016.
Follow on Twitter/X: @benitoz | More at benpouladian.com
Disclosure: The author holds positions in NVIDIA and related semiconductor investments. This is not investment advice.