"NVDA’s FLOPs per Chip Divided by Watt Consumption is anywhere from 2x-8x ahead of current-gen, custom ASICs."
That is the headline of slide 17 from Morgan Stanley Research’s NVIDIA preview note. The chart shows Vera Rubin on NVFP4 at the top of the page at 19.5 TFLOPs per watt. Amazon’s Trainium 3 sits at the bottom at 2.5. Google’s TPUv7 lands closer to the ASIC line than to Vera Rubin. The note frames it cleanly: raw silicon remains key in increasing performance per watt.
The Street is modeling the wrong line. The Goldman, Wells Fargo, and UBS previews this week all walked through Data Center revenue, the $1 trillion cumulative Blackwell-plus-Rubin disclosure Jensen made at GTC, and gross margin heading into the first Rubin launches. None of them led with the metric that decides which hyperscaler stack wins through 2028. That metric is cost per token at a megawatt-class rack, and on that metric NVFP4 is not an incremental improvement. It is a category-redefining move that inverts the entire "custom silicon is cheaper" procurement story hyperscaler CFOs have been told for three years.
This is the post. The reframe paid subscribers carry from here is The Cheap Chip Trap: a custom ASIC procured for its dollar-per-chip discount commits the buyer to a megawatt-per-token disadvantage for the operational life of the asset. The trap is not the chip price. It is the gigawatt that has to feed the chip for the next five years.
Reading The Chart
The Morgan Stanley chart is doing something specific most readers will miss. The Vera Rubin bar at 19.5 TFLOPs per watt is measured at NVFP4. Every other accelerator on the chart, including both custom ASIC bars, is measured at FP8. That is not a unit comparison. That is a workload comparison. The next generation of frontier inference is moving from FP8 to FP4. NVIDIA designed silicon for that workload. The current ASIC roadmaps did not.
Even on an apples-to-apples FP8 basis, Vera Rubin produces multiples more tokens per watt than Trainium 3, on the precision Trainium was designed for. Shift the workload to FP4 and the gap widens substantially further. TPUv7 fares better but still trails Vera Rubin at every precision. The Morgan Stanley footnote concedes the point: Trainium 4 and TPUv8 will also support FP4, both shipping in 2027 at the earliest. The Vera Rubin gap is being marked today.
From The Watt Asymmetry: "The 50x perf-per-watt improvement from Hopper to Blackwell is the single cleanest expression of the Watt Asymmetry response. The full vertically integrated bundle (GPU plus CUDA plus NVLink Fusion plus Dynamo plus TensorRT-LLM plus DSX plus Omniverse plus Emerald AI) is what lets American data halls produce more tokens per watt than anyone else." The Morgan Stanley chart is the silicon-only slice of that answer. Add HBM4e bandwidth, NVLink Fusion at NVL576, Spectrum-X scale-out, and Dynamo serving on top, and the system-level cost-per-token gap is wider than the chip-level chart implies.
Why ASICs Chose Up Front, And Pay Forever
The custom silicon pitch to a hyperscaler CFO is simple. The wafer cost is yours, not NVIDIA’s gross margin. The design is tuned to one workload, not a fungible GPU. Per chip on a procurement spreadsheet, the ASIC wins. The procurement spreadsheet is the tip of the iceberg. The line that decides cost structure sits below the waterline, denominated in megawatts.
The Chip Is Dead, Long Live The Factory put it this way: "A custom ASIC that beats NVIDIA on cost-per-GPU-hour but delivers a fraction of the tokens per megawatt is not a bargain. It is a factory that cost you less to build and costs you far more to run." The Morgan Stanley chart is the empirical mark on that claim. A Trainium 3 rack commits the same megawatt of grid, cooling, and 800V DC rectification as a Vera Rubin rack. The wafer was cheaper. The megawatt was the same. The tokens produced were a fraction.
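A rough way to see the fixed-megawatt point is to run the two chart figures through the same power budget. The sketch below uses only the 19.5 and 2.5 TFLOPs-per-watt marks quoted from the Morgan Stanley chart. Everything else is my own simplification: the function names are illustrative, cooling and conversion losses are ignored, and peak FLOPs stand in for tokens. Note the caveat from earlier in the post: the Vera Rubin figure is at NVFP4 and the Trainium 3 figure is at FP8, so this is the chart's comparison, not a same-precision one.

```python
# Chip-level efficiency marks quoted from the Morgan Stanley chart (TFLOPs per watt).
# Caveat: Vera Rubin is measured at NVFP4, Trainium 3 at FP8.
TFLOPS_PER_WATT = {
    "Vera Rubin (NVFP4)": 19.5,
    "Trainium 3 (FP8)": 2.5,
}

WATTS_PER_MEGAWATT = 1_000_000


def peak_tflops_per_megawatt(tflops_per_watt: float) -> float:
    """Peak TFLOPs one megawatt can feed, ignoring cooling and conversion losses."""
    return tflops_per_watt * WATTS_PER_MEGAWATT


def efficiency_ratio(leader: str, laggard: str) -> float:
    """How many times more compute the leader gets out of the same watt."""
    return TFLOPS_PER_WATT[leader] / TFLOPS_PER_WATT[laggard]


if __name__ == "__main__":
    for name, eff in TFLOPS_PER_WATT.items():
        print(f"{name}: {peak_tflops_per_megawatt(eff):,.0f} peak TFLOPs per MW")
    ratio = efficiency_ratio("Vera Rubin (NVFP4)", "Trainium 3 (FP8)")
    print(f"Same megawatt, compute ratio: {ratio:.1f}x")
```

On the chart's own numbers the ratio lands just under the top of the 2x-8x range in the slide headline. The megawatt term cancels out of the ratio, which is the point: the grid commitment is identical and only the output differs.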
Layer the second variable hyperscalers face: the US grid will not interconnect new megawatt load on commercial timelines. ERCOT has paused new interconnect studies. PJM’s queue runs into the 2030s. The megawatt you have today is the megawatt you have for the duration of your AI factory. Every token you do not produce from that megawatt is a token your competitor produces on Vera Rubin. That is not a procurement decision. That is a strategic capacity decision the spreadsheet has been pricing wrong.
I made the same point last week in Inference Never Sleeps: "Continuous inference at production scale is a sustained, high-utilization load, not a bursty one. Robot fleets, voice agents, video collaboration. Every deployed humanoid is a 24/7 inference duty cycle. Inference Never Sleeps, and the watt bill compounds." The procurement spreadsheet sized today’s bursty chatbot workload. The deployment fleet through 2028 runs 24/7, and the perf-per-watt gap compounds against the duty cycle.
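The duty-cycle argument can be put in energy terms with a back-of-envelope sketch. It asks how many extra megawatt-hours per year a less efficient fleet must draw to match the output of one megawatt of the more efficient one. The 7.8x ratio is simply 19.5 divided by 2.5 from the chart. The 30% "bursty" duty cycle is a hypothetical number I picked for illustration, not a figure from the note, and the function name is my own.

```python
HOURS_PER_YEAR = 8760  # non-leap year


def extra_mwh_per_year(leader_mw: float, efficiency_ratio: float, duty_cycle: float) -> float:
    """Additional MWh per year the less efficient fleet draws to match the leader's output.

    leader_mw        -- power drawn by the more efficient fleet while running
    efficiency_ratio -- leader perf-per-watt divided by laggard perf-per-watt
    duty_cycle       -- fraction of the year the fleet is under load (0.0 to 1.0)
    """
    if not 0.0 <= duty_cycle <= 1.0:
        raise ValueError("duty_cycle must be between 0 and 1")
    laggard_mw = leader_mw * efficiency_ratio  # power needed for equal output
    return (laggard_mw - leader_mw) * duty_cycle * HOURS_PER_YEAR


if __name__ == "__main__":
    ratio = 19.5 / 2.5  # chart figures; NVFP4 versus FP8
    for label, duty in [("bursty (hypothetical 30%)", 0.30), ("24/7", 1.00)]:
        print(f"{label}: {extra_mwh_per_year(1.0, ratio, duty):,.0f} extra MWh per year")
```

The penalty scales linearly with hours under load, so moving from a part-time workload to a 24/7 one multiplies the energy cost of the same efficiency gap rather than leaving it flat.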
Jensen confirmed the duty cycle on stage at Dell World this afternoon with Michael Dell: "We’re going to have billions of AI agents, and they’re going to be working 24/7." The principal said it out loud forty-eight hours before the print.
Below the paywall: NVFP4 as a system rather than a format, the cost-per-token math at one megawatt per rack through 2028, the four bear cases ranked by what I am watching into Wednesday’s print, the ticker map across NVDA, AVGO, MRVL, GOOGL, AMZN, and TSEM, and the five lines on the print that decide whether the thesis ratifies or refines.

