TPUs vs. GPUs and why Google is positioned to win the AI race in the long term

Whether Nvidia Can “Just Build TPUs”

  • Many argue nothing fundamentally stops Nvidia from making TPU‑like ASICs, but:
    • The company is institutionally built around GPUs and CUDA; turning that “ship” is slow.
    • Specializing too hard would risk cannibalizing its very high‑margin data‑center GPUs.
  • Others counter that Nvidia already did this:
    • Tensor Cores now deliver the vast majority of AI FLOPs on data‑center GPUs; graphics blocks are mostly gone there.
    • Hopper/Blackwell are effectively AI accelerators with a GPU wrapper and CUDA compatibility.
  • Key architectural divide:
    • TPUs use large systolic arrays, aggressively exploiting data locality and neighbor‑to‑neighbor communication (see the toy dataflow sketch after this list).
    • GPUs rely more on globally accessible memory and flexible kernels; CUDA and its ecosystem assume this model.
    • Recreating TPU‑style systolic arrays would mean sacrificing much of CUDA’s generality and its legacy software base.
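
To make the divide concrete, here is a toy cycle‑by‑cycle simulation of an output‑stationary systolic array computing a matrix product, in plain Python/NumPy. It illustrates only the dataflow idea (operands hop between neighboring PEs; no global memory traffic) and is not a model of any actual TPU generation.

```python
import numpy as np

def systolic_matmul(A, B):
    """Toy output-stationary systolic array computing A @ B.

    PE (i, j) owns one output accumulator. A streams in from the left
    (rows skewed one cycle apart) and B from the top (columns skewed
    likewise). Each cycle a PE multiplies the pair of values passing
    through it, accumulates, and forwards both operands to its right
    and lower neighbors: nearest-neighbor communication only.
    """
    n, k = A.shape
    k2, m = B.shape
    assert k == k2
    acc = np.zeros((n, m))
    # Run until the skewed wavefront has swept past every PE.
    for t in range(n + m + k - 2):
        for i in range(n):
            for j in range(m):
                s = t - i - j  # dot-product index arriving at PE (i, j) now
                if 0 <= s < k:
                    acc[i, j] += A[i, s] * B[s, j]
    return acc

A = np.random.randn(4, 5)
B = np.random.randn(5, 3)
assert np.allclose(systolic_matmul(A, B), A @ B)
```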

Google’s TPU Economics & Vertical Integration

  • Google designs the TPUs, runs the AI workloads, and operates the cloud, capturing both chip margin and service margin.
  • TPUs are cheaper partly because they avoid Nvidia’s markup and partly because they drop “baggage” (graphics blocks, broader general‑purpose support).
  • Supporters claim:
    • Significantly better performance‑per‑dollar and per‑watt, especially for inference, gives Google a long‑term cost advantage (a toy cost‑per‑token calculation follows this list).
    • Even if an AI bubble pops, Google still uses TPUs internally; capex is funded from cash, not existential debt.
  • Skeptics note:
    • If architectures shift (sparse, non‑matmul, exotic boolean or non‑attention models), highly specialized TPUs could become suboptimal.
    • Because TPUs are only available via Google Cloud, lock‑in and ecosystem gaps remain real adoption barriers.
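
The margin‑stacking argument reduces to simple arithmetic. The sketch below compares serving cost per million tokens at two hourly rates; every number (hourly cost, throughput) is a hypothetical placeholder chosen for illustration, not actual Google Cloud or Nvidia pricing.

```python
# Back-of-the-envelope serving cost per million tokens. Every number
# below is a hypothetical placeholder for illustration, not real
# Google Cloud or Nvidia pricing.

def cost_per_million_tokens(hourly_cost_usd, tokens_per_second):
    """USD per 1M tokens for one accelerator running flat out."""
    tokens_per_hour = tokens_per_second * 3600
    return hourly_cost_usd / tokens_per_hour * 1e6

# Assumed: a merchant GPU rented with vendor margin baked into the
# rate, vs. an in-house ASIC billed near cost. Throughput is held
# equal to isolate the effect of margin stacking.
gpu = cost_per_million_tokens(hourly_cost_usd=4.00, tokens_per_second=2500)
tpu = cost_per_million_tokens(hourly_cost_usd=2.00, tokens_per_second=2500)

print(f"GPU-style rate:  ${gpu:.3f} per 1M tokens")   # $0.444
print(f"ASIC-style rate: ${tpu:.3f} per 1M tokens")   # $0.222
```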

Scale, Interconnect, and Cluster Architecture

  • A major pro‑TPU argument is Google’s optical circuit switch (OCS):
    • One Ironwood (TPU v7) cluster can connect 9,216 chips with ~1.77 PB of HBM and enormous aggregate FLOPs (sanity‑checked in the sketch after this list).
    • This far exceeds Nvidia’s current NVLink domain sizes on paper.
  • Pushback:
    • Network topology matters: an OCS‑connected 3D torus and fully switched NVLink fat‑trees have different strengths.
    • Mixture‑of‑Experts and all‑to‑all workloads may favor Nvidia’s style of interconnect.
    • Google doesn’t dominate MLPerf or visible training results, so the practical edge is unclear.
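
As a quick sanity check, the cited cluster figures are internally consistent with a per‑chip HBM capacity of roughly 192 GB (assuming the quoted ~1.77 PB uses decimal units):

```python
# Sanity-check the cited Ironwood cluster figures.
chips = 9216
aggregate_hbm_gb = 1.77e6                     # ~1.77 PB, decimal units assumed
print(f"{aggregate_hbm_gb / chips:.0f} GB HBM per chip")   # ~192 GB

# Topology context: each chip in a 3D torus has 6 direct neighbors
# (+/-x, +/-y, +/-z); distant pairs pay multiple hops, which is one
# reason all-to-all-heavy MoE traffic may favor switched fabrics.
```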

Training vs Inference and the CUDA Moat

  • Training:
    • Rapidly changing research, many custom ops, mixed‑precision tricks, and heavy communication all favor CUDA’s flexibility and tooling.
    • Most cutting‑edge research code is written for Nvidia first; others must “play catch‑up” by porting.
  • Inference:
    • Workloads are more static; models are frozen and replicated; matrix‑multiply dominates (see the rough FLOP count after this list).
    • Several commenters think TPUs (and other ASICs) will win economically here as the market shifts from frontier training to massive, cheap inference.
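
The “matrix‑multiply dominates” claim is easy to see with rough FLOP accounting for one decoder layer at inference. The shapes below are assumptions for a generic ~7B‑class model, not any specific published architecture.

```python
# Per-token FLOPs for one decoder layer (multiply-accumulate = 2 FLOPs).
d_model, d_ff = 4096, 11008            # assumed shapes for illustration

attn_proj = 4 * (2 * d_model * d_model)   # Q, K, V, and output projections
ffn       = 2 * (2 * d_model * d_ff)      # FFN up- and down-projections
matmul_flops = attn_proj + ffn

# Everything else (layernorms, activations, softmax) is tiny per token;
# this is a deliberately crude order-of-magnitude estimate.
other_flops = 20 * d_model

share = matmul_flops / (matmul_flops + other_flops)
print(f"matmul share of layer FLOPs: {share:.2%}")   # ~99.97%
```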

Google’s Track Record, Trust, and Productization

  • Technical credibility is widely acknowledged: early ML leadership, TPU development since ~2013, and strong infrastructure and datacenter expertise.
  • But there’s deep concern about:
    • Product instability (“killed by Google”), short attention span, and incentives favoring new launches over long‑term support.
    • Data governance and privacy, especially for free/consumer offerings.
  • Some believe Google has surprisingly “turned the ship” with Gemini 3 and TPUs; others note:
    • Gemini 3 is only one contender among many, not a clear runaway winner.
    • Hardware advantage does not automatically translate into better models or UX; data curation, evals, and engineering still dominate.

Broader Competition & Who Ultimately “Wins”

  • Other specialized vendors (Groq, Cerebras, etc.) and in‑house chips from Meta, Tesla, Microsoft, Amazon, and OpenAI complicate any “Google vs Nvidia” narrative.
  • One camp expects:
    • Nvidia to remain dominant via its ecosystem, developer experience, and constant evolution (e.g., new low‑precision formats like FP4; see the value‑grid sketch at the end of this summary).
  • Another camp expects:
    • Once investor subsidies fade and inference dominates, total cost per useful token will decide the winners, favoring vertically integrated players like Google.
  • Several commenters warn that if only giants can afford bespoke silicon, AI centralizes further and the rest of the ecosystem (including on‑prem and hobbyist use) loses.
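
As a footnote on the FP4 point: the sketch below enumerates the value grid of an e2m1 FP4 layout (1 sign, 2 exponent, 1 mantissa bit), the element format commonly described for FP4, and rounds a few weights to it. Real deployments add per‑block scale factors, which are omitted here.

```python
import numpy as np

# Enumerate the values of an e2m1 FP4 layout (1 sign, 2 exponent,
# 1 mantissa bit): +/- {0, 0.5, 1, 1.5, 2, 3, 4, 6}.
def e2m1_values():
    vals = set()
    for sign in (1.0, -1.0):
        for exp in range(4):          # 2 exponent bits
            for man in range(2):      # 1 mantissa bit
                if exp == 0:          # subnormals: 0 and 0.5
                    v = man * 0.5
                else:                 # normals: (1 + man/2) * 2**(exp-1)
                    v = (1 + man / 2) * 2 ** (exp - 1)
                vals.add(sign * v)
    return np.array(sorted(vals))

grid = e2m1_values()

# Round-to-nearest quantization of a weight vector; real FP4 schemes
# also carry a per-block scale factor, omitted here for brevity.
w = np.array([0.27, -1.9, 3.3])
q = grid[np.abs(w[:, None] - grid[None, :]).argmin(axis=1)]
print(q)   # [ 0.5 -2.   3. ]
```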