Our eighth generation TPUs: two chips for the agentic era

Hardware capabilities and architecture

  • TPU 8t superpods: up to 9,600 chips, ~2 PB shared HBM, 121 exaFLOPs (FP4). Commenters see it as a massive system that would dwarf top supercomputers, though FP4 figures aren't directly comparable to the FP64 benchmarks those machines are ranked on, and the programming models differ.
  • TPU 8i: 288 GB HBM + 384 MB on‑chip SRAM per chip (though aggregating SRAM across chips is not that meaningful).
  • Cooling and density draw attention; images described as “sci‑fi/cyberpunk”.
  • Some discussion of DRAM/HBM cost and structure; HBM is more expensive per bit because of die stacking (TSVs) and the interposer/interconnect area it requires.
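The pod-level numbers above imply these rough per-chip figures, a back-of-the-envelope check using only the specs quoted in the thread (decimal units assumed; exact vendor figures may differ):

```python
# Per-chip figures implied by the pod-level specs above.
# Uses decimal units (1 PB = 1e15 bytes, 1 GB = 1e9 bytes).

chips = 9_600
pod_hbm_bytes = 2e15          # ~2 PB shared HBM across the pod
pod_fp4_flops = 121e18        # 121 exaFLOPs at FP4

hbm_per_chip_gb = pod_hbm_bytes / chips / 1e9
fp4_per_chip_pflops = pod_fp4_flops / chips / 1e15

print(f"HBM per chip: ~{hbm_per_chip_gb:.0f} GB")          # ~208 GB
print(f"FP4 compute per chip: ~{fp4_per_chip_pflops:.1f} PFLOPs")  # ~12.6 PFLOPs
```

The ~208 GB/chip implied here is in the same ballpark as the 288 GB quoted for TPU 8i, which suggests the "~2 PB" pod figure is a round approximation.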

Training vs inference specialization

  • TPU 8t is training‑oriented; TPU 8i is for inference and post‑training.
  • Several see this split as part of a broader trend (e.g., other vendors developing inference‑focused chips).
  • Training is viewed as compute‑bound (large batches amortize weight reads), while inference, especially autoregressive decoding, is more memory‑bandwidth‑bound; specializing chips for each workload is seen as a way to gain efficiency.
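A minimal sketch of why decoding tends to be memory-bound: each generated token must stream essentially all live weights from HBM, so at small batch sizes the memory time dwarfs the compute time. All chip and model numbers below are illustrative assumptions, not specs from the thread:

```python
# Roofline-style check at batch size 1. Every number here is hypothetical.

weights_bytes = 70e9 * 1      # assumed 70B-param model at 1 byte/param
hbm_bw = 3e12                 # assumed 3 TB/s HBM bandwidth per chip
peak_flops = 1e15             # assumed 1 PFLOP/s dense compute

# Dense decoding does roughly 2 FLOPs per parameter per generated token.
flops_per_token = 2 * 70e9
time_compute = flops_per_token / peak_flops   # time if compute-bound
time_memory = weights_bytes / hbm_bw          # time to stream the weights once

print(f"compute time/token: {time_compute*1e3:.3f} ms")   # ~0.14 ms
print(f"memory time/token:  {time_memory*1e3:.3f} ms")    # ~23 ms
print("bound:", "memory" if time_memory > time_compute else "compute")
```

Batching many requests reuses each weight read across tokens, which is how training (and high-throughput serving) shifts back toward the compute-bound regime.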

Google’s strategic position vs competitors

  • Many argue Google’s vertical integration (chips, data centers, stack, distribution) is a long‑term advantage vs OpenAI/Anthropic, who must “rent” hardware and fight for market share.
  • Others push back, citing management “drift,” product missteps, and regulatory concerns over Google’s dominance in search, browser, and Android.
  • Comparisons to AWS/Azure: they also have custom silicon and deep Nvidia partnerships; some doubt Google can uniquely out‑optimize everyone.

Model quality and developer experience

  • Strong split in perceptions of Gemini:
    • Some find Gemini excellent for everyday tasks, research, math/engineering, and long‑context code edits; praise token efficiency and multilingual ability.
    • Others find it “second rate,” weak at coding/agents, prone to death loops and broken tool calls, with buggy CLI/IDE integrations and frequent timeouts.
  • General consensus that Google’s agentic/coding tooling (Gemini CLI, VS Code extension) lags behind competitors’ offerings, even if raw models are competitive.

Cost, efficiency, and economics

  • TPUs are seen by some as delivering significantly more FLOPs per dollar than Nvidia GPUs, giving Google a cost edge in training and large‑scale inference.
  • Counterpoint: constrained TPU supply, TSMC capacity favoring Nvidia, and high GCP margins mean many researchers still choose cheaper GPU clouds.
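The FLOPs-per-dollar framing can be made concrete with a small calculator. The prices, peak FLOPs, and utilization figures below are hypothetical placeholders (the thread cites no specific numbers); the point is that sustained utilization matters as much as list price:

```python
# FLOPs-per-dollar comparison sketch. All inputs are hypothetical.

def flops_per_dollar(peak_flops: float, hourly_price: float, mfu: float) -> float:
    """Sustained FLOPs delivered per dollar of rented compute."""
    sustained = peak_flops * mfu          # discount peak by model FLOPs utilization
    return sustained * 3600 / hourly_price

tpu = flops_per_dollar(peak_flops=9e14, hourly_price=4.0, mfu=0.45)  # hypothetical
gpu = flops_per_dollar(peak_flops=1e15, hourly_price=6.0, mfu=0.40)  # hypothetical

print(f"TPU: {tpu:.2e} FLOPs/$   GPU: {gpu:.2e} FLOPs/$")
print(f"ratio: {tpu / gpu:.2f}x")
```

With these made-up inputs the nominally slower chip wins on cost, which is the shape of the argument made in the thread; the counterpoint above is that rental price (GCP margins) and supply, not silicon, can erase the edge in practice.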

AI demand, bubbles, and model commoditization

  • Several note overwhelming current demand and capacity shortages across providers, questioning “AI bubble collapse” narratives.
  • Others argue bubbles can pop even with long‑term demand (dot‑com, housing analogies).
  • Debate on whether large models will commoditize:
    • One side expects persistent differentiation as long as new human content exists.
    • Another expects open models and non‑US players to erode any one company’s frontier “moat.”

Product, stability, and deprecation policies

  • Complaints that Google aggressively deprecates Gemini models (1‑year windows), causing instability for workflows needing repeatability and cost predictability.
  • Some note that newer models can be more expensive (e.g., a different tokenizer may split the same text into more tokens) and behave differently, forcing prompt and pipeline rework.
  • A few argue that if you need hard stability, self‑hosting is the only long‑term answer given hardware scarcity.
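On the tokenization point: even at an identical per-token price, a model that tokenizes the same text into more tokens costs more per request. A toy illustration with hypothetical token counts and prices:

```python
# Same text, different tokenizers, different effective cost.
# Token counts and per-million-token price are hypothetical.

def request_cost(tokens: int, price_per_mtok: float) -> float:
    """Cost of one request given a token count and a price per million tokens."""
    return tokens / 1e6 * price_per_mtok

old = request_cost(tokens=1_000, price_per_mtok=1.25)  # old model's tokenization
new = request_cost(tokens=1_300, price_per_mtok=1.25)  # same price, 30% more tokens

print(f"old: ${old:.6f}  new: ${new:.6f}  (+{new / old - 1:.0%})")
```

This is why "same price per token" does not guarantee cost predictability across a forced model migration, which is the instability complaint above.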