Our eighth generation TPUs: two chips for the agentic era

Hardware capabilities and architecture

  • TPU 8t superpods: up to 9,600 chips, ~2 PB shared HBM, 121 exaFLOPs (FP4). Commenters see it as a massive system that would dwarf top supercomputers, though FP4 figures aren't directly comparable to the FP64 benchmarks those machines are ranked on, and the programming models differ.
  • TPU 8i: 288 GB HBM + 384 MB on‑chip SRAM per chip (though aggregating SRAM across chips is not that meaningful).
  • Cooling and density draw attention; images described as “sci‑fi/cyberpunk”.
  • Some discussion of DRAM/HBM cost and structure; HBM is more expensive per bit because of die stacking (TSVs) and the interposer/interconnect area it requires.
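The pod-level numbers above imply these rough per-chip figures, a back-of-the-envelope check using only the specs quoted in the thread (decimal units assumed; exact vendor figures may differ):

```python
# Per-chip figures implied by the pod-level specs above.
# Uses decimal units (1 PB = 1e15 bytes, 1 GB = 1e9 bytes).

chips = 9_600
pod_hbm_bytes = 2e15          # ~2 PB shared HBM across the pod
pod_fp4_flops = 121e18        # 121 exaFLOPs at FP4

hbm_per_chip_gb = pod_hbm_bytes / chips / 1e9
fp4_per_chip_pflops = pod_fp4_flops / chips / 1e15

print(f"HBM per chip: ~{hbm_per_chip_gb:.0f} GB")          # ~208 GB
print(f"FP4 compute per chip: ~{fp4_per_chip_pflops:.1f} PFLOPs")  # ~12.6 PFLOPs
```

The ~208 GB/chip implied here is in the same ballpark as the 288 GB quoted for TPU 8i, which suggests the "~2 PB" pod figure is a round approximation.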

Training vs inference specialization

  • TPU 8t is training‑oriented; TPU 8i is for inference and post‑training.
  • Several see this split as part of a broader trend (e.g., other vendors developing inference‑focused chips).
  • Training is viewed as compute‑bound (large batches amortize weight reads), while inference, especially autoregressive decoding, is more memory‑bandwidth‑bound; specializing chips for each workload is seen as a way to gain efficiency.
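A minimal sketch of why decoding tends to be memory-bound: each generated token must stream essentially all live weights from HBM, so at small batch sizes the memory time dwarfs the compute time. All chip and model numbers below are illustrative assumptions, not specs from the thread:

```python
# Roofline-style check at batch size 1. Every number here is hypothetical.

weights_bytes = 70e9 * 1      # assumed 70B-param model at 1 byte/param
hbm_bw = 3e12                 # assumed 3 TB/s HBM bandwidth per chip
peak_flops = 1e15             # assumed 1 PFLOP/s dense compute

# Dense decoding does roughly 2 FLOPs per parameter per generated token.
flops_per_token = 2 * 70e9
time_compute = flops_per_token / peak_flops   # time if compute-bound
time_memory = weights_bytes / hbm_bw          # time to stream the weights once

print(f"compute time/token: {time_compute*1e3:.3f} ms")   # ~0.14 ms
print(f"memory time/token:  {time_memory*1e3:.3f} ms")    # ~23 ms
print("bound:", "memory" if time_memory > time_compute else "compute")
```

Batching many requests reuses each weight read across tokens, which is how training (and high-throughput serving) shifts back toward the compute-bound regime.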

Google’s strategic position vs competitors

  • Many argue Google’s vertical integration (chips, data centers, stack, distribution) is a long‑term advantage vs OpenAI/Anthropic, who must “rent” hardware and fight for market share.
  • Others push back, citing management “drift,” product missteps, and regulatory concerns over Google’s dominance in search, browser, and Android.
  • Comparisons to AWS/Azure: they also have custom silicon and deep Nvidia partnerships; some doubt Google can uniquely out‑optimize everyone.

Model quality and developer experience

  • Strong split in perceptions of Gemini:
    • Some find Gemini excellent for everyday tasks, research, math/engineering, and long‑context code edits; praise token efficiency and multilingual ability.
    • Others find it “second rate,” weak at coding/agents, prone to death loops and broken tool calls, with buggy CLI/IDE integrations and frequent timeouts.
  • General consensus that Google’s agentic/coding tooling (Gemini CLI, VS Code extension) lags behind competitors’ offerings, even if raw models are competitive.

Cost, efficiency, and economics

  • TPUs are seen by some as delivering significantly more FLOPs per dollar than Nvidia GPUs, giving Google a cost edge in training and large‑scale inference.
  • Counterpoint: constrained TPU supply, TSMC capacity favoring Nvidia, and high GCP margins mean many researchers still choose cheaper GPU clouds.
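The FLOPs-per-dollar framing can be made concrete with a small calculator. The prices, peak FLOPs, and utilization figures below are hypothetical placeholders (the thread cites no specific numbers); the point is that sustained utilization matters as much as list price:

```python
# FLOPs-per-dollar comparison sketch. All inputs are hypothetical.

def flops_per_dollar(peak_flops: float, hourly_price: float, mfu: float) -> float:
    """Sustained FLOPs delivered per dollar of rented compute."""
    sustained = peak_flops * mfu          # discount peak by model FLOPs utilization
    return sustained * 3600 / hourly_price

tpu = flops_per_dollar(peak_flops=9e14, hourly_price=4.0, mfu=0.45)  # hypothetical
gpu = flops_per_dollar(peak_flops=1e15, hourly_price=6.0, mfu=0.40)  # hypothetical

print(f"TPU: {tpu:.2e} FLOPs/$   GPU: {gpu:.2e} FLOPs/$")
print(f"ratio: {tpu / gpu:.2f}x")
```

With these made-up inputs the nominally slower chip wins on cost, which is the shape of the argument made in the thread; the counterpoint above is that rental price (GCP margins) and supply, not silicon, can erase the edge in practice.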

AI demand, bubbles, and model commoditization

  • Several note overwhelming current demand and capacity shortages across providers, questioning “AI bubble collapse” narratives.
  • Others argue bubbles can pop even with long‑term demand (dot‑com, housing analogies).
  • Debate on whether large models will commoditize:
    • One side expects persistent differentiation as long as new human content exists.
    • Another expects open models and non‑US players to erode any one company’s frontier “moat.”

Product, stability, and deprecation policies

  • Complaints that Google aggressively deprecates Gemini models (1‑year windows), causing instability for workflows needing repeatability and cost predictability.
  • Some note that newer models can be more expensive (e.g., a different tokenizer may split the same text into more tokens) and behave differently, forcing prompt and pipeline rework.
  • A few argue that if you need hard stability, self‑hosting is the only long‑term answer given hardware scarcity.
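On the tokenization point: even at an identical per-token price, a model that tokenizes the same text into more tokens costs more per request. A toy illustration with hypothetical token counts and prices:

```python
# Same text, different tokenizers, different effective cost.
# Token counts and per-million-token price are hypothetical.

def request_cost(tokens: int, price_per_mtok: float) -> float:
    """Cost of one request given a token count and a price per million tokens."""
    return tokens / 1e6 * price_per_mtok

old = request_cost(tokens=1_000, price_per_mtok=1.25)  # old model's tokenization
new = request_cost(tokens=1_300, price_per_mtok=1.25)  # same price, 30% more tokens

print(f"old: ${old:.6f}  new: ${new:.6f}  (+{new / old - 1:.0%})")
```

This is why "same price per token" does not guarantee cost predictability across a forced model migration, which is the instability complaint above.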