Colossus for Rapid Storage

What Rapid Storage / Colossus-Based Buckets Are

  • New Cloud Storage zonal bucket type colocated with GPUs/TPUs for much higher random-read throughput (claimed up to 20x vs regional buckets).
  • Built directly on Colossus’ stateful protocol; a gRPC client is planned that is essentially a thin wrapper over Colossus.
  • Targeted at AI/ML workloads and analytics that need very high random-read bandwidth (e.g., large Parquet datasets, LLM training).
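The "high random-read bandwidth" workload above boils down to issuing many small ranged reads against one large object (for Parquet, one per row group). As a minimal sketch of that access pattern, the helper below turns a list of row-group start offsets into byte ranges; the function name and the object layout are hypothetical (real readers get row-group offsets from the Parquet footer), and this does not use any Rapid Storage API.

```python
# Sketch of the random-read access pattern Rapid Storage targets: many
# small ranged reads against one large object (e.g., Parquet row groups).
# Hypothetical helper; real readers take offsets from the Parquet footer.

def plan_range_reads(object_size: int, offsets: list[int]) -> list[tuple[int, int]]:
    """Turn row-group start offsets into inclusive (start, end) byte ranges."""
    bounds = sorted(offsets) + [object_size]
    return [(bounds[i], bounds[i + 1] - 1) for i in range(len(bounds) - 1)]

# A 1 GiB object with row groups starting at these (hypothetical) offsets:
ranges = plan_range_reads(1 << 30, [0, 256 << 20, 640 << 20])
print(ranges)
```

Each resulting `(start, end)` pair would become one ranged GET; the claimed win is that a zonal, Colossus-backed bucket serves thousands of these concurrently with much less latency than a regional bucket.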

Zonal vs Regional / Multi-Region Semantics

  • “Zonal” = tied to a single availability zone; may still be replicated but replicas can share failure domains.
  • In Google’s terminology, “regional” usually implies transparent multi‑zone replication; “zonal” does not.
  • Rapid Storage complements existing regional and dual‑region buckets; users can choose latency/durability/cost tradeoffs via the same GCS API.

Comparison to AWS S3 Express One Zone and Other Providers

  • Several comments frame Rapid Storage as GCP’s answer to S3 Express One Zone (low-latency, single‑AZ object storage).
  • S3 Express offers much lower latency but is significantly more expensive than standard S3; naming is criticized as misleading.
  • Some argue GCP now uniquely offers low‑latency zonal, standard regional, and transparent dual‑region object storage under one consistent API; others counter that S3 has overlapping but not identical multi-region features.

Performance, AI Branding, and Hypercomputer Marketing

  • Mixed reaction to marketing: some praise Google for finally exposing Colossus-like capabilities; others see the “AI infrastructure” branding and Hypercomputer FLOPS comparisons as heavy spin.
  • Confusion over claims that a TPU pod exceeds the world’s largest supercomputer; commenters clarified that Google is comparing 8‑bit AI FLOPS against the 64‑bit FLOPS used in traditional supercomputer rankings.
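The units mismatch is easy to make concrete. The figures below are rough round numbers from public announcements (an Ironwood-class TPU pod quoted at roughly 42.5 exaFLOPS at FP8; El Capitan's Top500 result at roughly 1.7 exaFLOPS at FP64) and are illustrative assumptions, not a benchmark:

```python
# Illustrative round figures from public announcements, not an
# apples-to-apples benchmark.
tpu_pod_fp8_exaflops = 42.5     # quoted 8-bit AI FLOPS for a full TPU pod
el_capitan_fp64_exaflops = 1.7  # Top500 Rmax, measured at 64-bit precision

# The headline ratio divides numbers measured in different number formats:
ratio = tpu_pod_fp8_exaflops / el_capitan_fp64_exaflops
print(f"~{ratio:.0f}x, but FP8 and FP64 are not the same unit of work")
```

An FP8 operation is far cheaper in silicon than an FP64 one, so the headline multiple says little about how the machines would compare on the same workload at the same precision.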

Cost, Elasticity, and DIY Alternatives

  • Detailed back-of-the-envelope comparisons argue that Google’s “HDD prices” rhetoric is overstated compared with self-built storage and cheaper cloud providers (e.g., Backblaze, Hetzner).
  • Counterpoints emphasize elasticity and operational convenience: instant bucket creation, scaling to TBs then deleting, avoiding hardware deployment/maintenance, and fine‑grained isolation.
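The shape of the back-of-the-envelope math above can be sketched in a few lines. Every $/TB-month figure here is an assumed placeholder chosen only to illustrate the comparison structure, not a quoted price from any provider:

```python
# All $/TB-month figures are illustrative assumptions, not quoted prices.
prices_per_tb_month = {
    "diy_hdd_amortized": 2.0,    # assumed: drive cost over ~5 years, ignoring ops labor
    "budget_cloud": 6.0,         # assumed: Backblaze/Hetzner-class pricing
    "hyperscaler_object": 20.0,  # assumed: standard regional object storage
}

tb = 100  # hypothetical dataset size
for name, price in prices_per_tb_month.items():
    print(f"{name}: ${tb * price:,.0f}/month")
```

The counterpoint in the thread is that this math omits what elasticity buys: the hyperscaler bucket can exist for a week and then be deleted, while the DIY number assumes hardware you own, rack, and maintain for years.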

Colossus Semantics and Tradeoffs

  • Colossus objects are append-only with a single writer; objects can be “finalized” to disallow further writes; no random writes.
  • Advocates: dropping POSIX features like multi-writer atomic updates enables much higher performance, cost efficiency, and strong multi‑tenant isolation at scale.
  • Skeptics note that such semantics can be hard to retrofit into existing POSIX-based applications, which likely delayed a direct Colossus offering.
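The contract described above (single writer, append-only, explicit finalize, no random writes) can be modeled with a toy in-memory class. This is purely an illustration of the semantics commenters describe, not Google's API:

```python
# Toy in-memory model of the described Colossus object semantics:
# append-only writes, an explicit "finalize" that freezes the object,
# and unrestricted (including random) reads. Not a real API.

class AppendOnlyObject:
    def __init__(self) -> None:
        self._chunks: list[bytes] = []
        self._finalized = False

    def append(self, data: bytes) -> None:
        if self._finalized:
            raise ValueError("object is finalized; no further writes allowed")
        # No offset parameter: random writes do not exist in this model.
        self._chunks.append(data)

    def finalize(self) -> None:
        self._finalized = True

    def read(self, start: int, end: int) -> bytes:
        # Reads, including random range reads, are always allowed.
        return b"".join(self._chunks)[start:end]

obj = AppendOnlyObject()
obj.append(b"hello ")
obj.append(b"world")
obj.finalize()
print(obj.read(6, 11))  # b'world'
```

The skeptics' point follows directly from this shape: a POSIX application that seeks and overwrites in place has no way to express those operations against an object like this without being restructured around appends.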

Anywhere Cache vs Rapid Storage

  • Anywhere Cache = SSD cache in front of normal (often multi‑regional) buckets; improves latency and avoids egress on cache hits.
  • Rapid Storage = a new bucket type where all data is stored locally; both reads and writes are fast, and it adds fast durable appends, semantics not available in standard buckets.

Adoption and Product Stability Concerns

  • Some excitement from users in scientific computing and analytics who expect major speedups from data locality.
  • Others caution against early adoption due to Google’s history of killing or reshaping products; recommendation to wait and see if Rapid Storage “sticks.”