In S3 simplicity is table stakes

Perceived Simplicity vs Hidden Complexity

  • Many commenters praise the S3 API as a gold standard: conceptually just CRUD on blobs, yet backed by enormous engineering effort.
  • Others push back that S3 is “simple” only on the surface: global replication, versioning, eventing, lifecycle rules, and the security model make it a highly non-trivial system.
  • Authentication and signing are cited as a major source of complexity; some argue the API therefore isn’t truly simple.
  • The title phrase “table stakes” is unpacked as “the bare minimum required,” though some note this idiom often gets misused to shut down debate.
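To make the signing complaint concrete, here is a minimal sketch of the AWS Signature Version 4 key-derivation chain, following AWS's published algorithm. The credentials, date, and string-to-sign are placeholders; a real request additionally requires building a canonical request and a string-to-sign, which is where most of the complexity lives.

```python
import hashlib
import hmac


def hmac_sha256(key: bytes, msg: str) -> bytes:
    """One HMAC-SHA256 step in the SigV4 chain."""
    return hmac.new(key, msg.encode("utf-8"), hashlib.sha256).digest()


def derive_signing_key(secret_key: str, date: str, region: str, service: str) -> bytes:
    """SigV4 chains four HMACs so the key is scoped to a date, region, and service."""
    k_date = hmac_sha256(("AWS4" + secret_key).encode("utf-8"), date)
    k_region = hmac_sha256(k_date, region)
    k_service = hmac_sha256(k_region, service)
    return hmac_sha256(k_service, "aws4_request")


# Placeholder secret and string-to-sign, for illustration only.
key = derive_signing_key("EXAMPLE_SECRET", "20240101", "us-east-1", "s3")
signature = hmac.new(key, b"string-to-sign-goes-here", hashlib.sha256).hexdigest()
```

Even this, the easy half of SigV4, already involves four chained HMACs; clients that get any step wrong receive an opaque signature-mismatch error, which is why commenters single out auth as the least “simple” part of the API.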

Durability, Availability, and DIY Storage

  • Long subthread on when S3 is better than self-managed RAID servers:
    • Pro-S3 side: achieving S3-like durability/availability with DIY storage requires multiple locations, backups, and operational expertise; S3 lets most users “never think about the storage layer.”
    • Skeptical side: raw S3 storage and bandwidth cost more per GB than disks, especially beyond a certain volume of stored data and traffic.
  • Several explain that durability (not losing data) and availability (service uptime) are different; lost data can be existential for many businesses.
  • There’s debate over how meaningful “11 nines” really is, especially with many objects and erasure coding; some see it partly as marketing.
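The “11 nines” debate above can be made concrete with back-of-the-envelope arithmetic; the object count below is an arbitrary assumption chosen to show where the figure stops meaning “never”.

```python
# S3's advertised design durability is 99.999999999% (11 nines) per year,
# i.e. a per-object annual loss probability of about 1e-11.
durability = 0.99999999999
annual_loss_prob = 1 - durability

# At a large enough object count the expected annual loss is no longer zero.
# 10 billion objects is an assumed figure for illustration.
objects = 10_000_000_000
expected_losses_per_year = objects * annual_loss_prob
print(f"expected losses/year: {expected_losses_per_year:.2f}")  # ≈ 0.1
```

This linear expectation is also why some commenters call the number partly marketing: it is a design target derived from erasure-coding models, not a measured loss rate, and it says nothing about correlated failures or account-level mistakes.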

Cost, Scale, and Total Cost of Ownership

  • One view: S3 is immediately more expensive on media and egress but far less complex to operate, with lifecycle policies and events adding a lot of value.
  • Counterview: for very small workloads, S3 is cheaper than even buying a single disk; for very large ones, full TCO (hardware, power, staffing, facilities) makes simple per-GB comparisons misleading.
  • Examples of companies that moved off S3 are mentioned as proof that matching its properties in-house is expensive.
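The per-GB comparison in the counterview can be sketched numerically. The prices below are illustrative assumptions, not current quotes, and the raw-disk figure deliberately excludes exactly the things the pro-S3 side emphasizes.

```python
# Illustrative, assumed prices -- check current pricing before relying on these.
s3_standard_per_gb_month = 0.023   # assumed S3 Standard rate, USD/GB-month
raw_disk_per_gb = 0.015            # assumed one-time HDD cost, USD/GB

gb = 1_000    # compare at 1 TB
months = 36   # amortize hardware over 3 years

s3_cost = gb * s3_standard_per_gb_month * months
disk_cost = gb * raw_disk_per_gb  # excludes replication, power, staff, egress

print(f"S3: ${s3_cost:.0f} vs raw disk: ${disk_cost:.0f} over {months} months")
```

The gap between the two numbers is what buys replication, durability, and zero operations, which is the crux of the TCO argument: the raw-media figure is never the real cost of self-hosting.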

Use Cases and Platform Capabilities

  • Common uses: backups, build artifacts, logs, metrics, SPAs/static sites via CDN, ad‑hoc file drops via presigned URLs, and TTL-based temporary storage.
  • S3’s “unlimited” capacity and huge aggregate throughput are highlighted as a moat: it effectively acts as a gigantic, auto-scaling network filesystem without capacity planning.
  • Eventing plus Lambda enables ingest/transform pipelines where bandwidth inside AWS is largely free.

Consistency, Tables, and Lakehouse Concerns

  • S3’s move to strong read-after-write consistency was well received, though some note it lagged other clouds in offering this and that behavior still differs for multi-region setups.
  • S3 Tables and Iceberg integration are seen as powerful, but there’s concern this adds lakehouse-style complexity rather than simplifying the underlying immutable-object model.
  • Some lament S3 Select’s effective deprecation, as it was cheaper for simple single-snapshot queries compared to newer table abstractions.

Security and Misconfiguration

  • Bucket leaks provoke debate: one side notes S3 is private by default and public access has valid use cases; another argues good security should prevent “stupid” misconfigurations, not just warn about them.
  • Broader point: cloud accounts themselves can be single points of failure, so traditional backup principles (e.g., 3‑2‑1 rule) still matter.
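The “prevent, don’t warn” position maps to S3’s Block Public Access settings: four flags that make the classic leaky-bucket misconfiguration impossible rather than merely discouraged. The bucket name below is hypothetical, and the live call is sketched in comments since it needs credentials.

```python
# Block Public Access configuration: with all four flags True, neither
# ACLs nor bucket policies can expose objects publicly.
public_access_block = {
    "BlockPublicAcls": True,        # reject attempts to set public ACLs
    "IgnorePublicAcls": True,       # treat any existing public ACLs as private
    "BlockPublicPolicy": True,      # reject public bucket policies
    "RestrictPublicBuckets": True,  # cut off access via existing public policies
}

# Applying it requires credentials and a real bucket, so only sketched:
# import boto3
# boto3.client("s3").put_public_access_block(
#     Bucket="example-bucket",
#     PublicAccessBlockConfiguration=public_access_block,
# )
```

The same configuration can be applied account-wide, which is the stronger guardrail: individual buckets can then never be flipped public by accident, addressing the “prevent stupid misconfigurations” side of the debate while still allowing deliberate opt-out for the valid public-hosting use cases.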

SDKs, Naming, and Developer Experience

  • The core API is admired, but the language SDKs—especially the older JavaScript SDK—are criticized as heavy and convoluted; splitting the SDK into per-service packages has helped somewhat.
  • Some readers were initially confused by “S3” in the blog title (e.g., thinking of standby power states or old GPU vendors), underscoring that AWS-centric context isn’t universal.