Things you wish you didn't need to know about S3

Case sensitivity and filenames

  • Large subthread on whether S3 (and filesystems) should be case-sensitive.
  • Pro–case-sensitive:
    • Filenames are just byte strings; storage shouldn’t guess which strings are “equivalent.”
    • Case-insensitive comparison is locale-dependent and complex (e.g., Turkish “i/İ”, German “ß”, Dutch “IJ”).
    • Case-sensitive storage is easier and safer for programs; UI layers can provide case-insensitive search/completion.
  • Pro–case-insensitive:
    • Many users intuitively treat “Book.docx” and “book.docx” as the same.
    • Case sensitivity increases user error and friction (paths, globs, commands).
    • Some like case-insensitive but case-preserving behavior (Windows, default macOS).
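
To make the locale point concrete, here is a minimal Python sketch of why "just compare case-insensitively" is harder than it sounds. The helper names (naive_same_name, casefold_same_name) are hypothetical and only for illustration; this is not how S3 or any particular filesystem compares names.

```python
# Illustrative only: how Unicode case mapping defeats a naive "same name?" check.
# Both helpers below are hypothetical, not part of any storage API.

def naive_same_name(a: str, b: str) -> bool:
    """Compare names the way a simple case-insensitive layer might."""
    return a.lower() == b.lower()


def casefold_same_name(a: str, b: str) -> bool:
    """Compare using Unicode default case folding (locale-independent)."""
    return a.casefold() == b.casefold()


# ASCII names behave the way most users expect.
print(naive_same_name("Book.docx", "book.docx"))        # True

# German sharp s: uppercasing expands one character into two.
print("ß".upper())                                      # SS
# lower() and casefold() therefore disagree about these two names.
print(naive_same_name("straße", "STRASSE"),
      casefold_same_name("straße", "STRASSE"))          # False True

# Turkish dotted capital I: default lowercasing yields "i" plus a combining dot.
print(len("İ".lower()))                                 # 2
# A Turkish reader would pair İ with i, but default folding does not.
print(casefold_same_name("İstanbul", "istanbul"))       # False
```

Whichever rule a storage layer picked, some group of users would see two "identical" names treated as different, or two different names collide. That is the argument for leaving keys as opaque strings and doing any folding in the UI.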

Unicode, locales, and “English-centrism”

  • Several comments note that “case” is not universal; discussions often assume English.
  • Examples from German, Turkish, Japanese, Chinese show ambiguity if you try to unify characters.
  • Others argue ASCII-only and English-centric design was simpler; Unicode and time zones add real complexity but are necessary to represent real languages.

S3 object model vs real directories

  • S3 “paths” are just key names; “/” is a convention, not a real hierarchy.
  • You can have keys like foo and foo/bar, or multiple slashes, and even an object literally named /.
  • New “directory buckets” try to add a more directory-like model but introduce more complexity and limitations.
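
The flat-namespace idea can be sketched without touching AWS at all. The function below is a hypothetical, simplified simulation of a prefix-plus-delimiter listing over a flat set of keys; the name list_keys and its return shape are assumptions made for this example, and the real listing API has pagination and other details not modeled here.

```python
from typing import Iterable, List, Tuple


def list_keys(keys: Iterable[str], prefix: str = "",
              delimiter: str = "/") -> Tuple[List[str], List[str]]:
    """Simulate a prefix + delimiter listing over a flat key namespace.

    Returns (contents, common_prefixes). "Directories" appear only as
    rolled-up prefixes computed at listing time; nothing is stored for them.
    """
    contents: List[str] = []
    common_prefixes: List[str] = []
    for key in sorted(keys):
        if not key.startswith(prefix):
            continue
        rest = key[len(prefix):]
        idx = rest.find(delimiter) if delimiter else -1
        if idx == -1:
            # No delimiter after the prefix: report the key itself.
            contents.append(key)
        else:
            # Roll everything up to (and including) the first delimiter.
            rolled = prefix + rest[: idx + len(delimiter)]
            if rolled not in common_prefixes:
                common_prefixes.append(rolled)
    return contents, common_prefixes


# "foo" and "foo/bar" coexist, "foo//baz" has a doubled slash,
# and "/" is a perfectly legal key in this flat model.
keys = ["foo", "foo/bar", "foo//baz", "/", "photos/2024/a.jpg"]

print(list_keys(keys))                 # (['foo'], ['/', 'foo/', 'photos/'])
print(list_keys(keys, prefix="foo/"))  # (['foo/bar'], ['foo//'])
```

Note that "foo" shows up as an object while "foo/" shows up as a prefix in the same listing, which a real directory tree could not express.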

Operational quirks and cost pitfalls

  • Multipart uploads:
    • Incomplete uploads persist and incur storage unless cleaned (lifecycle rules strongly recommended).
    • Minimum part size (5 MiB) can surprise streaming upload implementations.
    • Multipart uploads spanning multiple principals are awkward; in practice they often require a single IAM user.
  • Deletion at scale:
    • Deleting billions of objects via the API is costly, mainly because of the LIST calls needed to enumerate them.
    • Using lifecycle expiration (e.g., expire everything “now”) stops storage charges and lets AWS delete in the background.
  • Additional gotchas mentioned:
    • HEAD is often blocked where GET is allowed; people work around this with ranged GETs.
    • Bucket creation/deletion is tied to DNS propagation, so it is not strictly read-after-write consistent.
    • An object lock set until the distant future can be practically irreversible.
    • S3 closes a TCP connection after ~100 HTTP requests; some clients mishandle this.
    • The empty-bucket 404 billing story was referenced; AWS no longer charges for certain error responses.

S3 for web serving and latency

  • S3 alone has relatively high first-byte latency for small objects (≈100–200 ms).
  • Common practice is to front S3 with CloudFront (or another cache/CDN) for performance and cost savings.
  • Some suggest alternatives (memcached or CloudFront Functions) to add logic or validation around presigned URLs and uploads.

AWS complexity and alternatives

  • Multiple commenters say AWS/S3 feel too complex and “non-simple,” with many sharp edges and huge docs.
  • Some prefer simpler S3-alikes or object stores (DigitalOcean Spaces, Cloudflare R2, Hetzner), while noting they have their own quirks.
  • There’s interest in a cleaner, standardized object-storage protocol, but skepticism that a new standard would get broad adoption.

Meta observations

  • Several note a tension between user-friendliness and correctness/simplicity at the low level.
  • Principle of least astonishment is seen as often violated; many behaviors are technically documented but surprising in practice.