2024-05-31

Things you wish you didn't need to know about S3

Case sensitivity and filenames

Large subthread on whether S3 (and filesystems) should be case-sensitive.
Pro–case-sensitive:
- Filenames are just byte strings; storage shouldn’t guess which strings are “equivalent.”
- Case-insensitive comparison is locale-dependent and complex (e.g., Turkish “i/İ”, German “ß”, Dutch “IJ”).
- Easier and safer for programs; UI layers can provide case-insensitive search/completion.
Pro–case-insensitive:
- Many users intuitively treat “Book.docx” and “book.docx” as the same.
- Case sensitivity increases user error and friction (paths, globs, commands).
- Some like case-insensitive but case-preserving behavior (Windows, default macOS).
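
A minimal sketch of why "just compare case-insensitively" is harder than it sounds, using only Python's built-in string methods. The helper name same_name_ignoring_case is hypothetical, and Python's case mapping is locale-independent, so the Turkish behavior shown here is exactly the kind of guess a storage layer would be making:

```python
# Case mapping is neither one-to-one nor locale-free.

def same_name_ignoring_case(a: str, b: str) -> bool:
    """Hypothetical helper: compare two names using Unicode case folding."""
    return a.casefold() == b.casefold()

# German sharp s: uppercasing changes the length of the string.
sharp_s_upper = "ß".upper()            # "SS"

# Turkish dotted capital I (U+0130): lowercasing yields two code points,
# "i" followed by a combining dot above (U+0307).
dotted_i_lower_len = len("İ".lower())  # 2

# Turkish dotless i (U+0131) does not survive an upper/lower round trip,
# because the default mapping sends "I" back to the dotted "i".
dotless_round_trip = "ı".upper().lower()

if __name__ == "__main__":
    print(same_name_ignoring_case("Book.docx", "book.docx"))
    print(same_name_ignoring_case("straße.txt", "STRASSE.TXT"))
    print(repr(sharp_s_upper), dotted_i_lower_len, repr(dotless_round_trip))
```

Whether "straße.txt" and "STRASSE.TXT" should count as the same file is a policy decision, which is the pro-case-sensitive argument in a nutshell.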

Unicode, locales, and “English-centrism”

Several comments note that “case” is not universal; discussions often assume English.
Examples from German, Turkish, Japanese, Chinese show ambiguity if you try to unify characters.
Others argue ASCII-only and English-centric design was simpler; Unicode and time zones add real complexity but are necessary to represent real languages.

S3 object model vs real directories

S3 “paths” are just key names; “/” is a convention, not a real hierarchy.
You can have keys like foo and foo/bar, or multiple slashes, and even an object literally named /.
New “directory buckets” try to add a more directory-like model but introduce more complexity and limitations.
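
The flat key model for regular buckets can be illustrated with a small local simulation. This is a sketch only: list_keys is a hypothetical function that imitates how a prefix/delimiter listing groups keys, and it does not call S3 or reproduce pagination and other details of the real API:

```python
def list_keys(keys, prefix="", delimiter="/"):
    """Group flat keys the way a prefix/delimiter listing would.

    Returns (contents, common_prefixes). Nothing here is a directory:
    "folders" are just the part of a key up to the next delimiter.
    """
    contents = []
    common_prefixes = set()
    for key in sorted(keys):
        if not key.startswith(prefix):
            continue
        rest = key[len(prefix):]
        idx = rest.find(delimiter) if delimiter else -1
        if idx == -1:
            # No delimiter after the prefix: reported as an object.
            contents.append(key)
        else:
            # Everything up to and including the delimiter is rolled up.
            common_prefixes.add(prefix + rest[: idx + len(delimiter)])
    return contents, sorted(common_prefixes)


# Keys from the discussion: "foo" and "foo/bar" coexist, slashes can
# repeat, and "/" on its own is a legal key.
EXAMPLE_KEYS = ["foo", "foo/bar", "/", "a//b"]

if __name__ == "__main__":
    print(list_keys(EXAMPLE_KEYS))
    print(list_keys(EXAMPLE_KEYS, prefix="foo/"))
    print(list_keys(EXAMPLE_KEYS, prefix="a/"))
```

In this simulation, "foo" is listed as an object while "foo/" appears as a prefix next to it, something a real filesystem would not allow.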

Operational quirks and cost pitfalls

Multipart uploads:
- Incomplete uploads persist and incur storage unless cleaned (lifecycle rules strongly recommended).
- Minimum part size (5 MiB) can surprise streaming upload implementations.
- Multipart from multiple principals is awkward; often requires a single IAM user.
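
One way a streaming uploader can cope with the minimum part size is to buffer incoming chunks until a full part is available. A minimal sketch, assuming the 5 MiB minimum applies to every part except the last; buffer_into_parts is a hypothetical helper and the actual upload calls are left out:

```python
MIN_PART_SIZE = 5 * 1024 * 1024  # 5 MiB minimum for all parts but the last


def buffer_into_parts(chunks, part_size=MIN_PART_SIZE):
    """Regroup an iterable of small byte chunks into upload-sized parts.

    Yields parts of exactly part_size bytes, then one final (possibly
    smaller) part with whatever is left over.
    """
    if part_size < MIN_PART_SIZE:
        raise ValueError("part_size is below the multipart minimum")
    buf = bytearray()
    for chunk in chunks:
        buf.extend(chunk)
        while len(buf) >= part_size:
            yield bytes(buf[:part_size])
            del buf[:part_size]
    if buf:
        # Only the last part may be smaller than the minimum.
        yield bytes(buf)


if __name__ == "__main__":
    one_mib = b"x" * (1024 * 1024)
    for number, part in enumerate(buffer_into_parts([one_mib] * 12), start=1):
        print(number, len(part))
```

An empty stream yields no parts at all, so a caller would presumably fall back to a plain single-request upload in that case.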
Deletion at scale:
- Deleting billions of objects via the API is costly mainly because of the LIST calls needed to enumerate the keys; the DELETE requests themselves are not billed.
- Using lifecycle expiration (e.g., expire everything “now”) stops storage charges and lets AWS delete in the background.
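
A lifecycle configuration along these lines might look like the sketch below. The rule IDs, the one-day values, and the function name are assumptions chosen for illustration; versioned buckets would likely need additional rules for noncurrent versions and delete markers, which are not shown:

```python
def expire_everything_config(expire_days=1, abort_multipart_days=1):
    """Build a lifecycle configuration dict that empties a bucket over time.

    The shape follows what boto3's put_bucket_lifecycle_configuration
    accepts as LifecycleConfiguration. IDs and day counts are illustrative.
    """
    return {
        "Rules": [
            {
                "ID": "expire-all-objects",      # hypothetical rule name
                "Status": "Enabled",
                "Filter": {"Prefix": ""},        # empty prefix: every key
                "Expiration": {"Days": expire_days},
            },
            {
                "ID": "abort-stale-multipart",   # hypothetical rule name
                "Status": "Enabled",
                "Filter": {"Prefix": ""},
                "AbortIncompleteMultipartUpload": {
                    "DaysAfterInitiation": abort_multipart_days
                },
            },
        ]
    }


if __name__ == "__main__":
    import json

    print(json.dumps(expire_everything_config(), indent=2))
    # Applying it would look roughly like this (not executed here):
    #   import boto3
    #   boto3.client("s3").put_bucket_lifecycle_configuration(
    #       Bucket="example-bucket",  # hypothetical bucket name
    #       LifecycleConfiguration=expire_everything_config(),
    #   )
```

Note that putting a lifecycle configuration replaces whatever configuration the bucket already has, so existing rules would need to be merged in first.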
Additional gotchas mentioned:
- HEAD is often blocked where GET is allowed; people work around this with ranged GETs.
- Bucket creation/deletion is tied to DNS propagation, so it is not strictly read-after-write consistent.
- An object lock with a retention date in the distant future can be practically irreversible.
- S3 limits ~100 HTTP requests per TCP connection and then closes it; some clients mishandle this.
- Empty-bucket 404 billing story referenced; AWS now doesn’t charge for certain error responses.
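
The ranged-GET workaround for a blocked HEAD can be sketched as follows: request a single byte and read the total size from the Content-Range response header. Both function names are hypothetical, the URL is whatever GET-only URL the caller already holds, and a zero-byte object would likely fail the range request with an HTTP error rather than return a size:

```python
import re
import urllib.request


def total_size_from_content_range(value):
    """Parse 'bytes 0-0/12345' and return 12345, or None if unknown."""
    match = re.fullmatch(r"bytes (\d+)-(\d+)/(\d+|\*)", value.strip())
    if match is None or match.group(3) == "*":
        return None
    return int(match.group(3))


def probe_size(url):
    """Fetch one byte with GET and infer the object size from the reply.

    Stands in for a HEAD request when only GET is permitted.
    """
    request = urllib.request.Request(url, headers={"Range": "bytes=0-0"})
    with urllib.request.urlopen(request) as response:
        header = response.headers.get("Content-Range", "")
        return total_size_from_content_range(header)


if __name__ == "__main__":
    print(total_size_from_content_range("bytes 0-0/12345"))
```

The request transfers one byte of body, so it costs about the same as a HEAD while still counting as a GET for authorization purposes.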

S3 for web serving and latency

S3 alone has relatively high first-byte latency for small objects (≈100–200 ms).
Common practice is to front S3 with CloudFront (or another cache/CDN) for performance and cost savings.
Some suggest alternatives (memcached or CloudFront Functions) to add logic or validation around presigned URLs and uploads.

AWS complexity and alternatives

Multiple commenters say AWS/S3 feel too complex and “non-simple,” with many sharp edges and huge docs.
Some prefer simpler S3-alikes or object stores (DigitalOcean Spaces, Cloudflare R2, Hetzner), while noting they have their own quirks.
There’s interest in a cleaner, standardized object-storage protocol, but skepticism that a new standard would get broad adoption.

Meta observations

Several note tension between user-friendliness and correctness/simplicity at the low level.
Principle of least astonishment is seen as often violated; many behaviors are technically documented but surprising in practice.
