Things you wish you didn't need to know about S3
Case sensitivity and filenames
- Large subthread on whether S3 (and filesystems) should be case-sensitive.
- Pro–case-sensitive:
  - Filenames are just byte strings; storage shouldn’t guess which strings are “equivalent.”
  - Case-insensitive comparison is locale-dependent and complex (e.g., Turkish “i/İ”, German “ß”, Dutch “IJ”).
  - Case-sensitive storage is easier and safer for programs; UI layers can still provide case-insensitive search/completion.
- Pro–case-insensitive:
  - Many users intuitively treat “Book.docx” and “book.docx” as the same.
  - Case sensitivity increases user error and friction (paths, globs, commands).
  - Some like case-insensitive but case-preserving behavior (Windows, default macOS).
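To make the "locale-dependent and complex" point concrete, here is a minimal Python sketch using only the standard library. The helper names naive_equal and casefold_equal are hypothetical, chosen for illustration; the behavior shown is Python's built-in, locale-independent Unicode mapping, not anything S3 or a particular filesystem does.

```python
import unicodedata


def naive_equal(a: str, b: str) -> bool:
    """Compare by lowercasing, as a simple case-insensitive store might."""
    return a.lower() == b.lower()


def _fold(s: str) -> str:
    # Normalize first so composed and decomposed forms compare equal,
    # then apply Unicode case folding.
    return unicodedata.normalize("NFC", s).casefold()


def casefold_equal(a: str, b: str) -> bool:
    """Compare using Unicode case folding instead of lowercasing."""
    return _fold(a) == _fold(b)


# German: "ß" uppercases to two letters, so lowercasing does not round-trip.
print("straße".upper())                     # STRASSE
print(naive_equal("straße", "STRASSE"))     # False
print(casefold_equal("straße", "STRASSE"))  # True

# Turkish: the default mapping ignores locale, so dotted capital "İ"
# lowercases to "i" plus a combining dot (two code points), not plain "i".
print(len("İ".lower()))                     # 2
print(naive_equal("İstanbul", "istanbul"))  # False
```

Even the more careful comparison is still locale-blind: a Turkish user would expect "I" to pair with dotless "ı", which no locale-independent folding provides. That is the core of the argument for leaving equivalence decisions to the UI layer.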
Unicode, locales, and “English-centrism”
- Several comments note that “case” is not universal; discussions often assume English.
- Examples from German, Turkish, Japanese, Chinese show ambiguity if you try to unify characters.
- Others argue ASCII-only and English-centric design was simpler; Unicode and time zones add real complexity but are necessary to represent real languages.
S3 object model vs real directories
- S3 “paths” are just key names; “/” is a convention, not a real hierarchy.
- You can have keys like “foo” and “foo/bar”, or multiple slashes, and even an object literally named “/”.
- New “directory buckets” try to add a more directory-like model but introduce more complexity and limitations.
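The "keys, not directories" model can be illustrated with a small local simulation. This is a sketch that only approximates how a prefix-and-delimiter listing groups keys; the function list_keys and the sample keys are hypothetical and it makes no calls to S3.

```python
def list_keys(keys, prefix="", delimiter="/"):
    """Mimic a delimiter-based listing over a flat collection of keys.

    Returns (objects, common_prefixes): keys directly "under" the prefix,
    and the rolled-up "subdirectories" derived purely from key names.
    """
    objects, common = [], set()
    for key in sorted(keys):
        if not key.startswith(prefix):
            continue
        rest = key[len(prefix):]
        cut = rest.find(delimiter) if delimiter else -1
        if cut == -1:
            # No delimiter after the prefix: listed as an object.
            objects.append(key)
        else:
            # Everything up to and including the delimiter becomes a prefix.
            common.add(prefix + rest[:cut + len(delimiter)])
    return objects, sorted(common)


keys = ["foo", "foo/bar", "foo//baz", "/"]

print(list_keys(keys))                 # (['foo'], ['/', 'foo/'])
print(list_keys(keys, prefix="foo/"))  # (['foo/bar'], ['foo//'])
```

Note that “foo” shows up as an object while “foo/” shows up as a prefix in the same listing, something a real filesystem would not allow for a file and a directory of the same name.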
Operational quirks and cost pitfalls
- Multipart uploads:
  - Incomplete uploads persist and incur storage charges until cleaned up (lifecycle rules strongly recommended).
  - Minimum part size (5 MiB) can surprise streaming upload implementations.
  - Multipart from multiple principals is awkward; often requires a single IAM user.
- Deletion at scale:
  - Deleting billions of objects via API is costly mainly due to LIST calls.
  - Using lifecycle expiration (e.g., expire everything “now”) stops storage charges and lets AWS delete in the background.
- Additional gotchas mentioned:
  - HEAD is often blocked where GET is allowed; people work around this with ranged GETs.
  - Bucket creation/deletion tied to DNS propagation, so not strictly read-after-write consistent.
  - Object lock until distant future can be practically irreversible.
  - S3 limits ~100 HTTP requests per TCP connection and then closes it; some clients mishandle this.
  - Empty-bucket 404 billing story referenced; AWS no longer charges for certain error responses.
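Two of the points above lend themselves to a short sketch: lifecycle rules for cleanup, and the minimum part size for streaming uploads. The code below is illustrative only. The function names, rule IDs, and default day counts are assumptions, and the lifecycle dictionary follows the shape accepted by boto3's put_bucket_lifecycle_configuration. Versioned buckets would likely need additional rules for noncurrent versions, which are not shown.

```python
MIN_PART_SIZE = 5 * 1024 * 1024  # 5 MiB; every part except the last must be at least this


def cleanup_lifecycle_rules(abort_after_days=7, expire_after_days=None):
    """Build a lifecycle configuration dict (hypothetical helper)."""
    rules = [{
        "ID": "abort-incomplete-multipart-uploads",  # assumed rule name
        "Status": "Enabled",
        "Filter": {"Prefix": ""},  # empty prefix: applies to the whole bucket
        "AbortIncompleteMultipartUpload": {"DaysAfterInitiation": abort_after_days},
    }]
    if expire_after_days is not None:
        rules.append({
            "ID": "expire-all-objects",  # assumed rule name
            "Status": "Enabled",
            "Filter": {"Prefix": ""},
            "Expiration": {"Days": expire_after_days},
        })
    return {"Rules": rules}


def apply_rules(bucket, config):
    """Send the configuration to S3; requires boto3 and credentials."""
    import boto3  # imported lazily so the rest of the sketch runs without it
    boto3.client("s3").put_bucket_lifecycle_configuration(
        Bucket=bucket, LifecycleConfiguration=config
    )


def iter_parts(chunks, part_size=MIN_PART_SIZE):
    """Regroup arbitrary-sized byte chunks into parts of at least part_size.

    Only the final part may be smaller, which matches the multipart rule.
    """
    buf = bytearray()
    for chunk in chunks:
        buf.extend(chunk)
        if len(buf) >= part_size:
            yield bytes(buf)
            buf.clear()
    if buf:
        yield bytes(buf)
```

A streaming uploader would feed network or pipe reads into iter_parts and upload each yielded part, rather than uploading each small read as its own part. For bulk deletion, calling cleanup_lifecycle_rules(expire_after_days=1) and applying the result is one way to express the "expire everything" approach without issuing LIST and DELETE calls yourself.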
S3 for web serving and latency
- S3 alone has relatively high first-byte latency for small objects (≈100–200 ms).
- Common practice is to front S3 with CloudFront (or another cache/CDN) for performance and cost savings.
- Some suggest alternatives (memcached or CloudFront Functions) to add logic or validation around presigned URLs and uploads.
AWS complexity and alternatives
- Multiple commenters say AWS/S3 feel too complex and “non-simple,” with many sharp edges and huge docs.
- Some prefer simpler S3-alikes or object stores (DigitalOcean Spaces, Cloudflare R2, Hetzner), while noting they have their own quirks.
- There’s interest in a cleaner, standardized object-storage protocol, but skepticism that a new standard would get broad adoption.
Meta observations
- Several note tension between user-friendliness and correctness/simplicity at the low level.
- Principle of least astonishment is seen as often violated; many behaviors are technically documented but surprising in practice.