Amazon S3 Adds Put-If-Match (Compare-and-Swap)

Significance of S3 Put-If-Match (CAS)

  • Seen as the missing primitive to safely coordinate multiple writers to the same object (e.g., WALs, logs).
  • Combined with strong read-after-write consistency, it enables optimistic concurrency control directly on S3.
  • Many note that GCP, Azure, and MinIO have had similar conditional write / ETag-based controls for years; some are surprised it took S3 so long.
  • Some question why “standard ETag support” is headline-worthy; others argue that S3's scale and engineering complexity make the delay understandable.
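
The optimistic-concurrency point above can be sketched in Python. Nothing here calls AWS: `InMemoryBucket` and `PreconditionFailed` are hypothetical stand-ins that mimic a conditional PUT (write only if the object's current ETag equals the supplied `if_match` value, otherwise fail, as S3 does with HTTP 412 Precondition Failed). The ETag is assumed to be the quoted MD5 of the body, as for a plain single-part upload, and `increment` is an illustrative read-modify-CAS retry loop, not an SDK function.

```python
import hashlib
import threading
from typing import Dict, Optional, Tuple


class PreconditionFailed(Exception):
    """Hypothetical stand-in for the HTTP 412 S3 returns when If-Match fails."""


class InMemoryBucket:
    """Hypothetical in-memory stand-in for one bucket with If-Match on PUT.

    ETags are the quoted MD5 of the body, mimicking a single-part upload.
    The mutex makes each check-and-write atomic, which is the per-object
    property this sketch assumes the real service provides.
    """

    def __init__(self) -> None:
        self._objects: Dict[str, Tuple[bytes, str]] = {}
        self._mutex = threading.Lock()

    def get(self, key: str) -> Tuple[bytes, str]:
        with self._mutex:
            return self._objects[key]

    def put(self, key: str, body: bytes, if_match: Optional[str] = None) -> str:
        with self._mutex:
            current = self._objects.get(key)
            # Conditional write: reject if the object changed since it was read.
            if if_match is not None and (current is None or current[1] != if_match):
                raise PreconditionFailed(key)
            etag = '"%s"' % hashlib.md5(body).hexdigest()
            self._objects[key] = (body, etag)
            return etag


def increment(bucket: InMemoryBucket, key: str, max_attempts: int = 1000) -> int:
    """Optimistic concurrency: read, compute, then write only if unchanged."""
    for _ in range(max_attempts):
        body, etag = bucket.get(key)
        new_value = int(body) + 1
        try:
            bucket.put(key, str(new_value).encode(), if_match=etag)
            return new_value
        except PreconditionFailed:
            continue  # another writer got in first: re-read and retry
    raise RuntimeError("gave up after %d attempts" % max_attempts)
```

Several threads calling `increment` on the same key should finish with no lost updates, because a writer whose read has gone stale gets `PreconditionFailed` and retries instead of overwriting. Against real S3 the same loop would take the ETag from a GET and send it back in the PUT's If-Match header; the exact SDK call is deliberately not shown.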

Use Cases and New Patterns

  • Building databases and consensus systems directly on object storage (e.g., object-backed DBs, transactional logs).
  • Terraform-style state locking without an additional store like DynamoDB.
  • Lock-free-style patterns: read, modify, and CAS in a retry loop to maintain invariants (e.g., inventory counters, transaction logs).
  • Potential for “S3 as a database” or even SQLite-over-S3; acknowledged as likely slow without caching, but now technically feasible and more “serverless.”
  • Some see this reducing the need for separate coordination systems in S3-backed applications; others still mention systems like Delta Lake or external consensus services.
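
Terraform-style locking can be sketched with the same primitive: a single lock object whose body records the current owner, changed only by CAS. Everything below is a hypothetical illustration, not Terraform's actual S3 backend: the in-memory bucket, the `{"owner": ...}` JSON layout, and the `try_acquire` / `release` helpers. The sketch also omits lease expiry, so a holder that crashes leaves the lock held.

```python
import hashlib
import json
import threading
from typing import Dict, Optional, Tuple


class PreconditionFailed(Exception):
    """Hypothetical stand-in for S3's HTTP 412 response to a failed If-Match."""


class InMemoryBucket:
    """Hypothetical stand-in for a bucket supporting If-Match on PUT."""

    def __init__(self) -> None:
        self._objects: Dict[str, Tuple[bytes, str]] = {}
        self._mutex = threading.Lock()

    def get(self, key: str) -> Tuple[bytes, str]:
        with self._mutex:
            return self._objects[key]

    def put(self, key: str, body: bytes, if_match: Optional[str] = None) -> str:
        with self._mutex:
            current = self._objects.get(key)
            if if_match is not None and (current is None or current[1] != if_match):
                raise PreconditionFailed(key)
            etag = '"%s"' % hashlib.md5(body).hexdigest()
            self._objects[key] = (body, etag)
            return etag


def try_acquire(bucket: InMemoryBucket, key: str, owner: str) -> bool:
    """Take the lock if it is free; the CAS fails if another writer got there first."""
    body, etag = bucket.get(key)
    if json.loads(body)["owner"] is not None:
        return False  # currently held by someone
    try:
        bucket.put(key, json.dumps({"owner": owner}).encode(), if_match=etag)
        return True
    except PreconditionFailed:
        return False  # lost the race between our read and our write


def release(bucket: InMemoryBucket, key: str, owner: str) -> None:
    """Free the lock, but only from the version in which we hold it."""
    body, etag = bucket.get(key)
    if json.loads(body)["owner"] != owner:
        raise RuntimeError("lock not held by " + owner)
    bucket.put(key, json.dumps({"owner": None}).encode(), if_match=etag)
```

The design choice is that the only way to change the owner is a conditional write against the version just read, so two clients that both see the lock as free cannot both succeed.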

ETags, Hashing, and Integrity Concerns

  • Discussion of MD5-based ETags: fine for random bit-error detection, unsafe against adversarial collisions.
  • Hypothetical attacks where an untrusted party crafts MD5-colliding data to cause log entries to be “lost” if CAS is keyed only on MD5.
  • Google Cloud’s generation numbers and Azure’s concurrency controls are cited as clearer, monotonic versioning primitives.
  • Desire for content-addressable storage on S3: enforce that object key equals a secure hash of content via IAM/policies.
  • Current S3 checksums (SHA-256 and multipart “hash-of-hashes”) help with integrity but are awkward for content-addressable or CAS semantics, especially with multipart uploads.
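
The hashing points can be made concrete with `hashlib`. For a plain single-part upload (no SSE-KMS), AWS documents the ETag as an MD5 digest of the object data. The multipart function reproduces a widely observed but not officially documented format (MD5 over the concatenated binary per-part MD5 digests, suffixed with the part count), so treat it as an assumption. `content_address`, its `sha256/` key prefix, and `key_matches_content` are hypothetical, showing the invariant one would want an IAM or bucket policy to enforce.

```python
import hashlib
from typing import List


def single_part_etag(body: bytes) -> str:
    """ETag documented for single-part, non-KMS uploads: quoted MD5 hex digest."""
    return '"%s"' % hashlib.md5(body).hexdigest()


def multipart_etag(parts: List[bytes]) -> str:
    """Assumed multipart format: MD5 of concatenated part MD5s, plus "-<count>"."""
    digest_of_digests = hashlib.md5(b"".join(hashlib.md5(p).digest() for p in parts))
    return '"%s-%d"' % (digest_of_digests.hexdigest(), len(parts))


def split(body: bytes, part_size: int) -> List[bytes]:
    """Cut a body into fixed-size parts; the last part may be shorter."""
    return [body[i:i + part_size] for i in range(0, len(body), part_size)]


def content_address(body: bytes) -> str:
    """Hypothetical content-addressable key: hex SHA-256 of the bytes."""
    return "sha256/" + hashlib.sha256(body).hexdigest()


def key_matches_content(key: str, body: bytes) -> bool:
    """The check a policy would need to enforce for content-addressable storage."""
    return key == content_address(body)
```

Because the multipart ETag depends on how the object was split, the same bytes uploaded with different part sizes get different ETags, which is one reason these values are awkward as content addresses. Any tiny part sizes used to try this are for illustration only; in real S3 every part except the last must be at least 5 MiB.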

Limitations, Ambiguities, and CAP Discussion

  • CAS and strong consistency are per-object; coordinating multi-object updates still “requires creativity.”
  • Some light debate about how this interacts with CAP; the rough consensus is that availability must occasionally be sacrificed to preserve consistency during partitions.
  • Unclear exactly when the comparison happens for large uploads (early in the upload or at the final commit), and whether conditional writes affect the performance of other operations.
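
One common workaround for the per-object limit is to funnel every multi-object change through a single CAS on a “root” manifest: write new objects under fresh keys, then swap in a manifest that lists them. Readers that only trust the manifest see either the whole change or none of it. `MANIFEST`, `read_manifest`, `commit`, the manifest layout, and the in-memory bucket are hypothetical. The `version` counter is a deliberate choice: with a content-derived ETag, a manifest whose body returned to an earlier value would let a stale writer's If-Match succeed (the ABA problem), so the counter keeps every manifest body unique.

```python
import hashlib
import json
import threading
import uuid
from typing import Dict, List, Optional, Tuple


class PreconditionFailed(Exception):
    """Hypothetical stand-in for S3's HTTP 412 response to a failed If-Match."""


class InMemoryBucket:
    """Hypothetical stand-in for a bucket supporting If-Match on PUT."""

    def __init__(self) -> None:
        self._objects: Dict[str, Tuple[bytes, str]] = {}
        self._mutex = threading.Lock()

    def get(self, key: str) -> Tuple[bytes, str]:
        with self._mutex:
            return self._objects[key]

    def put(self, key: str, body: bytes, if_match: Optional[str] = None) -> str:
        with self._mutex:
            current = self._objects.get(key)
            if if_match is not None and (current is None or current[1] != if_match):
                raise PreconditionFailed(key)
            etag = '"%s"' % hashlib.md5(body).hexdigest()
            self._objects[key] = (body, etag)
            return etag


MANIFEST = "manifest.json"  # hypothetical key of the single "root" object


def read_manifest(bucket: InMemoryBucket) -> Tuple[dict, str]:
    """Return the parsed manifest and the ETag it was read at."""
    body, etag = bucket.get(MANIFEST)
    return json.loads(body), etag


def commit(bucket: InMemoryBucket, base: Tuple[dict, str],
           new_files: Dict[str, bytes], remove: List[str]) -> bool:
    """Publish several objects at once by CAS-ing the manifest read in `base`."""
    doc, etag = base
    written = []
    for name, data in new_files.items():
        key = "data/%s-%s" % (name, uuid.uuid4().hex)
        bucket.put(key, data)  # unconditional: the key is brand new
        written.append(key)
    new_doc = {
        "version": doc["version"] + 1,  # keeps the manifest body from ever repeating
        "files": [k for k in doc["files"] if k not in remove] + written,
    }
    try:
        bucket.put(MANIFEST, json.dumps(new_doc).encode(), if_match=etag)
        return True
    except PreconditionFailed:
        return False  # another commit won; objects in `written` are now orphans
```

A losing `commit` returns `False` after its data objects are already written; those orphans are invisible to readers but need separate garbage collection, and the caller would re-read the manifest and retry.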

Ecosystem and Meta Points

  • Open-source and proprietary systems built on object stores plan to exploit this immediately.
  • Some push back on promotional comments about commercial products; others see them as valid technical case studies.