2024-12-21

Introducing S2

What S2 Is

Described as “S3 for streams”: append-only, ordered logs/streams as a cloud storage primitive.
Conceptually overlaps with message queues and Kafka-like event streams, but at a lower-level “log/record” abstraction rather than a full messaging system.
Meant as a building block for data systems (buffering, decoupling, journaling, event sourcing, WALs).
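
To make the "log/record" abstraction concrete, here is a minimal in-memory sketch of an append-only stream addressed by sequence numbers. The `Stream` class and its `append`/`read`/`tail` names are hypothetical illustrations, not the S2 API.

```python
from dataclasses import dataclass, field


@dataclass
class Stream:
    """Toy append-only log (hypothetical; not the S2 API)."""

    records: list = field(default_factory=list)

    def append(self, batch):
        """Append a batch of records; return the sequence number of the first one."""
        start = len(self.records)
        self.records.extend(batch)
        return start

    def read(self, start_seq, limit=100):
        """Return up to `limit` (seq_num, record) pairs starting at start_seq."""
        end = min(start_seq + limit, len(self.records))
        return [(i, self.records[i]) for i in range(start_seq, end)]

    @property
    def tail(self):
        """Sequence number that the next appended record will receive."""
        return len(self.records)


s = Stream()
first = s.append([b"a", b"b"])   # records get sequence numbers 0 and 1
second = s.append([b"c"])        # record gets sequence number 2
```

A reader that remembers the last sequence number it saw can resume from that point, which is the property WAL and event-sourcing use cases rely on.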

Differences vs Existing Systems (Kafka, Kinesis, WarpStream, S3)

Higher ordered throughput per stream than a single partition or shard in typical managed streaming services (claimed: ~125 MiB/s append and 500 MiB/s real-time read).
“Unlimited” number of streams, avoiding shard/partition count limits in Kinesis/Kafka-like services.
Object-store-backed, but hides blob/byte-range complexity behind ordered records and sequence numbers.
Unlike plain S3 append objects, supports tailing reads and record semantics; unlike Kafka/Kinesis, exposes concurrency control (fencing) for safe distributed writes.
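
The fencing idea can be sketched as follows, assuming a stream that stores a single fencing token and rejects appends that present a different one. `FencedStream`, `set_fence`, and `FencedError` are hypothetical names for illustration; the real service's API may differ.

```python
from typing import Optional


class FencedError(Exception):
    """Raised when an append presents a stale or missing fencing token."""


class FencedStream:
    """Toy stream with a single fencing token (hypothetical; not the S2 API)."""

    def __init__(self):
        self._records = []
        self._token: Optional[str] = None

    def set_fence(self, token: str) -> None:
        # A new writer taking over installs its own token,
        # which invalidates whatever token the previous writer holds.
        self._token = token

    def append(self, record: bytes, token: Optional[str] = None) -> int:
        # Once a fence is set, only appends carrying the current token succeed.
        if self._token is not None and token != self._token:
            raise FencedError("append rejected: fencing token mismatch")
        self._records.append(record)
        return len(self._records) - 1


stream = FencedStream()
stream.set_fence("writer-1")
stream.append(b"x", token="writer-1")   # accepted
stream.set_fence("writer-2")            # writer-2 takes over
# A late append from writer-1 would now raise FencedError.
```

The point of the design is that a writer that has lost ownership (for example after a network partition) cannot interleave stale records with those of its replacement.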

Performance & Architecture

Fully backed by object storage (the service runs no disks of its own); writes are batched into multi-tenant chunks so that each S3 write is large enough to be efficient.
Different storage classes to trade off latency vs cost; planned “native” NVMe-backed tier for very low tail latency.
Similar in spirit to systems like WarpStream or Gazette, but with different latency/architecture tradeoffs.
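
A rough sketch of the multi-tenant chunking idea: pending appends from several streams are packed into one blob, with an index of byte ranges so each stream's records can be located again. The layout and the `pack_chunk`/`read_stream` helpers are assumptions for illustration; the discussion does not describe the actual on-disk format.

```python
def pack_chunk(pending):
    """Pack (stream_id, payload) pairs into one blob plus a byte-range index.

    Returns (blob, index) where index holds (stream_id, offset, length) entries
    in arrival order. Hypothetical layout, for illustration only.
    """
    blob = bytearray()
    index = []
    for stream_id, payload in pending:
        index.append((stream_id, len(blob), len(payload)))
        blob += payload
    return bytes(blob), index


def read_stream(blob, index, stream_id):
    """Recover one stream's records, in order, from a shared chunk."""
    return [blob[off:off + ln] for sid, off, ln in index if sid == stream_id]


pending = [
    ("tenant-a/s1", b"hello"),
    ("tenant-b/s9", b"xy"),
    ("tenant-a/s1", b"world"),
]
blob, index = pack_chunk(pending)        # one object-store write for three appends
records = read_stream(blob, index, "tenant-a/s1")
```

One larger write replaces many small ones, which is where the cost efficiency comes from; it is also why different tenants' bytes end up in the same object.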

Security & Multi-tenancy

Data from multiple tenants is colocated in shared S3 objects, raising concerns about cross-tenant leaks.
The team plans per-stream or per-bucket authenticated encryption and encourages client-side encryption; single-tenant cells were also mentioned as a future option.
Lack of per-tenant encryption today is seen by some as a blocker for serious workloads.

Pricing, Egress & Sustainability

Initial public pricing for internet egress was below AWS list prices; this drew strong skepticism about viability and fears of future price hikes.
After feedback, planned egress pricing was adjusted upward; the service is free during preview.
Some commenters argue retail cloud bandwidth costs make this a tough business unless deep volume discounts are secured.

Developer Experience & APIs

The current SDK focus is on Rust and the CLI; the lack of Java/Python SDKs is seen as a barrier for Kafka-heavy, Spring-based orgs.
Suggestions to build SDKs in non-Rust languages early to flush out “Rust-isms” in the API.
Desire for BYO-S3 or S3-compatible backends and self-hostable or source-available options to reduce lock-in.

Positioning, Use Cases & Market Concerns

Some find the landing page too focused on low-level primitives, with too little on concrete business problems and examples.
Several commenters say adoption depends on making it trivially swappable with existing tooling (Kafka API compatibility, Iceberg integration, Debezium pipelines, IoT/MQTT, etc.).
There is both enthusiasm (“beautiful API”, “useful primitive”) and skepticism that the addressable market for a raw stream primitive is narrow without higher-level offerings.
Concern that large cloud vendors could easily ship a similar service (e.g., S3 append + record semantics) and undercut or overshadow S2.

Branding & Naming Reactions

Many joke about confusion with S3 and other “letter+number” products; some think “S2” sounds like a downgrade from S3.
Genuine concern raised that the name plus explicit S3 comparisons may invite trademark friction with Amazon.
Others view the name as clearly engineer-led and like the honest, S3-inspired positioning.

Future Directions & Feature Requests

Requested: compaction, event-sourcing helpers, GDPR-friendly deletion patterns, Athena/Presto-style querying, Kafka compatibility layer, IoT protocol adapters, emulator for local dev (possibly SQLite-backed).
Interest in integrating with emerging table formats (Iceberg, S3 Table buckets) by buffering small writes and flushing optimized Parquet files.
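
The buffer-then-flush pattern mentioned for table formats can be sketched as below, assuming a size threshold triggers the flush. `FlushBuffer` and the `sink` callback are hypothetical; a real integration would encode each flushed batch as a Parquet file rather than keep a list of records.

```python
from typing import Callable, List


class FlushBuffer:
    """Accumulate small writes and hand them to `sink` as one larger batch."""

    def __init__(self, flush_bytes: int, sink: Callable[[List[bytes]], None]):
        self.flush_bytes = flush_bytes   # flush once this many bytes are pending
        self.sink = sink                 # e.g. a function that writes one file
        self._pending: List[bytes] = []
        self._size = 0

    def write(self, record: bytes) -> None:
        self._pending.append(record)
        self._size += len(record)
        if self._size >= self.flush_bytes:
            self.flush()

    def flush(self) -> None:
        # Emit whatever is pending as a single batch; no-op when empty.
        if not self._pending:
            return
        self.sink(self._pending)
        self._pending = []
        self._size = 0


files = []                               # stands in for written files
buf = FlushBuffer(flush_bytes=10, sink=files.append)
for rec in [b"aaaa", b"bbbb", b"cccc", b"dd"]:
    buf.write(rec)
buf.flush()                              # flush the remainder on shutdown
```

In practice a time-based flush would usually sit alongside the size threshold, so that a quiet stream does not hold records in the buffer indefinitely.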
