Introducing S2
What S2 Is
- Described as “S3 for streams”: append-only, ordered logs/streams as a cloud storage primitive.
- Conceptually overlaps with message queues and Kafka-like event streams, but at a lower-level “log/record” abstraction rather than a full messaging system.
- Meant as a building block for data systems (buffering, decoupling, journaling, event sourcing, WALs).
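To make the "append-only, ordered stream" idea concrete, here is a minimal in-memory sketch. It is not the S2 API: `ToyStream`, `append`, `read`, and `tail` are hypothetical names chosen for illustration, and the only property modeled is that each record receives the next sequence number and reads return records in that order.

```python
from dataclasses import dataclass, field
from typing import List, Tuple


@dataclass
class ToyStream:
    """In-memory stand-in for an append-only, ordered stream (hypothetical, not the S2 API)."""

    _records: List[bytes] = field(default_factory=list)

    def append(self, body: bytes) -> int:
        """Append one record and return the sequence number assigned to it."""
        self._records.append(body)
        return len(self._records) - 1

    def read(self, start_seq: int, limit: int = 100) -> List[Tuple[int, bytes]]:
        """Return up to `limit` (sequence number, record) pairs starting at `start_seq`."""
        if start_seq < 0:
            raise ValueError("start_seq must be non-negative")
        chunk = self._records[start_seq:start_seq + limit]
        return [(start_seq + i, body) for i, body in enumerate(chunk)]

    def tail(self) -> int:
        """Sequence number that the next appended record will receive."""
        return len(self._records)
```

A consumer that remembers the last sequence number it processed can resume by calling `read` from the next one, which is the basis for uses such as journaling or a write-ahead log.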
Differences vs Existing Systems (Kafka, Kinesis, WarpStream, S3)
- Higher ordered throughput per stream/partition than typical managed streaming services (claimed figures: roughly 125 MiB/s for appends and 500 MiB/s for real-time reads).
- “Unlimited” number of streams, avoiding shard/partition count limits in Kinesis/Kafka-like services.
- Object-store-backed, but hides blob/byte-range complexity behind ordered records and sequence numbers.
- Unlike plain S3 append objects, supports tailing reads and record semantics; unlike Kafka/Kinesis, exposes concurrency control (fencing) for safe distributed writes.
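The fencing idea mentioned above can be sketched as follows. This is an illustration of the general technique under stated assumptions, not S2's actual interface: `FencedLog`, `set_fence`, and `FencedError` are hypothetical names. The log holds a current token, and an append that carries a different token is rejected, so a writer that has been replaced cannot keep appending.

```python
from typing import List, Optional


class FencedError(Exception):
    """Raised when an append carries a token that is not the current fence."""


class FencedLog:
    """Toy log with a fencing token (hypothetical sketch, not the S2 API)."""

    def __init__(self) -> None:
        self.records: List[bytes] = []
        self.token: Optional[str] = None  # no fence until one is set

    def set_fence(self, new_token: str) -> None:
        """Install a new token; writers holding the old one are fenced off."""
        self.token = new_token

    def append(self, body: bytes, token: Optional[str] = None) -> int:
        """Append if no fence is set or the caller's token matches the current one."""
        if self.token is not None and token != self.token:
            raise FencedError("stale or missing fencing token")
        self.records.append(body)
        return len(self.records) - 1
```

In a failover, the new writer would call `set_fence` with a fresh token before appending, which turns any late append from the old writer into an error rather than an interleaved record.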
Performance & Architecture
- Fully object-storage backed (the service runs no disks of its own); writes from many tenants are batched into shared chunks so that each S3 write stays efficiently sized.
- Different storage classes to trade off latency vs cost; planned “native” NVMe-backed tier for very low tail latency.
- Similar in spirit to systems like WarpStream or Gazette, but with different latency/architecture tradeoffs.
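A rough sketch of the batching idea, assuming the simplest possible layout: pending records from several streams are concatenated into one blob, and an index records each record's byte range per stream. The function names and the layout are assumptions for illustration; the discussion does not describe S2's real chunk format.

```python
from typing import Dict, List, Tuple

# stream id -> list of (offset, length) byte ranges inside the blob
Index = Dict[str, List[Tuple[int, int]]]


def pack_chunk(pending: List[Tuple[str, bytes]]) -> Tuple[bytes, Index]:
    """Concatenate records from many streams into one blob plus a byte-range index."""
    blob = bytearray()
    index: Index = {}
    for stream_id, body in pending:
        # Record where this record starts and how long it is, then append it.
        index.setdefault(stream_id, []).append((len(blob), len(body)))
        blob += body
    return bytes(blob), index


def read_stream(blob: bytes, index: Index, stream_id: str) -> List[bytes]:
    """Recover one stream's records, in append order, from the shared blob."""
    return [blob[off:off + length] for off, length in index.get(stream_id, [])]
```

One large write replaces many small ones, and a reader only ever sees ordered records for its own stream; the byte ranges stay an internal detail.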
Security & Multi-tenancy
- Data from multiple tenants is colocated in shared S3 objects, which prompted concerns about cross-tenant leaks.
- The team plans per-stream or per-bucket authenticated encryption and encourages client-side encryption in the meantime; single-tenant cells were also mentioned as a future option.
- Lack of per-tenant encryption today is seen by some as a blocker for serious workloads.
Pricing, Egress & Sustainability
- The initial public price for internet egress was below AWS list prices; this drew strong skepticism about viability and fears of future price hikes.
- After feedback, planned egress pricing was adjusted upward; service is free during preview.
- Some commenters argue that retail cloud bandwidth costs make this a tough business unless deep volume discounts are secured at scale.
Developer Experience & APIs
- Current SDK focus is Rust plus a CLI; the lack of Java and Python SDKs is seen as a barrier for Kafka-heavy, Spring-based orgs.
- Suggestions to build SDKs in non-Rust languages early to flush out “Rust-isms” in the API.
- Desire for BYO-S3 or S3-compatible backends and self-hostable or source-available options to reduce lock-in.
Positioning, Use Cases & Market Concerns
- Some find the landing page too focused on low-level primitives, not enough on concrete business problems and examples.
- Feedback that adoption depends on making it trivially swappable with existing tooling (Kafka API compatibility, Iceberg integration, Debezium pipelines, IoT/MQTT, etc.).
- There is both enthusiasm (“beautiful API”, “useful primitive”) and skepticism that the addressable market for a raw stream primitive is narrow without higher-level offerings.
- Concern that large cloud vendors could easily ship a similar service (e.g., S3 append + record semantics) and undercut or overshadow S2.
Branding & Naming Reactions
- Many joke about confusion with S3 and other “letter+number” products; some think “S2” sounds like a downgrade from S3.
- Genuine concern raised that the name plus explicit S3 comparisons may invite trademark friction with Amazon.
- Others view the name as clearly engineer-led and like the honest, S3-inspired positioning.
Future Directions & Feature Requests
- Requested: compaction, event-sourcing helpers, GDPR-friendly deletion patterns, Athena/Presto-style querying, Kafka compatibility layer, IoT protocol adapters, emulator for local dev (possibly SQLite-backed).
- Interest in integrating with emerging table formats (Iceberg, S3 Table buckets) by buffering small writes and flushing optimized Parquet files.
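The "buffer small writes, then flush larger files" pattern from the last request can be sketched as below. This is a generic size-threshold buffer, not something the thread specifies: `SizeBuffer` and its `sink` callback are hypothetical, and the step that would actually write a Parquet file is left to whatever `sink` the caller supplies.

```python
from typing import Callable, List


class SizeBuffer:
    """Collect small records and hand them to `sink` once enough bytes accumulate."""

    def __init__(self, flush_bytes: int, sink: Callable[[List[bytes]], None]) -> None:
        self.flush_bytes = flush_bytes  # threshold at which a batch is flushed
        self.sink = sink                # hypothetical writer, e.g. one file per batch
        self._buf: List[bytes] = []
        self._size = 0

    def add(self, record: bytes) -> None:
        """Buffer one record; flush automatically when the threshold is reached."""
        self._buf.append(record)
        self._size += len(record)
        if self._size >= self.flush_bytes:
            self.flush()

    def flush(self) -> None:
        """Send the buffered batch to the sink, if there is anything to send."""
        if self._buf:
            self.sink(self._buf)
            self._buf = []
            self._size = 0
```

A real pipeline would likely also flush on a timer so that a slow trickle of records does not sit in the buffer indefinitely; that is omitted here to keep the sketch short.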