2025-09-08

Will Amazon S3 Vectors kill vector databases or save them?

Integrated document + vector storage

Some want a single doc store that natively handles both text/metadata and vectors; many vector DBs are perceived as “just storing vectors.”
Others note existing options that combine both: Chroma, Azure AI Search, Elasticsearch, Vespa, MongoDB Atlas, Postgres, SQLite, and various vector indexes layered on general-purpose DBs.
There’s interest in upcoming or niche systems that tightly integrate search with storage and return exact spans, not just documents.

AWS S3 Vectors: role, design, and limitations

Many see S3 Vectors as a “lightweight, good-enough” primitive rather than a full search engine: useful for cheap, cold, low-QPS retrieval, but not a replacement for systems like Milvus, Elasticsearch, or Turbopuffer.
Limitations raised: topK capped at 30 (smaller when filters apply), no clear hybrid (dense + sparse) search story, and unknown or undocumented latency characteristics.
Some argue it’s premature to judge performance from a Preview release, since AWS historically raises quotas and improves behavior at GA.
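For readers unfamiliar with the hybrid search mentioned above: one common way to combine dense and sparse results on the client side is reciprocal rank fusion (RRF). The sketch below is illustrative only and is not something S3 Vectors is known to provide. The function name, the document IDs, and the choice of k=60 (a commonly used default) are assumptions made for the example.

```python
from collections import defaultdict


def reciprocal_rank_fusion(rankings, k=60):
    """Fuse several ranked lists of document IDs into one ranked list.

    Each document scores sum(1 / (k + rank)) over the lists it appears in,
    with rank starting at 1. k=60 is an assumed, commonly used default.
    """
    scores = defaultdict(float)
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] += 1.0 / (k + rank)
    # Sort by descending score; break ties by ID so the output is deterministic.
    return sorted(scores, key=lambda d: (-scores[d], d))


# Hypothetical result lists: one from a dense (vector) query, one from a
# sparse (keyword/BM25) query run against some other system.
dense_hits = ["doc3", "doc1", "doc7"]
sparse_hits = ["doc1", "doc9", "doc3"]

fused = reciprocal_rank_fusion([dense_hits, sparse_hits])
print(fused)
```

Documents that appear in both lists (here doc1 and doc3) rise to the top, which is the behavior commenters have in mind when they ask for a hybrid story.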

Cost, performance, and vector DB value

One cited case: an AI note-taking app spends more on vector search than on LLM calls, surprising some readers and provoking discussion about memory-heavy HNSW indexes and expensive managed services.
Commenters argue that vector DBs earn their keep through latency, recall, hybrid search, and integrated pipelines; S3-backed systems and services such as Turbopuffer or LanceDB aim to cut storage costs while caching hot data.
Others emphasize that if you start sending full documents as context, LLM costs can easily dominate again.
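To give a rough sense of why in-memory HNSW indexes come up in the cost discussion, here is a back-of-envelope estimate. It assumes float32 vectors and counts only the base-layer graph links (up to 2*M per node), ignoring upper layers and implementation overhead. The corpus size, dimension, and M value are hypothetical and not taken from the thread.

```python
def hnsw_memory_bytes(num_vectors, dim, m=16, bytes_per_float=4, bytes_per_link=4):
    """Approximate RAM needed for an in-memory HNSW index.

    Assumptions (simplified): raw float32 vectors are kept in memory, and
    each node stores up to 2*m neighbor IDs on the base layer. Upper layers
    and per-node bookkeeping are ignored, so real usage will be higher.
    """
    vector_bytes = num_vectors * dim * bytes_per_float
    link_bytes = num_vectors * 2 * m * bytes_per_link
    return vector_bytes + link_bytes


# Hypothetical workload: 1 million embeddings of dimension 1536.
estimate = hnsw_memory_bytes(1_000_000, 1536)
gib = estimate / 2**30
print(f"{estimate} bytes, about {gib:.1f} GiB")
```

Under these assumptions the raw vectors dominate the footprint, which is why systems that keep vectors in object storage and cache only hot data can be much cheaper for cold, low-QPS workloads.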

Documentation opacity and AWS internals

Multiple comments lament AWS’s sparse documentation on internal behavior (e.g., S3 Vectors filtering pipeline, ALB load balancing, DynamoDB scaling).
Arguments:
- Users need to understand performance trade-offs (indexing, filtering, scaling) to design architectures.
- AWS teams fear that documenting details makes them de facto contracts, complicating future changes and migrations.
Counterpoints: Hyrum’s Law means customers will depend on observed behavior anyway; reverse-engineering is now an implicit “shadow cost” of cloud use.

Security, censorship, and data access

One view: by hosting vectors, AWS could “meta-optimize” infrastructure, support censorship more cheaply (re-using customer embeddings), and increase lock-in via proprietary embedding models.
Pushback:
- AWS’s data-plane vs control-plane separation means they supposedly can’t casually inspect customer data; specialized regions (GovCloud, HIPAA-eligible services) are more about compliance and segmentation than routine access.
- Skepticism about the censorship thesis: similar concerns would apply to any managed database.
Some speculate (unclear, not evidenced) that cloud providers may already be synthesizing/deriving training corpora from customer data, even if PII is scrubbed.

Postgres/pgvector and general-purpose DBs vs. vector DBs

One camp: Postgres + pgvector (and similar extensions) is “good enough” for most workloads (up to millions of vectors), keeps data co-located, is OSS, and avoids operational overhead and vendor risk of specialized vector DBs.
Another camp: for “hot loop,” low-latency, or very large-scale workloads, Postgres/pgvector is inadequate; you’ll hit performance limits and need replication gymnastics, whereas dedicated systems provide better recall, latency, and indexing.
Rough consensus: pgvector is great for prototyping, small/medium or non-core workloads; specialized DBs shine at 10^8–10^9 vectors, complex filters, and heavy throughput.

Alternative tools and directions

Mentioned options: Turbopuffer (S3-backed with caching, BM25, recall tuning), LanceDB (object-store-based, S3-compatible, cheap), Cloudflare Vectorize (very low per-vector cost), Qdrant, on-device/edge stores like ObjectBox.
Some see S3’s move as part of a broader play against data platforms like Databricks by making S3 more query- and analytics-capable over time.
A few think S3 Vectors is “game changing”; others see it as another tier in a maturing, multi-layered vector ecosystem rather than a killer of vector databases.
