2026-05-17

Show HN: Semble – Code search for agents that uses 98% fewer tokens than grep

Benchmarks & Evaluation

Current benchmarks measure retrieval quality (e.g., NDCG), not end‑to‑end agent performance.
Some commenters argue this is “the wrong thing” to optimize, since what matters is whether agents finish tasks faster or more cheaply at equal or better quality.
Others share small, informal agent evals: Semble sometimes saves context tokens, but can increase latency or produce only marginal cost improvements.
There are calls for open, reproducible agent benchmarks (including harness configuration) and full-session cost/quality metrics.
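For context on what the retrieval benchmarks actually score: NDCG rewards a tool for ranking the most relevant results first, and says nothing about whether the agent then completed its task. A minimal sketch of NDCG@k with the common log2 position discount follows; the graded relevance values are invented for illustration and are not taken from Semble's benchmarks.

```python
import math
from typing import Sequence


def dcg_at_k(relevances: Sequence[float], k: int) -> float:
    """Discounted cumulative gain over the top-k results (log2 position discount)."""
    return sum(rel / math.log2(i + 2) for i, rel in enumerate(relevances[:k]))


def ndcg_at_k(relevances: Sequence[float], k: int) -> float:
    """DCG divided by the DCG of the ideal (descending) ordering; 0.0 if nothing is relevant."""
    ideal = dcg_at_k(sorted(relevances, reverse=True), k)
    return dcg_at_k(relevances, k) / ideal if ideal > 0 else 0.0


# Hypothetical graded relevance of the first five snippets returned for one query.
ranked = [3, 0, 2, 1, 0]
print(round(ndcg_at_k(ranked, k=5), 3))
```

A perfect ranking scores 1.0 regardless of how many tokens the returned snippets cost or how the agent uses them, which is why commenters want session-level measurements alongside it.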

Token Savings vs Grep

The “98% fewer tokens” claim is clarified as a comparison between the common grep + readfile (cat) loop and Semble’s smaller, targeted snippets.
Several note that grep itself costs almost nothing in tokens, since its matching lines are short; the cost comes from agents then reading large file chunks or entire files.
Some argue well‑prompted agents already use grep -C N or selective reads, making the savings less extreme; others say agents often just cat whole files in practice.
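How large the savings look depends on what the agent would otherwise read. The sketch below compares a whole-file read with a `grep -C N` style context window on a synthetic file, using a rough four-characters-per-token estimate. The file contents, the pattern, and the token heuristic are all assumptions, and the resulting percentage says nothing about how Semble's own figure was measured.

```python
import re


def approx_tokens(text: str) -> int:
    # Rough heuristic (~4 characters per token); real tokenizers differ.
    return max(1, len(text) // 4)


def grep_context(lines: list[str], pattern: str, n: int) -> list[str]:
    """Matching lines plus n lines of context on each side, like `grep -C n`
    (overlapping windows are merged; no `--` group separators are emitted)."""
    rx = re.compile(pattern)
    keep: set[int] = set()
    for i, line in enumerate(lines):
        if rx.search(line):
            keep.update(range(max(0, i - n), min(len(lines), i + n + 1)))
    return [lines[i] for i in sorted(keep)]


# Synthetic 2,000-line file with one definition of interest.
lines = [f"x_{i} = compute_value({i})  # filler line" for i in range(2000)]
lines[1200] = "def parse_config(path):"

whole_file = "\n".join(lines)  # roughly what `cat file` puts into context
snippet = "\n".join(grep_context(lines, r"def parse_config", n=3))

full, small = approx_tokens(whole_file), approx_tokens(snippet)
print(full, small, f"{1 - small / full:.1%} fewer")
```

On a file like this, a well-scoped `grep -C 3` already captures most of the savings, which is the skeptics' point; the gap reopens whenever an agent falls back to reading whole files.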

Agent Integration, Trust & Behavior

Many LLMs are heavily trained on grep/rg and may distrust or over-query new tools, negating theoretical savings.
People discuss using hooks, memory files (e.g., AGENTS.md/CLAUDE.md), and explicit instructions to push models toward Semble or LSPs.
Reports of MCP/CLI integration issues include hanging processes, connection errors, and agents redundantly combining Semble with ripgrep.
There is concern that extra tools can make agents “dumber” by encouraging aggressive, shallow searching and more turns.

Comparisons to Other Tools

Compared conceptually or anecdotally with: ripgrep, LSPs, RTK, Headroom, context‑mode, Serena, codebase‑memory‑mcp, CK, cs, Cursor indexing, and ck‑style structured search.
Some users report Semble indexing dramatically faster and returning more relevant code than CK on large repos.
Others prefer LSP‑based navigation for refactors and type‑aware analysis, seeing Semble as complementary.

Performance, Design & Scope

Indexing is reported as very fast. Chunking uses tree‑sitter, and the models are trained on several languages but are claimed to generalize to others.
It is implemented in Python for familiarity, though some commenters wished for Rust or Go.
The tool is local and deterministic, and aims to do “one thing: fast semantic code search.”
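Syntax-aware chunking means each indexed unit is a whole definition rather than an arbitrary window of lines. Semble's chunker is described as tree‑sitter based; as a rough, Python-only illustration, the sketch below uses the standard-library `ast` module as a stand-in for tree‑sitter. `Chunk` and `chunk_python_source` are hypothetical names, not Semble's API.

```python
import ast
from dataclasses import dataclass


@dataclass
class Chunk:
    name: str
    start_line: int  # 1-based, inclusive
    end_line: int    # 1-based, inclusive
    text: str


def chunk_python_source(source: str) -> list[Chunk]:
    """One chunk per top-level function or class (decorator lines are not included)."""
    tree = ast.parse(source)
    chunks = []
    for node in tree.body:
        if isinstance(node, (ast.FunctionDef, ast.AsyncFunctionDef, ast.ClassDef)):
            text = ast.get_source_segment(source, node) or ""
            chunks.append(Chunk(node.name, node.lineno, node.end_lineno, text))
    return chunks


example = '''
import os

def load(path):
    return open(path).read()

class Cache:
    def get(self, key):
        return None
'''

for c in chunk_python_source(example):
    print(c.name, c.start_line, c.end_line)
```

A tree‑sitter grammar plays the same role across many languages, which is one reason a single chunker can cover languages the embedding models were not explicitly trained on.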

Broader Concerns & Alternatives

Commenters suggest measuring additional metrics such as correction-loop frequency and end‑to‑end session tokens and time.
Some argue that structured project docs (e.g., a curated PROJECT.md) or whole‑repo dumps for small projects can rival or beat specialized search in practice.
Security concerns focus on supply‑chain risks; maintainers emphasize local‑only behavior and minimal dependencies, but acknowledge transitive risks remain.
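The session-level metrics commenters asked for can be computed from ordinary harness logs. A minimal sketch, assuming a hypothetical per-session record: the `Session` fields and the notion of counting correction loops per task are illustrative, not an existing benchmark format, and the sample records are made up.

```python
from dataclasses import dataclass


@dataclass
class Session:
    """Hypothetical per-task record logged by an agent harness."""
    task_id: str
    tool: str              # e.g. "grep" or "semble"
    input_tokens: int
    output_tokens: int
    wall_seconds: float
    correction_loops: int  # times the agent had to revise its own failed attempt
    passed: bool           # did the task's checks pass at the end?


def summarize(sessions: list[Session], tool: str) -> dict[str, float]:
    """Average full-session cost and quality for one tool configuration."""
    runs = [s for s in sessions if s.tool == tool]
    if not runs:
        raise ValueError(f"no sessions for tool {tool!r}")
    n = len(runs)
    return {
        "pass_rate": sum(s.passed for s in runs) / n,
        "mean_total_tokens": sum(s.input_tokens + s.output_tokens for s in runs) / n,
        "mean_seconds": sum(s.wall_seconds for s in runs) / n,
        "mean_correction_loops": sum(s.correction_loops for s in runs) / n,
    }


# Made-up records, only to exercise the function.
logs = [
    Session("t1", "grep", 9000, 500, 40.0, 1, True),
    Session("t2", "grep", 12000, 700, 55.0, 2, False),
    Session("t1", "semble", 3000, 600, 48.0, 1, True),
    Session("t2", "semble", 4000, 800, 70.0, 3, True),
]
for tool in ("grep", "semble"):
    print(tool, summarize(logs, tool))
```

Running every tool configuration on the same task set and reporting pass rate next to tokens and time is what separates “uses fewer tokens” from “finishes tasks more cheaply at equal quality.”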
