MCP server that reduces Claude Code context consumption by 98%
Scope and MCP limitations
- The technique only affects tools whose execution can be routed through shell/subprocess hooks (Bash, Read, Grep, Glob, WebFetch, WebSearch, etc.).
- Several commenters empirically confirmed it cannot intercept MCP tool responses today: MCP replies travel over JSON-RPC straight into the model, and Claude Code exposes no hook that can rewrite a tool's response after the fact.
- Result: the “98% context reduction” applies to built‑in tools and CLI-like workflows (curl, gh, kubectl, Playwright snapshots, git logs), not to third‑party MCP tools.
- For custom MCPs, commenters suggest applying the same pattern server‑side: return compact summaries, store full outputs in a queryable store, and expose drill‑down tools.
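The server-side pattern suggested for custom MCPs can be sketched roughly as follows. This is a minimal illustration, not code from the project or any MCP SDK: `run_tool` and `drill_down` are hypothetical names, and the truncation "summarizer" is a stand-in for whatever compaction the server actually does.

```python
import hashlib
import sqlite3

# Illustrative store for full tool outputs; only summaries + handles
# would be returned to the model over MCP.
db = sqlite3.connect(":memory:")
db.execute("CREATE TABLE outputs (id TEXT PRIMARY KEY, body TEXT)")

def run_tool(raw_output: str, max_chars: int = 200) -> dict:
    """Persist the full output; hand the model a compact summary plus a handle."""
    handle = hashlib.sha1(raw_output.encode()).hexdigest()[:12]
    db.execute("INSERT OR REPLACE INTO outputs VALUES (?, ?)", (handle, raw_output))
    summary = raw_output[:max_chars]  # stand-in for a real summarizer
    return {"summary": summary, "handle": handle, "full_bytes": len(raw_output)}

def drill_down(handle: str, start: int, end: int) -> str:
    """Drill-down tool: fetch an exact slice of the stored output on demand."""
    (body,) = db.execute("SELECT body FROM outputs WHERE id = ?", (handle,)).fetchone()
    return body[start:end]
```

The key property is that the large payload never crosses the JSON-RPC boundary unless the model explicitly asks for a slice of it.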
Context management strategies
- Many see this as “pre‑compaction”: big outputs run in a sandbox; only summaries hit context; full data is stored in a local SQLite FTS5 index for later search.
- Long subthread explores broader “agentic context management”: pruning failed attempts, branching/rollback, treating context like an editable structure rather than an immutable log.
- People share similar patterns: subagents doing work off‑context then returning summaries; piping tool output to files and only reading relevant slices; smaller local models summarizing logs.
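The "pre-compaction" flow described above can be approximated in a few lines with SQLite's built-in FTS5 extension. The schema and function names here are assumptions for illustration, not the project's actual design:

```python
import sqlite3

# Full outputs land in an FTS5 table off-context; only the terse string
# returned by record() would ever enter the model's context window.
db = sqlite3.connect(":memory:")
db.execute("CREATE VIRTUAL TABLE tool_log USING fts5(tool, output)")

def record(tool: str, output: str) -> str:
    """Store the full output; return only a one-line summary."""
    db.execute("INSERT INTO tool_log VALUES (?, ?)", (tool, output))
    first_line = output.splitlines()[0] if output else ""
    return f"[{tool}] {len(output)} chars stored; first line: {first_line}"

def search(query: str, limit: int = 3) -> list[tuple[str, str]]:
    """Full-text search over everything that never hit the context window."""
    return db.execute(
        "SELECT tool, snippet(tool_log, 1, '[', ']', '…', 8) "
        "FROM tool_log WHERE tool_log MATCH ? ORDER BY rank LIMIT ?",
        (query, limit),
    ).fetchall()
```

Later searches pull back only the relevant rows, which is why commenters describe this as compaction done eagerly rather than when the context fills up.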
Caching, performance, and accuracy concerns
- Multiple clarifications that this approach doesn’t break Claude’s prompt cache because the raw payload never enters the conversation; only smaller, deterministic summaries do.
- Some worry that compressing outputs and requiring extraction scripts/search queries can lose information or increase hallucinations if the model writes poor retrieval logic.
- Skeptics argue that “98% context savings” are meaningless without benchmarks on task quality and harness performance; they question how often summarization mistakes matter in practice.
- Others counter that large volumes of logs/snapshots already harm focus; reducing noise should improve reasoning, though no formal evals are cited.
Comparisons to related tools and patterns
- Compared to tools like rtk, this goes beyond trimming CLI output: it indexes full outputs for later retrieval instead of discarding them.
- One commenter describes a hybrid BM25 + vector search index (with incremental updates) for large Obsidian vaults as a more powerful variant of the same idea.
- Another notes similar ideas in database/log tooling (returning token‑optimized summaries over in‑memory dataframes), and observes MCP’s ability to carry non‑text content.
- Some ask how this differs from RAG; the implicit answer is that it’s essentially RAG applied to tool outputs within a coding agent.
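The hybrid BM25 + vector idea mentioned above can be sketched as a blended score. Everything here is illustrative: the bag-of-words `embed` is a toy stand-in for a real embedding model, and the `alpha` weighting is arbitrary, but the shape matches what the commenter describes.

```python
import math
import sqlite3

db = sqlite3.connect(":memory:")
db.execute("CREATE VIRTUAL TABLE notes USING fts5(body)")

def embed(text: str) -> dict[str, float]:
    """Stand-in embedding: unit-normalized term-frequency vector."""
    counts: dict[str, float] = {}
    for word in text.lower().split():
        counts[word] = counts.get(word, 0.0) + 1.0
    norm = math.sqrt(sum(v * v for v in counts.values()))
    return {w: v / norm for w, v in counts.items()}

def cosine(a: dict[str, float], b: dict[str, float]) -> float:
    return sum(a[w] * b[w] for w in a if w in b)

def hybrid_search(query: str, alpha: float = 0.5) -> list[tuple[float, str]]:
    """Blend BM25 (keyword) and cosine (semantic stand-in) scores."""
    q_vec = embed(query)
    rows = db.execute(
        "SELECT body, bm25(notes) FROM notes WHERE notes MATCH ?", (query,)
    ).fetchall()
    # FTS5's bm25() is lower-is-better, hence the sign flip before blending.
    scored = [
        (alpha * (-score) + (1 - alpha) * cosine(q_vec, embed(body)), body)
        for body, score in rows
    ]
    return sorted(scored, reverse=True)
```

Applied to tool outputs instead of notes, this is exactly the "RAG over tool outputs" framing from the bullet above.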
User experiences and practical use
- A few users report substantial token savings and recommend it to their teams.
- Others note that much waste can also be avoided by not enabling dozens of MCP tools by default and by preferring lean CLI tools where possible.
- There is scattered skepticism about the project’s seriousness and copywriting, alongside clear interest in the architectural pattern itself.