MCP server that reduces Claude Code context consumption by 98%

Scope and MCP limitations

  • The technique only affects tools whose execution can be routed through shell/subprocess hooks (Bash, Read, Grep, Glob, WebFetch, WebSearch, etc.).
  • Several commenters empirically confirmed it cannot intercept MCP tool responses today: MCP replies travel over JSON-RPC straight into the model, and Claude Code exposes no hook that can rewrite an MCP tool's response before it reaches the model.
  • Result: the “98% context reduction” applies to built‑in tools and CLI-like workflows (curl, gh, kubectl, Playwright snapshots, git logs), not to third‑party MCP tools.
  • For custom MCPs, commenters suggest applying the same pattern server‑side: return compact summaries, store full outputs in a queryable store, and expose drill‑down tools.
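The server-side pattern suggested above can be sketched in plain Python. All names here are illustrative, not part of any real MCP SDK: a tool handler stores the full output, returns only a compact summary plus an id, and a separate drill-down function retrieves exact slices on demand.

```python
import hashlib

# In-memory store mapping result ids to full tool outputs.
# A real server would persist this (e.g., in SQLite) across calls.
_store: dict[str, str] = {}

def summarize(text: str, max_lines: int = 5) -> str:
    """Cheap, deterministic summary: the first lines plus a size note."""
    lines = text.splitlines()
    head = "\n".join(lines[:max_lines])
    return f"{head}\n... ({len(lines)} lines, {len(text)} bytes total)"

def run_tool_compact(raw_output: str) -> dict:
    """Store the full output; return only a summary and a drill-down id."""
    result_id = hashlib.sha256(raw_output.encode()).hexdigest()[:12]
    _store[result_id] = raw_output
    return {"id": result_id, "summary": summarize(raw_output)}

def drill_down(result_id: str, start: int, end: int) -> str:
    """Drill-down tool: fetch a specific line range of the stored output."""
    lines = _store[result_id].splitlines()
    return "\n".join(lines[start:end])
```

Only the dict returned by `run_tool_compact` would go back to the model; the agent calls `drill_down` with the id when it needs the raw detail.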

Context management strategies

  • Many see this as “pre‑compaction”: big outputs run in a sandbox; only summaries hit context; full data is stored in a local SQLite FTS5 index for later search.
  • A long subthread explores broader “agentic context management”: pruning failed attempts, branching/rollback, treating context like an editable structure rather than an immutable log.
  • People share similar patterns: subagents doing work off‑context then returning summaries; piping tool output to files and only reading relevant slices; smaller local models summarizing logs.
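The summarize-then-index flow described above can be sketched with Python's built-in sqlite3 module, whose bundled SQLite normally includes FTS5. Table and function names here are hypothetical; a real setup would write to a file rather than memory.

```python
import sqlite3

# Local full-text index for complete tool outputs. Only summaries enter
# the conversation; the agent searches this table later for details.
db = sqlite3.connect(":memory:")  # a real setup would use a file path
db.execute("CREATE VIRTUAL TABLE outputs USING fts5(tool, body)")

def index_output(tool: str, body: str) -> None:
    """Store the full, untruncated output of one tool invocation."""
    db.execute("INSERT INTO outputs (tool, body) VALUES (?, ?)", (tool, body))

def search(query: str, limit: int = 3) -> list[tuple[str, str]]:
    """Full-text search over stored outputs, best matches first,
    returning short snippets with the matched terms bracketed."""
    return db.execute(
        "SELECT tool, snippet(outputs, 1, '[', ']', '...', 8) "
        "FROM outputs WHERE outputs MATCH ? ORDER BY rank LIMIT ?",
        (query, limit),
    ).fetchall()
```

The snippet function keeps the drill-down itself cheap: even a later search returns a few tokens of context, not the whole stored payload.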

Caching, performance, and accuracy concerns

  • Multiple commenters clarify that this approach doesn’t break Claude’s prompt cache: the raw payload never enters the conversation, so only smaller, deterministic summaries do.
  • Some worry that compressing outputs and requiring extraction scripts/search queries can lose information or increase hallucinations if the model writes poor retrieval logic.
  • Skeptics argue that “98% context savings” are meaningless without benchmarks on task quality and harness performance; they question how often summarization mistakes matter in practice.
  • Others counter that large volumes of logs/snapshots already harm focus; reducing noise should improve reasoning, though no formal evals are cited.

Comparisons to related tools and patterns

  • Compared to tools like rtk, this goes beyond trimming CLI output by indexing full outputs for later retrieval instead of discarding them.
  • One commenter describes a hybrid BM25 + vector search index (with incremental updates) for large Obsidian vaults as a more powerful variant of the same idea.
  • Another notes similar ideas in database/log tooling (returning token‑optimized summaries over in‑memory dataframes), and observes MCP’s ability to carry non‑text content.
  • Some ask how this differs from RAG; the implicit answer is that it’s essentially RAG applied to tool outputs within a coding agent.
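As a rough illustration of the hybrid BM25 + vector idea mentioned above, here is a toy ranker that blends a BM25-style lexical score with cosine similarity over precomputed embeddings. Everything is illustrative: a production system would use a real FTS engine and an embedding model, and the weighting scheme is an assumption.

```python
import math
from collections import Counter

def bm25_scores(query: str, docs: list[str], k1: float = 1.5, b: float = 0.75) -> list[float]:
    """Classic BM25 over whitespace-tokenized, lowercased documents."""
    tokenized = [d.lower().split() for d in docs]
    avgdl = sum(len(d) for d in tokenized) / len(tokenized)
    n = len(docs)
    scores = []
    for doc in tokenized:
        tf = Counter(doc)
        s = 0.0
        for term in query.lower().split():
            df = sum(1 for d in tokenized if term in d)
            if df == 0:
                continue
            idf = math.log(1 + (n - df + 0.5) / (df + 0.5))
            s += idf * tf[term] * (k1 + 1) / (tf[term] + k1 * (1 - b + b * len(doc) / avgdl))
        scores.append(s)
    return scores

def cosine(u: list[float], v: list[float]) -> float:
    dot = sum(a * b for a, b in zip(u, v))
    return dot / (math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v)) or 1.0)

def hybrid_rank(query: str, docs: list[str], q_vec: list[float],
                doc_vecs: list[list[float]], alpha: float = 0.5) -> list[int]:
    """Blend normalized BM25 with vector similarity; return doc indices, best first."""
    lex = bm25_scores(query, docs)
    mx = max(lex) or 1.0  # avoid dividing by zero when nothing matches lexically
    scores = [alpha * (l / mx) + (1 - alpha) * cosine(q_vec, dv)
              for l, dv in zip(lex, doc_vecs)]
    return sorted(range(len(docs)), key=scores.__getitem__, reverse=True)
```

The lexical term catches exact identifiers (paths, error codes) that embeddings miss, while the vector term catches paraphrases, which is why commenters favor the hybrid for large vaults.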

User experiences and practical use

  • A few users report substantial token savings and recommend it to their teams.
  • Others note that much waste can also be avoided by not enabling dozens of MCP tools by default and by preferring lean CLI tools where possible.
  • There is scattered skepticism about the project’s seriousness and copywriting, alongside clear interest in the architectural pattern itself.