MCP server that reduces Claude Code context consumption by 98%

Scope and MCP limitations

  • The technique only affects tools whose execution can be routed through shell/subprocess hooks (Bash, Read, Grep, Glob, WebFetch, WebSearch, etc.).
  • Several commenters empirically confirmed it cannot intercept MCP tool responses today: MCP replies travel over JSON-RPC straight into the model, and Claude Code exposes no hook that can rewrite an MCP tool's response before it reaches the model.
  • Result: the “98% context reduction” applies to built‑in tools and CLI-like workflows (curl, gh, kubectl, Playwright snapshots, git logs), not to third‑party MCP tools.
  • For custom MCPs, commenters suggest applying the same pattern server‑side: return compact summaries, store full outputs in a queryable store, and expose drill‑down tools.
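The server-side pattern suggested above can be sketched in plain Python. All names here are illustrative, not part of any real MCP SDK: a tool handler stores the full output, returns only a compact summary plus an id, and a separate drill-down function retrieves exact slices on demand.

```python
import hashlib

# In-memory store mapping result ids to full tool outputs.
# A real server would persist this (e.g., in SQLite) across calls.
_store: dict[str, str] = {}

def summarize(text: str, max_lines: int = 5) -> str:
    """Cheap, deterministic summary: the first lines plus a size note."""
    lines = text.splitlines()
    head = "\n".join(lines[:max_lines])
    return f"{head}\n... ({len(lines)} lines, {len(text)} bytes total)"

def run_tool_compact(raw_output: str) -> dict:
    """Store the full output; return only a summary and a drill-down id."""
    result_id = hashlib.sha256(raw_output.encode()).hexdigest()[:12]
    _store[result_id] = raw_output
    return {"id": result_id, "summary": summarize(raw_output)}

def drill_down(result_id: str, start: int, end: int) -> str:
    """Drill-down tool: fetch a specific line range of the stored output."""
    lines = _store[result_id].splitlines()
    return "\n".join(lines[start:end])
```

Only the dict returned by `run_tool_compact` would go back to the model; the agent calls `drill_down` with the id when it needs the raw detail.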

Context management strategies

  • Many see this as “pre‑compaction”: big outputs run in a sandbox; only summaries hit context; full data is stored in a local SQLite FTS5 index for later search.
  • A long subthread explores broader “agentic context management”: pruning failed attempts, branching/rollback, treating context like an editable structure rather than an immutable log.
  • People share similar patterns: subagents doing work off‑context then returning summaries; piping tool output to files and only reading relevant slices; smaller local models summarizing logs.
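The summarize-then-index flow described above can be sketched with Python's built-in sqlite3 module, whose bundled SQLite normally includes FTS5. Table and function names here are hypothetical; a real setup would write to a file rather than memory.

```python
import sqlite3

# Local full-text index for complete tool outputs. Only summaries enter
# the conversation; the agent searches this table later for details.
db = sqlite3.connect(":memory:")  # a real setup would use a file path
db.execute("CREATE VIRTUAL TABLE outputs USING fts5(tool, body)")

def index_output(tool: str, body: str) -> None:
    """Store the full, untruncated output of one tool invocation."""
    db.execute("INSERT INTO outputs (tool, body) VALUES (?, ?)", (tool, body))

def search(query: str, limit: int = 3) -> list[tuple[str, str]]:
    """Full-text search over stored outputs, best matches first,
    returning short snippets with the matched terms bracketed."""
    return db.execute(
        "SELECT tool, snippet(outputs, 1, '[', ']', '...', 8) "
        "FROM outputs WHERE outputs MATCH ? ORDER BY rank LIMIT ?",
        (query, limit),
    ).fetchall()
```

The snippet function keeps the drill-down itself cheap: even a later search returns a few tokens of context, not the whole stored payload.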

Caching, performance, and accuracy concerns

  • Multiple commenters clarify that this approach doesn’t break Claude’s prompt cache: the raw payload never enters the conversation, so only smaller, deterministic summaries do.
  • Some worry that compressing outputs and requiring extraction scripts/search queries can lose information or increase hallucinations if the model writes poor retrieval logic.
  • Skeptics argue that “98% context savings” are meaningless without benchmarks on task quality and harness performance; they question how often summarization mistakes matter in practice.
  • Others counter that large volumes of logs/snapshots already harm focus; reducing noise should improve reasoning, though no formal evals are cited.

Comparisons to related tools and patterns

  • Compared to tools like rtk, this goes beyond trimming CLI output by indexing full outputs for later retrieval instead of discarding them.
  • One commenter describes a hybrid BM25 + vector search index (with incremental updates) for large Obsidian vaults as a more powerful variant of the same idea.
  • Another notes similar ideas in database/log tooling (returning token‑optimized summaries over in‑memory dataframes), and observes MCP’s ability to carry non‑text content.
  • Some ask how this differs from RAG; the implicit answer is that it’s essentially RAG applied to tool outputs within a coding agent.
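As a rough illustration of the hybrid BM25 + vector idea mentioned above, here is a toy ranker that blends a BM25-style lexical score with cosine similarity over precomputed embeddings. Everything is illustrative: a production system would use a real FTS engine and an embedding model, and the weighting scheme is an assumption.

```python
import math
from collections import Counter

def bm25_scores(query: str, docs: list[str], k1: float = 1.5, b: float = 0.75) -> list[float]:
    """Classic BM25 over whitespace-tokenized, lowercased documents."""
    tokenized = [d.lower().split() for d in docs]
    avgdl = sum(len(d) for d in tokenized) / len(tokenized)
    n = len(docs)
    scores = []
    for doc in tokenized:
        tf = Counter(doc)
        s = 0.0
        for term in query.lower().split():
            df = sum(1 for d in tokenized if term in d)
            if df == 0:
                continue
            idf = math.log(1 + (n - df + 0.5) / (df + 0.5))
            s += idf * tf[term] * (k1 + 1) / (tf[term] + k1 * (1 - b + b * len(doc) / avgdl))
        scores.append(s)
    return scores

def cosine(u: list[float], v: list[float]) -> float:
    dot = sum(a * b for a, b in zip(u, v))
    return dot / (math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v)) or 1.0)

def hybrid_rank(query: str, docs: list[str], q_vec: list[float],
                doc_vecs: list[list[float]], alpha: float = 0.5) -> list[int]:
    """Blend normalized BM25 with vector similarity; return doc indices, best first."""
    lex = bm25_scores(query, docs)
    mx = max(lex) or 1.0  # avoid dividing by zero when nothing matches lexically
    scores = [alpha * (l / mx) + (1 - alpha) * cosine(q_vec, dv)
              for l, dv in zip(lex, doc_vecs)]
    return sorted(range(len(docs)), key=scores.__getitem__, reverse=True)
```

The lexical term catches exact identifiers (paths, error codes) that embeddings miss, while the vector term catches paraphrases, which is why commenters favor the hybrid for large vaults.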

User experiences and practical use

  • A few users report substantial token savings and recommend it to their teams.
  • Others note that much waste can also be avoided by not enabling dozens of MCP tools by default and by preferring lean CLI tools where possible.
  • There is scattered skepticism about the project’s seriousness and copywriting, alongside clear interest in the architectural pattern itself.