Universal Claude.md – cut Claude output tokens

Token costs and where savings matter

  • Several commenters note that most real-world cost comes from input tokens, not output; cited data puts programming usage at roughly 93% input tokens versus about 4% output.
  • Output tokens are often more expensive per token and are not cached, so trimming them can still matter, but a long CLAUDE.md adds input tokens to every request (see the cost sketch after this list).
  • One issue raised: the project’s own benchmarks count only output tokens and ignore accuracy and total (input+output) cost.
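
To make the trade-off concrete, here is a back-of-the-envelope sketch. Every number in it (prices, token counts, request counts) is an illustrative assumption, not current Anthropic pricing or measured usage:

```python
# Back-of-the-envelope cost model for one coding session.
# All prices and token counts are illustrative placeholders.

INPUT_PRICE_PER_MTOK = 3.00    # $ per million input tokens (placeholder)
OUTPUT_PRICE_PER_MTOK = 15.00  # $ per million output tokens (placeholder)

def session_cost(requests, base_input, claude_md_tokens, output_tokens):
    """Cost of a session where CLAUDE.md is re-sent with every request."""
    total_input = requests * (base_input + claude_md_tokens)
    total_output = requests * output_tokens
    return (total_input * INPUT_PRICE_PER_MTOK
            + total_output * OUTPUT_PRICE_PER_MTOK) / 1_000_000

# Hypothetical: a 2,000-token CLAUDE.md that trims 300 output tokens per request.
baseline = session_cost(requests=50, base_input=20_000,
                        claude_md_tokens=0, output_tokens=800)
trimmed = session_cost(requests=50, base_input=20_000,
                       claude_md_tokens=2_000, output_tokens=500)
print(f"baseline: ${baseline:.3f}   with prompt: ${trimmed:.3f}")
```

Under these made-up numbers the prompt costs slightly more than it saves ($3.675 versus $3.600), which is exactly the commenters' point: input-side growth can swamp output-side savings. Prompt caching, which discounts repeated input, would shift the balance again.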

Impact on quality, reasoning, and agentic workflows

  • Many argue that forcing short, “answer-first” outputs can hurt reasoning quality, especially for math or complex coding tasks, since an autoregressive model loses the intermediate tokens it would otherwise use to work toward an answer.
  • There’s concern that suppressing “redundant” explanation harms long-running, agentic coding sessions where explicit reasoning helps maintain coherence.
  • Others counter that much verbosity is low-value (sycophancy, restating prompts, soft warnings) and can be safely trimmed.

Prompt design critiques

  • Strong criticism of rules like “answer is always line 1,” “no redundant context,” “no unsolicited suggestions,” and “accept any user correction as ground truth.”
  • Detractors say these conflict with autoregressive behavior, increase hallucination risk, and remove useful pushback and safety margin.
  • The approach is seen by some as pushing the model out of its trained distribution and “dumbing it down.”

Alternative token-efficiency strategies

  • Suggestions include external context compression and memory tools (e.g., proxies that compress context and CLI output, persistent project memories) rather than aggressive output suppression (a minimal compression sketch follows this list).
  • Several describe “handoff”/“checkpoint” workflows: generating markdown summaries of sessions, storing them in the repo, and using them as durable, compact context across sessions (see the checkpoint sketch below).
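
As a concrete illustration of the proxy-style approach, here is a minimal sketch of one such tactic (not any specific tool from the thread; the function name and limits are arbitrary): clipping long CLI output to its head and tail before it enters the context window.

```python
# Clip long tool/CLI output before it is appended to the model's context.
# A real proxy would sit between the agent and the API and could also
# summarize or deduplicate; this shows only the simplest tactic.

def clip_cli_output(text: str, head: int = 30, tail: int = 10) -> str:
    """Keep the first `head` and last `tail` lines of a command's output."""
    lines = text.splitlines()
    if len(lines) <= head + tail:
        return text
    omitted = len(lines) - head - tail
    return "\n".join(lines[:head]
                     + [f"... [{omitted} lines omitted] ..."]
                     + lines[-tail:])
```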
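
And a minimal sketch of the checkpoint side. The directory layout and file naming here are arbitrary conventions, not anything prescribed in the thread:

```python
# Persist a compact session summary in the repo so the next session can
# start from a few hundred tokens of notes instead of a full transcript.
# The docs/session-notes path and filename scheme are placeholders.

from datetime import date
from pathlib import Path

def write_checkpoint(repo_root: str, summary: str) -> Path:
    notes_dir = Path(repo_root) / "docs" / "session-notes"
    notes_dir.mkdir(parents=True, exist_ok=True)
    path = notes_dir / f"{date.today().isoformat()}-handoff.md"
    path.write_text(f"# Session handoff ({date.today().isoformat()})\n\n{summary}\n")
    return path
```

A new session then loads this file as context, which is the durable, compact cross-session memory the commenters describe.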

Vanilla Claude vs heavy customization

  • Some prefer staying close to the default Claude Code setup, arguing the vendor is heavily incentivized to tune it well for coding and that custom stacks churn quickly.
  • Others maintain minimal prompts like “be concise” and use skills or local processes for formatting, rather than large universal CLAUDE.md files.

Meta: understanding LLMs

  • A recurring theme is that many prompt hacks ignore basic LLM properties: generation is autoregressive, and models are trained to maximize task performance rather than to hit a length target.
  • Several commenters emphasize careful A/B testing and note at least one external evaluation showing this prompt reduced efficiency compared to no instructions (a minimal A/B harness sketch follows).
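
For anyone who wants to reproduce such a comparison, here is a minimal A/B harness sketch using the Anthropic Python SDK. The model ID, task list, and prompt path are placeholders, and it measures only token counts; the accuracy concerns raised above still have to be scored separately.

```python
# Run the same tasks with and without the universal prompt and compare
# token usage. Requires the `anthropic` package and ANTHROPIC_API_KEY.
# MODEL, TASKS, and the CLAUDE.md path are placeholders.

import anthropic

client = anthropic.Anthropic()
MODEL = "claude-sonnet-4-20250514"  # placeholder; substitute a current model ID

def run(task: str, system_prompt: str | None) -> tuple[int, int]:
    kwargs = dict(model=MODEL, max_tokens=2048,
                  messages=[{"role": "user", "content": task}])
    if system_prompt is not None:
        kwargs["system"] = system_prompt
    msg = client.messages.create(**kwargs)
    return msg.usage.input_tokens, msg.usage.output_tokens

TASKS = ["Write a Python function that parses ISO-8601 dates."]
UNIVERSAL_PROMPT = open("CLAUDE.md").read()  # the prompt under test

for task in TASKS:
    for label, sys_p in [("baseline", None), ("universal", UNIVERSAL_PROMPT)]:
        inp, out = run(task, sys_p)
        print(f"{label:9s} input={inp:6d} output={out:6d}")
```

Since output length varies run to run, each task should be repeated several times before drawing conclusions.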