Universal Claude.md – cut Claude output tokens
Token costs and where savings matter
- Several commenters note that most real-world cost comes from input tokens, not output; cited data suggests ~93% of tokens are input vs ~4% output in programming use.
- Output tokens are often more expensive per token and are not cached, so reducing them can still matter, but a long CLAUDE.md file adds input tokens to every request.
- One issue raised: the project’s own benchmarks count only output tokens and ignore accuracy and total (input+output) cost.
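The cost argument above can be made concrete with a small sketch. The per-million-token prices below are hypothetical placeholders (not actual Anthropic pricing), and the 93%/4% split is the figure commenters cite:

```python
def request_cost(input_tokens: int, output_tokens: int,
                 input_price_per_m: float = 3.00,
                 output_price_per_m: float = 15.00) -> float:
    """Dollar cost of one request, with prices per million tokens.

    Prices are illustrative assumptions, not real pricing.
    """
    return (input_tokens / 1e6) * input_price_per_m + \
           (output_tokens / 1e6) * output_price_per_m

# A session skewed ~93% input / ~4% output, as the cited data suggests:
total_tokens = 1_000_000
in_tok = int(total_tokens * 0.93)   # 930,000 input tokens
out_tok = int(total_tokens * 0.04)  # 40,000 output tokens

in_cost = (in_tok / 1e6) * 3.00
out_cost = (out_tok / 1e6) * 15.00
print(f"input cost ${in_cost:.2f} vs output cost ${out_cost:.2f}")
# Even with output priced 5x higher per token, output is a minority
# of total spend at this mix -- which is the commenters' point.
```

Under these assumed prices, trimming output helps at the margin, but shrinking what goes into the context window on every request moves the larger number.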
Impact on quality, reasoning, and agentic workflows
- Many argue that forcing short outputs and “answer-first” behavior can hurt reasoning quality, especially for math or complex coding tasks.
- There’s concern that suppressing “redundant” explanation harms long-running, agentic coding sessions where explicit reasoning helps maintain coherence.
- Others counter that much verbosity is low-value (sycophancy, restating prompts, soft warnings) and can be safely trimmed.
Prompt design critiques
- Strong criticism of rules like “answer is always line 1,” “no redundant context,” “no unsolicited suggestions,” and “accept any user correction as ground truth.”
- Detractors say these conflict with autoregressive generation (an answer-first rule forces the answer to be produced before any reasoning tokens), increase hallucination risk, and remove useful pushback and safety margin.
- The approach is seen by some as pushing the model out of its trained distribution and “dumbing it down.”
Alternative token-efficiency strategies
- Suggestions include external context compression and memory tools (e.g., proxies that compress context and CLI output, persistent project memories) rather than aggressive output suppression.
- Several describe “handoff”/“checkpoint” workflows: generating markdown summaries of sessions, storing them in the repo, and using them as durable, compact context across sessions.
Vanilla Claude vs heavy customization
- Some prefer staying close to the default Claude Code setup, arguing the vendor is heavily incentivized to tune it well for coding and that custom stacks churn quickly.
- Others maintain minimal prompts like “be concise” and use skills or local processes for formatting, rather than large universal CLAUDE.md files.
Meta: understanding LLMs
- A recurring theme is that many prompt hacks ignore basic LLM properties, such as autoregressive generation and the fact that models are optimized for task performance rather than brevity.
- Several commenters emphasize careful A/B testing and note at least one external evaluation showing this prompt reduced efficiency compared to no instructions.
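The kind of A/B test commenters call for can be sketched as a minimal harness that scores each prompt variant on both accuracy and total (input + output) tokens, rather than output tokens alone. `run_model` is a hypothetical stand-in for a real API call, and the result fields are assumed names:

```python
from statistics import mean

def evaluate(variant_name, run_model, tasks):
    """Score one system-prompt variant over a fixed task set.

    `run_model(variant, prompt)` is assumed to return a dict with
    'answer', 'input_tokens', and 'output_tokens' keys.
    """
    results = [run_model(variant_name, t["prompt"]) for t in tasks]
    return {
        "variant": variant_name,
        # Accuracy must be measured, not assumed: a shorter answer
        # that is wrong is not a saving.
        "accuracy": mean(r["answer"] == t["expected"]
                         for r, t in zip(results, tasks)),
        # Count both directions; output-only counts miss most of the bill.
        "total_tokens": sum(r["input_tokens"] + r["output_tokens"]
                            for r in results),
    }
```

Running `evaluate` for each variant (including a no-instructions baseline) over the same tasks makes regressions like the one the external evaluation reported directly visible.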