Universal Claude.md – cut Claude output tokens

Token costs and where savings matter

  • Several commenters note that most real-world cost comes from input tokens, not output; cited data puts programming usage at roughly 93% input tokens versus about 4% output.
  • Output tokens are often more expensive per token and are not cached, so trimming them can still matter, but a long CLAUDE.md adds input tokens to every request (see the cost sketch after this list).
  • One issue raised: the project’s own benchmarks count only output tokens and ignore accuracy and total (input+output) cost.
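
To make the trade-off concrete, here is a back-of-the-envelope sketch. Every number in it (prices, token counts, request counts) is an illustrative assumption, not current Anthropic pricing or measured usage:

```python
# Back-of-the-envelope cost model for one coding session.
# All prices and token counts are illustrative placeholders.

INPUT_PRICE_PER_MTOK = 3.00    # $ per million input tokens (placeholder)
OUTPUT_PRICE_PER_MTOK = 15.00  # $ per million output tokens (placeholder)

def session_cost(requests, base_input, claude_md_tokens, output_tokens):
    """Cost of a session where CLAUDE.md is re-sent with every request."""
    total_input = requests * (base_input + claude_md_tokens)
    total_output = requests * output_tokens
    return (total_input * INPUT_PRICE_PER_MTOK
            + total_output * OUTPUT_PRICE_PER_MTOK) / 1_000_000

# Hypothetical: a 2,000-token CLAUDE.md that trims 300 output tokens per request.
baseline = session_cost(requests=50, base_input=20_000,
                        claude_md_tokens=0, output_tokens=800)
trimmed = session_cost(requests=50, base_input=20_000,
                       claude_md_tokens=2_000, output_tokens=500)
print(f"baseline: ${baseline:.3f}   with prompt: ${trimmed:.3f}")
```

Under these made-up numbers the prompt costs slightly more than it saves ($3.675 versus $3.600), which is exactly the commenters' point: input-side growth can swamp output-side savings. Prompt caching, which discounts repeated input, would shift the balance again.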

Impact on quality, reasoning, and agentic workflows

  • Many argue that forcing short, “answer-first” outputs can hurt reasoning quality, especially for math or complex coding tasks, since an autoregressive model loses the intermediate tokens it would otherwise use to work toward an answer.
  • There’s concern that suppressing “redundant” explanation harms long-running, agentic coding sessions where explicit reasoning helps maintain coherence.
  • Others counter that much verbosity is low-value (sycophancy, restating prompts, soft warnings) and can be safely trimmed.

Prompt design critiques

  • Strong criticism of rules like “answer is always line 1,” “no redundant context,” “no unsolicited suggestions,” and “accept any user correction as ground truth.”
  • Detractors say these conflict with autoregressive behavior, increase hallucination risk, and remove useful pushback and safety margin.
  • The approach is seen by some as pushing the model out of its trained distribution and “dumbing it down.”

Alternative token-efficiency strategies

  • Suggestions include external context compression and memory tools (e.g., proxies that compress context and CLI output, persistent project memories) rather than aggressive output suppression (a minimal compression sketch follows this list).
  • Several describe “handoff”/“checkpoint” workflows: generating markdown summaries of sessions, storing them in the repo, and using them as durable, compact context across sessions (see the checkpoint sketch below).
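
As a concrete illustration of the proxy-style approach, here is a minimal sketch of one such tactic (not any specific tool from the thread; the function name and limits are arbitrary): clipping long CLI output to its head and tail before it enters the context window.

```python
# Clip long tool/CLI output before it is appended to the model's context.
# A real proxy would sit between the agent and the API and could also
# summarize or deduplicate; this shows only the simplest tactic.

def clip_cli_output(text: str, head: int = 30, tail: int = 10) -> str:
    """Keep the first `head` and last `tail` lines of a command's output."""
    lines = text.splitlines()
    if len(lines) <= head + tail:
        return text
    omitted = len(lines) - head - tail
    return "\n".join(lines[:head]
                     + [f"... [{omitted} lines omitted] ..."]
                     + lines[-tail:])
```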
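
And a minimal sketch of the checkpoint side. The directory layout and file naming here are arbitrary conventions, not anything prescribed in the thread:

```python
# Persist a compact session summary in the repo so the next session can
# start from a few hundred tokens of notes instead of a full transcript.
# The docs/session-notes path and filename scheme are placeholders.

from datetime import date
from pathlib import Path

def write_checkpoint(repo_root: str, summary: str) -> Path:
    notes_dir = Path(repo_root) / "docs" / "session-notes"
    notes_dir.mkdir(parents=True, exist_ok=True)
    path = notes_dir / f"{date.today().isoformat()}-handoff.md"
    path.write_text(f"# Session handoff ({date.today().isoformat()})\n\n{summary}\n")
    return path
```

A new session then loads this file as context, which is the durable, compact cross-session memory the commenters describe.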

Vanilla Claude vs heavy customization

  • Some prefer staying close to the default Claude Code setup, arguing the vendor is heavily incentivized to tune it well for coding and that custom stacks churn quickly.
  • Others maintain minimal prompts like “be concise” and use skills or local processes for formatting, rather than large universal CLAUDE.md files.

Meta: understanding LLMs

  • A recurring theme is that many prompt hacks ignore basic LLM properties: generation is autoregressive, and models are trained to maximize task performance rather than to hit a length target.
  • Several commenters emphasize careful A/B testing and note at least one external evaluation showing this prompt reduced efficiency compared to no instructions (a minimal A/B harness sketch follows).
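
For anyone who wants to reproduce such a comparison, here is a minimal A/B harness sketch using the Anthropic Python SDK. The model ID, task list, and prompt path are placeholders, and it measures only token counts; the accuracy concerns raised above still have to be scored separately.

```python
# Run the same tasks with and without the universal prompt and compare
# token usage. Requires the `anthropic` package and ANTHROPIC_API_KEY.
# MODEL, TASKS, and the CLAUDE.md path are placeholders.

import anthropic

client = anthropic.Anthropic()
MODEL = "claude-sonnet-4-20250514"  # placeholder; substitute a current model ID

def run(task: str, system_prompt: str | None) -> tuple[int, int]:
    kwargs = dict(model=MODEL, max_tokens=2048,
                  messages=[{"role": "user", "content": task}])
    if system_prompt is not None:
        kwargs["system"] = system_prompt
    msg = client.messages.create(**kwargs)
    return msg.usage.input_tokens, msg.usage.output_tokens

TASKS = ["Write a Python function that parses ISO-8601 dates."]
UNIVERSAL_PROMPT = open("CLAUDE.md").read()  # the prompt under test

for task in TASKS:
    for label, sys_p in [("baseline", None), ("universal", UNIVERSAL_PROMPT)]:
        inp, out = run(task, sys_p)
        print(f"{label:9s} input={inp:6d} output={out:6d}")
```

Since output length varies run to run, each task should be repeated several times before drawing conclusions.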