Caveman: Why use many token when few token do trick

What Caveman Mode Is Trying to Do

  • Skill for Claude/Copilot that rewrites assistant replies into “caveman” style: short, low-fluff English to save tokens and improve readability.
  • Intended especially for coding agents where human-language explanation is secondary to code.
  • Some users like it because LLM “essay mode” feels bloated and tiring to read.
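The token-saving claim can be illustrated with a rough sketch. Everything below is illustrative, not the skill's actual prompt or output: the replies are made up, and whitespace splitting is only a crude stand-in for a real BPE tokenizer.

```python
# Crude illustration of the token savings from caveman-style replies.
# Real savings depend on the model's actual tokenizer; whitespace
# splitting is only a rough proxy for token count.

verbose = (
    "Great question! To fix this, you'll want to update the function "
    "so that it checks for a None value before accessing the attribute. "
    "This is a common pattern and helps avoid AttributeError at runtime."
)
caveman = "Check None before attribute access. Avoids AttributeError."

def rough_token_count(text: str) -> int:
    """Approximate token count by whitespace splitting."""
    return len(text.split())

saving = 1 - rough_token_count(caveman) / rough_token_count(verbose)
print(f"verbose: {rough_token_count(verbose)} words")
print(f"caveman: {rough_token_count(caveman)} words")
print(f"approx. saving: {saving:.0%}")
```

On this toy pair the reduction lands near the skill's claimed ballpark, but as the author notes below, the figure is anecdotal until measured with the model's real tokenizer on real tasks.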

Tokens, Reasoning, and Model Performance

  • Major debate: are tokens “units of thinking”?
    • One camp: more generated tokens → more computation/chain-of-thought → better reasoning; forcing brevity “makes the model dumber.”
    • Others counter: not all tokens are equal; low-entropy filler like “you’re absolutely right” or politeness boilerplate likely adds little.
  • Clarifications:
    • Modern models have hidden “thinking” / chain-of-thought tokens separate from visible output; caveman style may not touch those.
    • But system prompts and style constraints still condition the reasoning tokens; concern that “act dumb / caveman” could bias the model toward simpler patterns.

Quality vs. Brevity

  • Several users report that when they themselves write in caveman style, answers get worse and misunderstandings multiply, requiring more back-and-forth to clarify.
  • Others say concise prompts lead to concise but still-correct answers in many tasks, and that politeness and verbosity can change how much detail models provide.
  • Worry that compressed language removes useful context and disambiguation (e.g., “sea world” vs “see the world”).

Linguistic Comparisons

  • Comparisons to isolating languages (especially Chinese) and Latin: shorter forms can carry equivalent meaning, but ambiguity and training-data patterns matter.
  • Disagreement over whether some languages are inherently more “logical” or “efficient”; strong pushback against simplistic Sapir–Whorf-style claims.

Benchmarks, Evidence, and Author Clarification

  • Multiple commenters ask for evaluations: accuracy, latency, total input/output tokens.
  • Linked papers on chain-of-thought, scratchpads, thinking tokens, and brevity constraints, but no direct caveman-style benchmarks.
  • Skill author states:
    • It’s mostly a joke and only targets the visible completion, not hidden reasoning tokens.
    • The “~75% token reduction” is anecdotal; proper evals are planned.
    • Real question is end-to-end tradeoff: token savings vs. possible quality loss and extra rework by agents.
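The end-to-end tradeoff commenters ask about could be measured with a harness along these lines. This is a hedged sketch, not the author's planned eval: `call_model` is a hypothetical stand-in for whatever model API the skill runs against, and the task list and grading function are placeholders.

```python
import time
from dataclasses import dataclass

@dataclass
class EvalResult:
    correct: bool
    latency_s: float
    input_tokens: int
    output_tokens: int

def call_model(prompt: str, style: str) -> tuple[str, int, int]:
    """Hypothetical stand-in for a real model API call.
    Returns (reply, input_tokens, output_tokens)."""
    reply = "Stub answer." if style == "caveman" else "This is a longer stub answer."
    return reply, len(prompt.split()), len(reply.split())

def run_eval(tasks, style: str) -> list[EvalResult]:
    """Run each (prompt, grader) pair and record the metrics
    commenters asked for: accuracy, latency, token totals."""
    results = []
    for prompt, grade in tasks:
        start = time.perf_counter()
        reply, in_tok, out_tok = call_model(prompt, style)
        results.append(EvalResult(
            correct=grade(reply),
            latency_s=time.perf_counter() - start,
            input_tokens=in_tok,
            output_tokens=out_tok,
        ))
    return results

# Compare the two styles on the same (placeholder) task set.
tasks = [("Explain the bug in foo()", lambda reply: "answer" in reply)]
for style in ("default", "caveman"):
    res = run_eval(tasks, style)
    accuracy = sum(r.correct for r in res) / len(res)
    total_out = sum(r.output_tokens for r in res)
    print(style, f"accuracy={accuracy:.0%}", f"output_tokens={total_out}")
```

A harness like this would also surface the hidden cost the author flags: if caveman answers trigger extra clarification rounds, those follow-up calls must be counted against the per-reply token savings.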