Caveman: Why use many token when few token do trick
What Caveman Mode Is Trying to Do
- Skill for Claude/Copilot that rewrites assistant replies into “caveman” style: short, low-fluff English to save tokens and improve readability.
- Intended especially for coding agents where human-language explanation is secondary to code.
- Some users like it because LLM “essay mode” feels bloated and tiring to read.
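A skill like this typically amounts to a style constraint injected into the system prompt before the conversation. A minimal sketch of that mechanism, where the instruction wording and the `build_messages` helper are hypothetical (not the actual skill's contents):

```python
# Hypothetical sketch: a "caveman mode" skill is essentially a style
# constraint prepended to the message list as a system prompt.
CAVEMAN_INSTRUCTION = (
    "Reply in short, plain sentences. Drop filler, politeness, and hedging. "
    "Keep code and identifiers intact."
)

def build_messages(user_prompt: str, caveman: bool = True) -> list[dict]:
    """Assemble a chat message list, optionally adding the style constraint."""
    messages = []
    if caveman:
        messages.append({"role": "system", "content": CAVEMAN_INSTRUCTION})
    messages.append({"role": "user", "content": user_prompt})
    return messages
```

Note this only conditions the visible completion; whether it also biases any hidden reasoning tokens is exactly the debate below.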
Tokens, Reasoning, and Model Performance
- Major debate: are tokens “units of thinking”?
- One camp: more generated tokens → more computation/chain-of-thought → better reasoning; forcing brevity “makes the model dumber.”
- Others counter: not all tokens are equal; low-entropy filler like “you’re absolutely right” or politeness boilerplate likely adds little.
- Clarifications:
  - Modern models have hidden “thinking” / chain-of-thought tokens separate from visible output; caveman style may not touch those.
  - But system prompts and style constraints still condition the reasoning tokens; concern that “act dumb / caveman” could bias the model toward simpler patterns.
Quality vs. Brevity
- Several users report worse answers and more misunderstandings when they themselves talk like cavemen; they end up needing more back-and-forth to clarify.
- Others say concise prompts lead to concise but still-correct answers in many tasks, and that politeness and verbosity can change how much detail models provide.
- Worry that compressed language removes useful context and disambiguation (e.g., “sea world” vs “see the world”).
Linguistic Comparisons
- Comparisons to isolating languages (especially Chinese) and Latin: shorter forms can carry equivalent meaning, but ambiguity and training-data patterns matter.
- Disagreement over whether some languages are inherently more “logical” or “efficient”; strong pushback against simplistic Sapir–Whorf-style claims.
Benchmarks, Evidence, and Author Clarification
- Multiple commenters ask for evaluations: accuracy, latency, total input/output tokens.
- Linked papers on chain-of-thought, scratchpads, thinking tokens, and brevity constraints, but no direct caveman-style benchmarks.
- Skill author states:
  - It’s mostly a joke and only targets the visible completion, not hidden reasoning tokens.
  - The “~75% token reduction” is anecdotal; proper evals are planned.
  - The real question is the end-to-end tradeoff: token savings vs. possible quality loss and extra rework by agents.
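The evals commenters are asking for could be run with a small harness that tracks accuracy, latency, and token totals per prompt style. A sketch under stated assumptions: `call_model` is a hypothetical stand-in for a real LLM client, and the usage fields it returns are illustrative:

```python
import time

def call_model(messages):
    """Hypothetical stand-in for an LLM API call. Returns reply text plus
    token usage; swap in a real client to run an actual eval."""
    return {"text": "Short answer.", "input_tokens": 42, "output_tokens": 4}

def run_eval(tasks, make_messages, is_correct):
    """Compare one prompt style on accuracy, wall-clock latency, and
    total input/output tokens across a task list."""
    totals = {"correct": 0, "latency_s": 0.0,
              "input_tokens": 0, "output_tokens": 0}
    for task in tasks:
        start = time.perf_counter()
        result = call_model(make_messages(task))
        totals["latency_s"] += time.perf_counter() - start
        totals["input_tokens"] += result["input_tokens"]
        totals["output_tokens"] += result["output_tokens"]
        totals["correct"] += int(is_correct(task, result["text"]))
    return totals
```

Running it once with caveman-style messages and once with default messages, then comparing the two `totals` dicts, would turn the anecdotal “~75% reduction” into a measured end-to-end tradeoff.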