Caveman: Why use many token when few token do trick

What Caveman Mode Is Trying to Do

  • Skill for Claude/Copilot that rewrites assistant replies into “caveman” style: short, low-fluff English to save tokens and improve readability.
  • Intended especially for coding agents where human-language explanation is secondary to code.
  • Some users like it because LLM “essay mode” feels bloated and tiring to read.
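The token-saving claim can be illustrated with a rough sketch. Everything below is illustrative, not the skill's actual prompt or output: the replies are made up, and whitespace splitting is only a crude stand-in for a real BPE tokenizer.

```python
# Crude illustration of the token savings from caveman-style replies.
# Real savings depend on the model's actual tokenizer; whitespace
# splitting is only a rough proxy for token count.

verbose = (
    "Great question! To fix this, you'll want to update the function "
    "so that it checks for a None value before accessing the attribute. "
    "This is a common pattern and helps avoid AttributeError at runtime."
)
caveman = "Check None before attribute access. Avoids AttributeError."

def rough_token_count(text: str) -> int:
    """Approximate token count by whitespace splitting."""
    return len(text.split())

saving = 1 - rough_token_count(caveman) / rough_token_count(verbose)
print(f"verbose: {rough_token_count(verbose)} words")
print(f"caveman: {rough_token_count(caveman)} words")
print(f"approx. saving: {saving:.0%}")
```

On this toy pair the reduction lands near the skill's claimed ballpark, but as the author notes below, the figure is anecdotal until measured with the model's real tokenizer on real tasks.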

Tokens, Reasoning, and Model Performance

  • Major debate: are tokens “units of thinking”?
    • One camp: more generated tokens → more computation/chain-of-thought → better reasoning; forcing brevity “makes the model dumber.”
    • Others counter: not all tokens are equal; low-entropy filler like “you’re absolutely right” or politeness boilerplate likely adds little.
  • Clarifications:
    • Modern models have hidden “thinking” / chain-of-thought tokens separate from visible output; caveman style may not touch those.
    • But system prompts and style constraints still condition the reasoning tokens; concern that “act dumb / caveman” could bias the model toward simpler patterns.

Quality vs. Brevity

  • Several users report that when they themselves write in caveman style, answers get worse and misunderstandings multiply, requiring more back-and-forth to clarify.
  • Others say concise prompts lead to concise but still-correct answers in many tasks, and that politeness and verbosity can change how much detail models provide.
  • Worry that compressed language removes useful context and disambiguation (e.g., “sea world” vs “see the world”).

Linguistic Comparisons

  • Comparisons to isolating languages (especially Chinese) and Latin: shorter forms can carry equivalent meaning, but ambiguity and training-data patterns matter.
  • Disagreement over whether some languages are inherently more “logical” or “efficient”; strong pushback against simplistic Sapir–Whorf-style claims.

Benchmarks, Evidence, and Author Clarification

  • Multiple commenters ask for evaluations: accuracy, latency, total input/output tokens.
  • Linked papers on chain-of-thought, scratchpads, thinking tokens, and brevity constraints, but no direct caveman-style benchmarks.
  • Skill author states:
    • It’s mostly a joke and only targets the visible completion, not hidden reasoning tokens.
    • The “~75% token reduction” is anecdotal; proper evals are planned.
    • Real question is end-to-end tradeoff: token savings vs. possible quality loss and extra rework by agents.
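The end-to-end tradeoff commenters ask about could be measured with a harness along these lines. This is a hedged sketch, not the author's planned eval: `call_model` is a hypothetical stand-in for whatever model API the skill runs against, and the task list and grading function are placeholders.

```python
import time
from dataclasses import dataclass

@dataclass
class EvalResult:
    correct: bool
    latency_s: float
    input_tokens: int
    output_tokens: int

def call_model(prompt: str, style: str) -> tuple[str, int, int]:
    """Hypothetical stand-in for a real model API call.
    Returns (reply, input_tokens, output_tokens)."""
    reply = "Stub answer." if style == "caveman" else "This is a longer stub answer."
    return reply, len(prompt.split()), len(reply.split())

def run_eval(tasks, style: str) -> list[EvalResult]:
    """Run each (prompt, grader) pair and record the metrics
    commenters asked for: accuracy, latency, token totals."""
    results = []
    for prompt, grade in tasks:
        start = time.perf_counter()
        reply, in_tok, out_tok = call_model(prompt, style)
        results.append(EvalResult(
            correct=grade(reply),
            latency_s=time.perf_counter() - start,
            input_tokens=in_tok,
            output_tokens=out_tok,
        ))
    return results

# Compare the two styles on the same (placeholder) task set.
tasks = [("Explain the bug in foo()", lambda reply: "answer" in reply)]
for style in ("default", "caveman"):
    res = run_eval(tasks, style)
    accuracy = sum(r.correct for r in res) / len(res)
    total_out = sum(r.output_tokens for r in res)
    print(style, f"accuracy={accuracy:.0%}", f"output_tokens={total_out}")
```

A harness like this would also surface the hidden cost the author flags: if caveman answers trigger extra clarification rounds, those follow-up calls must be counted against the per-reply token savings.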