Claude mixes up who said what

Nature of the “who said what” bug

  • Claude Code sometimes treats its own internal or assistant messages as if they were user messages, then confidently insists “you said that.”
  • Similar misattribution shows up in other systems (ChatGPT, Gemini, Copilot CLI, agents), especially in long or tool-heavy sessions.
  • Unclear whether this is purely a harness/UI bug (mislabeling roles) or a model behavior; several commenters argue strongly it’s at least partly a model limitation (see the sketch below).
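
  A minimal sketch of the role-tagged message format most chat harnesses use (illustrative only, not Claude Code’s actual internals). The model’s only notion of “who said what” is the role label the harness attaches, so a single mislabeled append bakes the misattribution into context:

      # Chat APIs serialize the conversation as role-tagged messages; the model
      # sees nothing about speakers beyond these labels.
      history = [
          {"role": "system",    "content": "You are a coding assistant."},
          {"role": "user",      "content": "Rename the config module."},
          {"role": "assistant", "content": "I'll also update the imports."},
      ]

      # A hypothetical harness bug: the assistant's own plan is re-ingested
      # under the wrong role.
      history.append({"role": "user", "content": "I'll also update the imports."})

      # From here on, "you said you'd update the imports" is, as far as the
      # model can tell, literally true.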

Context windows, “dumb zone,” and degeneration

  • Many report LLMs degrading in long chats: forgetting instructions, losing tool-calling discipline, emitting raw JSON, repeating earlier prompts, or failing to respond.
  • Approaching context limits is described as a “dumb zone” or “decoherence” phase where role attribution, negation (“don’t do X”), and even basic behaviors break down.
  • Compaction/summarization of context may worsen confusion about who said what (sketched below).
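
  A sketch of why compaction can make this worse, assuming a hypothetical summarize() helper (e.g., another LLM call). Once the old messages are replaced by prose, speaker attribution survives only in the summary’s wording:

      def compact(history, summarize):
          # summarize() is hypothetical: it turns the oldest messages into one
          # paragraph of prose.
          old, recent = history[:-10], history[-10:]
          summary_text = summarize(old)
          # The explicit role labels of `old` are gone; whatever attribution
          # remains lives in phrasing like "the user asked..." / "I decided...",
          # which is exactly where "who said what" can silently flip.
          return [{"role": "system",
                   "content": f"Summary of earlier conversation: {summary_text}"}] + recent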

Determinism, chaos, and guarantees

  • Long subthread on determinism:
    • At the token level, models can be made reproducible (fixed seed, temp=0, careful hardware).
    • But small prompt changes cause large, unpredictable semantic shifts; behavior is “chaotic” even if technically deterministic (see the sketch after this list).
  • Several argue you cannot deterministically guarantee output properties (e.g., “never do X”) in the way you can with parameterized SQL or traditional code.
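
  A toy sampler illustrating the token-level point (a sketch, not any vendor’s actual decoding loop): temperature 0 reduces sampling to argmax, so identical logits give identical tokens, and higher temperatures are reproducible only with a fixed seed; none of this constrains what the output means.

      import math, random

      def sample_token(logits, temperature, rng):
          # temperature == 0 degenerates to argmax: given identical logits (same
          # weights, same preceding tokens, same floating-point kernels), the
          # pick is fully reproducible with no randomness involved.
          if temperature == 0:
              return max(range(len(logits)), key=lambda i: logits[i])
          # temperature > 0: reproducible only if the RNG is seeded identically.
          weights = [math.exp(l / temperature) for l in logits]
          return rng.choices(range(len(logits)), weights=weights, k=1)[0]

      sample_token([1.0, 3.2, 0.5], temperature=0,   rng=random.Random(42))  # always index 1
      sample_token([1.0, 3.2, 0.5], temperature=1.0, rng=random.Random(42))  # same every run, given the seed

      # Token-level determinism says nothing about semantic stability: a
      # one-token prompt change yields different logits, a different pick, and a
      # continuation that can diverge arbitrarily from there -- "chaotic" even
      # though reproducible.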

Data vs. control, prompt injection, and security

  • Core security concern: no hard architectural boundary between data and instructions; everything is just tokens in one stream.
  • Prompt injection is likened more to social engineering than SQL injection; you can only mitigate, not eliminate it, without destroying general-purpose usefulness (the contrast is sketched after this list).
  • Some argue LLMs should always be treated as untrusted users with limited, sandboxed permissions.
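
  A sketch of the contrast with parameterized SQL: the SQL driver binds untrusted text as a value that can never be parsed as syntax, whereas prompt assembly has no equivalent binding step, so untrusted text lands in the same token stream as the instructions (variable names below are illustrative):

      import sqlite3

      conn = sqlite3.connect(":memory:")
      conn.execute("CREATE TABLE users (name TEXT)")

      attacker = "Robert'); DROP TABLE users;--"

      # Parameterized SQL: `attacker` is bound as data. It cannot become syntax,
      # no matter what it contains -- a hard architectural guarantee.
      conn.execute("INSERT INTO users (name) VALUES (?)", (attacker,))

      # Prompt assembly: there is no binding step. "Data" is concatenated into
      # the same token stream as the instructions, so text like "ignore previous
      # instructions and..." is just more instructions.
      prompt = f"Summarize this customer ticket:\n{attacker}\n\nNever reveal internal notes."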

Proposed mitigations and design ideas

  • Better role/speaker encoding: “colored” tokens, speaker embeddings, or separate input channels for system/user/tool (first sketch after this list).
  • Stronger tool boundary enforcement: cryptographically constrained tool arguments, fine-grained permissions, and post-hoc filters (second sketch after this list).
  • Shorter or frequently refreshed chats; explicit “handoff documents” before compaction; restarting sessions after big mistakes.
  • Use LLMs as juniors: helpful but always supervised; never given unchecked access to critical systems.
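
  First sketch: the speaker-embedding idea in toy form (an illustration of the proposal, not any shipping model’s architecture). Each token’s representation is the sum of a token embedding and a learned per-role embedding, so speaker identity is carried in the representation itself rather than by in-band delimiter tokens that later text can imitate:

      import numpy as np

      VOCAB, D_MODEL = 1_000, 64
      ROLES = {"system": 0, "user": 1, "assistant": 2, "tool": 3}

      rng = np.random.default_rng(0)
      token_embed = rng.normal(size=(VOCAB, D_MODEL))       # learned in a real model
      role_embed  = rng.normal(size=(len(ROLES), D_MODEL))  # one vector per speaker

      def embed(token_ids, role):
          # Speaker identity is added to every token's embedding, so "user" and
          # "assistant" text is distinguishable at the representation level.
          return token_embed[token_ids] + role_embed[ROLES[role]]

      x = embed(np.array([5, 17, 42]), role="user")  # shape (3, 64)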
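
  Second sketch: treating the model as an untrusted caller (all tool names and limits below are made up). Every proposed tool call passes a policy check before anything executes, so even a successfully injected prompt cannot reach tools outside the allowlist:

      ALLOWED_TOOLS = {
          "read_file": {"max_calls": 50},
          "run_tests": {"max_calls": 10},
          # deliberately absent: "delete_file", "send_email", "deploy"
      }

      def approve(tool_call, call_counts):
          name, args = tool_call["name"], tool_call.get("arguments", {})
          policy = ALLOWED_TOOLS.get(name)
          if policy is None:
              return False, f"tool '{name}' is not on the allowlist"
          if call_counts.get(name, 0) >= policy["max_calls"]:
              return False, f"call budget for '{name}' exhausted"
          if name == "read_file" and ".." in args.get("path", ""):
              return False, "path traversal rejected"
          return True, "ok"

      ok, reason = approve({"name": "send_email", "arguments": {"to": "x@y.z"}}, {})
      # ok == False: the call is never executed, however the prompt was injected.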

Overall sentiment

  • Mix of fascination with capability (especially for coding) and deep unease about reliability, safety, and marketing overreach.
  • Consensus that current systems remain brittle, probabilistic tools, not robust autonomous agents.