Claude mixes up who said what
Nature of the “who said what” bug
- Claude Code sometimes treats its own internal or assistant messages as if they were user messages, then confidently insists “you said that.”
- Similar misattribution shows up in other systems (ChatGPT, Gemini, Copilot CLI, agents), especially in long or tool-heavy sessions.
- Unclear whether this is purely a harness/UI bug (mislabeling roles) or a model behavior; several comments argue strongly it’s at least partly a model limitation.
Context windows, “dumb zone,” and degeneration
- Many report LLMs degrading in long chats: forgetting instructions, losing tool-calling discipline, emitting raw JSON, repeating earlier prompts, or failing to respond.
- Approaching context limits is described as a “dumb zone” or “decoherence” phase where role attribution, negation (“don’t do X”), and even basic behaviors break down.
- Compaction/summarization of context may worsen confusion about who said what.
Determinism, chaos, and guarantees
- Long subthread on determinism:
- At token level, models can be made reproducible (fixed seed, temp=0, careful hardware).
- But small prompt changes cause large, unpredictable semantic shifts; behavior is “chaotic” even if technically deterministic.
- Several argue you cannot deterministically guarantee output properties (e.g., “never do X”) in the way you can with parameterized SQL or traditional code.
Data vs. control, prompt injection, and security
- Core security concern: no hard architectural boundary between data and instructions; everything is just tokens in one stream.
- Prompt injection is likened more to social engineering than SQL injection; you can only mitigate, not eliminate it, without destroying general-purpose usefulness.
- Some argue LLMs should always be treated as untrusted users with limited, sandboxed permissions.
Proposed mitigations and design ideas
- Better role/speaker encoding: “colored” tokens, speaker embeddings, or separate input channels for system/user/tool.
- Stronger tool boundary enforcement: cryptographically constrained tool arguments, fine-grained permissions, and post-hoc filters.
- Shorter or frequently refreshed chats; explicit “handoff documents” before compaction; restarting sessions after big mistakes.
- Use LLMs as juniors: helpful but always supervised; never given unchecked access to critical systems.
Overall sentiment
- Mix of fascination with capability (especially for coding) and deep unease about reliability, safety, and marketing overreach.
- Consensus that current systems remain brittle, probabilistic tools, not robust autonomous agents.