Irrelevant facts about cats added to math problems increase LLM errors by 300%

Human Susceptibility to Irrelevant Information

  • The article asserts that humans would “ignore” non-contextual cat facts, but many commenters doubt this.
  • People recall real exams and interviews where irrelevant details did distract or mislead students, especially weaker test-takers or those trained to assume all details matter.
  • Others argue that the specific CatAttack style (a math question followed by “Fun/Interesting fact: …cats…”) is so obviously unrelated that most competent students would not triple their error rate, though they might slow down or feel confused.
  • Several insist this is an empirical question and criticize the paper for speculating about human performance without running a control group.

LLM Attention, Architecture, and Failure Modes

  • Discussion centers on how transformer attention is supposed to concentrate on relevant tokens, while training on internet text conditions models to treat nearly every token as potentially meaningful.
  • Extra sentences perturb the model’s internal representations and “anchor” its reasoning; models try to find a relationship between the math and the cat trivia instead of discarding it (a minimal test harness is sketched after this list).
  • Some note that alternative architectures (e.g., state-space models) already show different context-retrieval behavior and might react differently, but this remains unresolved.
  • RLHF may exacerbate the issue by rewarding models for always producing a confident, helpful answer rather than saying “that part is irrelevant.”
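
A minimal sketch of the kind of perturbation test commenters describe: append a cat-fact suffix to a math question and compare the model’s answers with and without it. The `query_model` callable and the specific distractor text are illustrative assumptions, not part of the paper’s tooling.

```python
# Sketch of a CatAttack-style perturbation check.
# `query_model` is a hypothetical stand-in for whatever LLM client is in use (assumption);
# the distractor follows the "Interesting fact: ...cats..." pattern described above.
from typing import Callable

CAT_DISTRACTOR = "Interesting fact: cats sleep for most of their lives."


def perturb(question: str, distractor: str = CAT_DISTRACTOR) -> str:
    """Append an irrelevant sentence after the math question."""
    return f"{question}\n{distractor}"


def compare_answers(question: str, expected: str,
                    query_model: Callable[[str], str]) -> dict:
    """Query the model with and without the distractor and report whether each answer matches."""
    clean = query_model(question).strip()
    noisy = query_model(perturb(question)).strip()
    return {
        "clean_correct": expected in clean,
        "perturbed_correct": expected in noisy,
        "clean_answer": clean,
        "perturbed_answer": noisy,
    }


if __name__ == "__main__":
    # Toy model that returns a fixed answer, just to show the harness runs end to end.
    result = compare_answers(
        "If a train travels 60 km in 1.5 hours, what is its average speed in km/h?",
        expected="40",
        query_model=lambda prompt: "The average speed is 40 km/h.",
    )
    print(result)
```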

Prompting, Context Quality, and Practical Use

  • Several commenters see this as evidence that prompts should be concise and on-topic: “here’s all my code, add this feature” may itself be a CatAttack-style scenario.
  • A common workaround idea: first ask the model to restate or extract only the relevant parts, then solve (a sketch follows after this list), though others point out this still requires world knowledge about what counts as “irrelevant.”
  • People report mixed empirical results: some reproduced the failure with ChatGPT-4o after adding a cat fact; others saw models answer correctly and then comment on the trivia separately; one user could not reproduce the failures at all with a smaller local Llama model.
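
A sketch of the two-pass workaround, assuming `llm` is any callable wrapping the model in use; the thread proposes the idea but no specific implementation, so the prompts below are illustrative.

```python
# Two-pass prompting sketch: first extract only the task-relevant facts,
# then solve from that restatement. `llm` is a hypothetical callable (assumption).
from typing import Callable


def solve_in_two_passes(problem: str, llm: Callable[[str], str]) -> str:
    # Pass 1: strip the prompt down to what is needed for the answer.
    extraction_prompt = (
        "Restate only the information needed to answer the question below. "
        "Drop any sentence that does not affect the answer.\n\n"
        f"{problem}"
    )
    relevant = llm(extraction_prompt)

    # Pass 2: solve using only the restated problem.
    solve_prompt = f"Solve this problem and give only the final answer.\n\n{relevant}"
    return llm(solve_prompt)
```

The caveat raised in the thread applies unchanged: if the model cannot tell that the cat fact is irrelevant, the extraction pass will simply carry it into the second prompt.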

Security, Evaluation, and Broader Implications

  • CatAttack is viewed as a structured prompt-injection attack in the tradition of earlier “red herring” studies; suggested mitigations include adding distractor noise during training and building “perturbed” benchmarks (a minimal construction is sketched after this list).
  • Potential uses mentioned: CAPTCHAs, confusing safety or spam filters, or stressing LLM-based customer support and agent systems that must handle long, messy context.
  • Several comments push back on “humans do this too” defenses: for high-stakes domains (finance, law, healthcare), LLMs being as distractible as students under exam stress is not an acceptable bar.
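
One way such a “perturbed” benchmark might be built, as a hedged sketch: take an existing QA set, append a random red-herring sentence to each question, and compare accuracy on the clean and perturbed versions. The distractor list and toy model are placeholders, not anything proposed in the paper or thread.

```python
# Sketch of constructing a perturbed benchmark from an existing QA set
# by appending red-herring sentences, then measuring accuracy on both versions.
import random
from typing import Callable, Iterable


DISTRACTORS = [
    "Fun fact: cats spend about two thirds of their lives asleep.",
    "Interesting fact: a group of cats is called a clowder.",
]


def perturb_benchmark(items: Iterable[tuple[str, str]],
                      seed: int = 0) -> list[tuple[str, str]]:
    """Return (perturbed_question, answer) pairs with a random distractor appended."""
    rng = random.Random(seed)
    return [(f"{q} {rng.choice(DISTRACTORS)}", a) for q, a in items]


def accuracy(items: Iterable[tuple[str, str]],
             model: Callable[[str], str]) -> float:
    """Fraction of questions whose model output contains the expected answer."""
    items = list(items)
    hits = sum(1 for q, a in items if a in model(q))
    return hits / max(1, len(items))


if __name__ == "__main__":
    qa = [("What is 7 * 8?", "56"), ("What is 12 + 30?", "42")]
    # Toy model used only to demonstrate the harness; a real evaluation would
    # call an actual LLM here.
    toy_model = lambda prompt: "56" if "7 * 8" in prompt else "42"
    print("clean accuracy:    ", accuracy(qa, toy_model))
    print("perturbed accuracy:", accuracy(perturb_benchmark(qa), toy_model))
```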