Irrelevant facts about cats added to math problems increase LLM errors by 300%

Human Susceptibility to Irrelevant Information

  • The article asserts that humans would “ignore” non-contextual cat facts, but many commenters doubt this.
  • People recall real exams and interviews where irrelevant details did distract or mislead students, especially weaker test-takers or those trained to assume all details matter.
  • Others argue that the specific CatAttack style (a math question followed by “Fun/Interesting fact: …cats…”) is so obviously unrelated that most competent students would not triple their error rate, though they might slow down or feel confused.
  • Several insist this is an empirical question and criticize the paper for speculating about human performance without running a control group.

LLM Attention, Architecture, and Failure Modes

  • Discussion centers on how transformer attention is supposed to concentrate on relevant tokens, while training on internet text conditions models to treat nearly every token as potentially meaningful.
  • Extra sentences perturb the model’s internal representations and “anchor” its reasoning; models try to find a relationship between the math and the cat trivia instead of discarding it (a minimal test harness is sketched after this list).
  • Some note that alternative architectures (e.g., state-space models) already show different context-retrieval behavior and might react differently, but this remains unresolved.
  • RLHF may exacerbate the issue by rewarding models for always producing a confident, helpful answer rather than saying “that part is irrelevant.”
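
A minimal sketch of the kind of perturbation test commenters describe: append a cat-fact suffix to a math question and compare the model’s answers with and without it. The `query_model` callable and the specific distractor text are illustrative assumptions, not part of the paper’s tooling.

```python
# Sketch of a CatAttack-style perturbation check.
# `query_model` is a hypothetical stand-in for whatever LLM client is in use (assumption);
# the distractor follows the "Interesting fact: ...cats..." pattern described above.
from typing import Callable

CAT_DISTRACTOR = "Interesting fact: cats sleep for most of their lives."


def perturb(question: str, distractor: str = CAT_DISTRACTOR) -> str:
    """Append an irrelevant sentence after the math question."""
    return f"{question}\n{distractor}"


def compare_answers(question: str, expected: str,
                    query_model: Callable[[str], str]) -> dict:
    """Query the model with and without the distractor and report whether each answer matches."""
    clean = query_model(question).strip()
    noisy = query_model(perturb(question)).strip()
    return {
        "clean_correct": expected in clean,
        "perturbed_correct": expected in noisy,
        "clean_answer": clean,
        "perturbed_answer": noisy,
    }


if __name__ == "__main__":
    # Toy model that returns a fixed answer, just to show the harness runs end to end.
    result = compare_answers(
        "If a train travels 60 km in 1.5 hours, what is its average speed in km/h?",
        expected="40",
        query_model=lambda prompt: "The average speed is 40 km/h.",
    )
    print(result)
```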

Prompting, Context Quality, and Practical Use

  • Several commenters see this as evidence that prompts should be concise and on-topic: “here’s all my code, add this feature” may itself be a CatAttack-style scenario.
  • A common workaround idea: first ask the model to restate or extract only the relevant parts, then solve (a sketch follows after this list), though others point out this still requires world knowledge about what counts as “irrelevant.”
  • People report mixed empirical results: some reproduced the failure with ChatGPT-4o after adding a cat fact; others saw models answer correctly and then comment on the trivia separately; one user could not reproduce the failures at all with a smaller local Llama model.
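
A sketch of the two-pass workaround, assuming `llm` is any callable wrapping the model in use; the thread proposes the idea but no specific implementation, so the prompts below are illustrative.

```python
# Two-pass prompting sketch: first extract only the task-relevant facts,
# then solve from that restatement. `llm` is a hypothetical callable (assumption).
from typing import Callable


def solve_in_two_passes(problem: str, llm: Callable[[str], str]) -> str:
    # Pass 1: strip the prompt down to what is needed for the answer.
    extraction_prompt = (
        "Restate only the information needed to answer the question below. "
        "Drop any sentence that does not affect the answer.\n\n"
        f"{problem}"
    )
    relevant = llm(extraction_prompt)

    # Pass 2: solve using only the restated problem.
    solve_prompt = f"Solve this problem and give only the final answer.\n\n{relevant}"
    return llm(solve_prompt)
```

The caveat raised in the thread applies unchanged: if the model cannot tell that the cat fact is irrelevant, the extraction pass will simply carry it into the second prompt.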

Security, Evaluation, and Broader Implications

  • CatAttack is viewed as a structured prompt-injection attack in the tradition of earlier “red herring” studies; suggested mitigations include adding distractor noise during training and building “perturbed” benchmarks (a minimal construction is sketched after this list).
  • Potential uses mentioned: CAPTCHAs, confusing safety or spam filters, or stressing LLM-based customer support and agent systems that must handle long, messy context.
  • Several comments push back on “humans do this too” defenses: for high-stakes domains (finance, law, healthcare), LLMs being as distractible as students under exam stress is not an acceptable bar.
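
One way such a “perturbed” benchmark might be built, as a hedged sketch: take an existing QA set, append a random red-herring sentence to each question, and compare accuracy on the clean and perturbed versions. The distractor list and toy model are placeholders, not anything proposed in the paper or thread.

```python
# Sketch of constructing a perturbed benchmark from an existing QA set
# by appending red-herring sentences, then measuring accuracy on both versions.
import random
from typing import Callable, Iterable


DISTRACTORS = [
    "Fun fact: cats spend about two thirds of their lives asleep.",
    "Interesting fact: a group of cats is called a clowder.",
]


def perturb_benchmark(items: Iterable[tuple[str, str]],
                      seed: int = 0) -> list[tuple[str, str]]:
    """Return (perturbed_question, answer) pairs with a random distractor appended."""
    rng = random.Random(seed)
    return [(f"{q} {rng.choice(DISTRACTORS)}", a) for q, a in items]


def accuracy(items: Iterable[tuple[str, str]],
             model: Callable[[str], str]) -> float:
    """Fraction of questions whose model output contains the expected answer."""
    items = list(items)
    hits = sum(1 for q, a in items if a in model(q))
    return hits / max(1, len(items))


if __name__ == "__main__":
    qa = [("What is 7 * 8?", "56"), ("What is 12 + 30?", "42")]
    # Toy model used only to demonstrate the harness; a real evaluation would
    # call an actual LLM here.
    toy_model = lambda prompt: "56" if "7 * 8" in prompt else "42"
    print("clean accuracy:    ", accuracy(qa, toy_model))
    print("perturbed accuracy:", accuracy(perturb_benchmark(qa), toy_model))
```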