Why do LLMs freak out over the seahorse emoji?
Proposed mechanism
- The model can represent “seahorse emoji” internally, but there is no corresponding output token, so the final decoding layer snaps to the nearest emoji (e.g., horse), creating a mismatch (a toy sketch follows this list).
- This explains the repair loop: the model asserts the emoji exists, tries to print it, emits the wrong one, “notices” the inconsistency, and spirals trying to fix it.
- Reinforcement or exposure to its own outputs may help it learn “this concept exists in latent space but can’t be emitted.”
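A minimal toy sketch of the “snapping” idea, with an invented three-emoji vocabulary and made-up blend weights (none of this comes from a real model): the unembedding projection can only score tokens that exist, so an internal “seahorse” direction lands on whichever existing row is closest.

```python
import numpy as np

# Toy output vocabulary: there is no seahorse row to land on.
vocab = ["🐴 horse", "🐟 fish", "🌊 wave"]
W_out = np.eye(3)  # toy unembedding matrix, one row per token

# Pretend the residual stream encodes "seahorse" as a horse-ish + fish-ish
# blend (weights are illustrative only).
seahorse_concept = np.array([0.6, 0.4, 0.0])

logits = W_out @ seahorse_concept              # final projection onto the vocabulary
probs = np.exp(logits) / np.exp(logits).sum()
for tok, p in zip(vocab, probs):
    print(f"{tok}: p={p:.2f}")
print("emitted:", vocab[int(np.argmax(logits))])  # snaps to the horse token
```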
Hallucination or not?
- One side: it is classic hallucination the moment the model says “Yes, it exists.”
- Other side: it’s a representational/decoding hole, not confident fabrication; akin to a “manifold gap” where semantically nearby tokens are incorrect.
- Some frame it as confabulation or “tip-of-the-tongue,” not sensory hallucination.
Generation dynamics and self-correction
- Transformers generate token-by-token with no built-in “backspace,” so by the time an error is noticed it has already been emitted; corrections therefore happen mid-stream (a minimal decode loop is sketched after this list).
- “Thinking modes” let models talk to themselves privately, reducing visible spirals. Attempts at backspace tokens exist, but aren’t mainstream.
- Debate on whether transformers can “plan ahead”: some evidence they pre-activate future rhyme/word candidates; others emphasize this is just learned circuits, not true internal revision.
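A hypothetical, minimal decode loop illustrating the append-only point (the `decode` helper and the scripted “model” are invented for this sketch): nothing in the loop can remove an emitted token, so the only way to repair a mistake is to emit more text after it.

```python
from typing import Callable, List

def decode(next_token: Callable[[List[str]], str], prompt: List[str], max_new: int) -> List[str]:
    out = list(prompt)
    for _ in range(max_new):
        tok = next_token(out)  # the model conditions on everything already emitted
        out.append(tok)        # append-only: no backspace step exists in this loop
        if tok == "<eos>":
            break
    return out

# Scripted stand-in for a model that emits the wrong emoji, then repairs it in-stream.
script = iter(["Yes!", "🐴", "...wait,", "that's", "a", "horse.", "<eos>"])
print(decode(lambda ctx: next(script), ["Is there a seahorse emoji?"], max_new=10))
```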
Model and prompt variability
- Different models behave differently: some immediately say “no,” others spiral; enabling web search or extended thinking often resolves it.
- Language and phrasing matter (“show me” vs. “is there”), as do system prompts and calibration; some locales/languages reportedly produce fewer failures.
Tokenization, knowledge, and “holes”
- Beyond tokenization, commenters note the training data likely contains many claims that a seahorse emoji exists, biasing the model toward “Yes.”
- Emoji tasks require exact single-token accuracy; near neighbors aren’t good enough (a quick Unicode check is sketched below). Similar issues appear in letter counting and “glitch tokens.”
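The hole is easy to verify without any model: Unicode simply has no seahorse character, so no exact token can ever be correct, even though near neighbors exist. A quick standard-library check:

```python
import unicodedata

for name in ["HORSE", "TROPICAL FISH", "SEAHORSE"]:
    try:
        ch = unicodedata.lookup(name)  # look up a character by its Unicode name
        print(f"{name}: {ch} (U+{ord(ch):04X})")
    except KeyError:
        print(f"{name}: no such character")  # the "hole" the model falls into
```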
Fixes and mitigations (unclear which is best)
- Short term: system-prompt patches, web search, or explicit Q&A fine-tunes (“No, there isn’t”).
- Longer term: better handling of undefined tokens/“holes,” agentic second-pass verification (a toy check is sketched after this list), or adding a backspace/revision mechanism.
- Some suggest it exposes a fundamental limitation rather than a simple bug.
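A hedged sketch of what a second-pass verification step could look like (the function name and messages are invented here, not taken from the thread): before returning, check the drafted emoji against the Unicode character database and retract the claim when its name doesn’t match the request.

```python
import unicodedata

def verify_emoji_claim(requested: str, drafted_emoji: str) -> str:
    name = unicodedata.name(drafted_emoji, "UNKNOWN")  # e.g., "HORSE FACE" for the horse emoji
    if requested.upper() in name:
        return f"Yes: {drafted_emoji} ({name})"
    return f"No such '{requested}' emoji; the nearest draft was {drafted_emoji} ({name})."

print(verify_emoji_claim("seahorse", "🐴"))  # drafted output snapped to HORSE FACE
print(verify_emoji_claim("horse", "🐴"))     # a real emoji passes the check
```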
Human parallels and Mandela effect
- Many people also “remember” a seahorse emoji; threads cite pre-Unicode/custom emoji as possible sources of false memory.
- Analogies to paraphasia, conversational self-correction, and split-brain confabulation were noted.
Related triggers
- Similar behavior reported for other plausible-but-missing emojis (e.g., dragonfly, lemur, possum, windmill); specificity and prompt shape affect outcomes.