Is 2026 next year?
LLM and Google AI Failures on a Trivial Date Question
- Multiple models (Google’s AI Overview, ChatGPT, Claude Haiku, some open-source LLMs) give self-contradictory or flatly wrong answers to “Is 2026 next year?” despite being given the correct current year in context.
- Some models initially say “no” then immediately explain reasoning that implies “yes,” or flip mid-answer.
- Others answer correctly but only after an extra reasoning pass or with “extended thinking” enabled.
User Experience: Arguing with a Token Generator
- Several comments describe the same pattern when correcting LLM errors:
  - First, the model confidently deflects or reframes the user’s correction.
  - Then, when quoted verbatim, it apologizes profusely and restates the user’s explanation at length.
- This wastes context and tokens and makes the conversation unusable; users feel they must find “magic words” to force the model to simply fix the bug.
- Some argue that asking an LLM to “explain why it was wrong” is misguided: it’s just generating new tokens, not introspecting on prior output.
Do LLMs ‘Think’ or Have Knowledge?
- One side: LLMs lack critical thinking, logic, skepticism, self-reflection, common sense, and in-session learning; they are sophisticated text predictors, not reasoning agents.
- Others counter that large models exhibit internal structures and behaviors suggestive of world knowledge and some form of intelligence, blurring lines with human cognition.
- There is debate over definitions of “intelligence,” whether next-token prediction alone can solve novel problems, and how this differs from human mental models.
Reliability, Usefulness, and Scope
- Some conclude these tools should not be trusted for anything that truly matters if they fail even basic date arithmetic.
- Others (some sarcastically) claim AI is still an “industrial revolution” for productivity, though critics say its main safe use today is summarization or boilerplate generation.
- Disagreement over whether LLMs genuinely solve novel problems (e.g., coding, math) versus just remixing common solutions.
Google Search, Brand, and Feedback Loop
- Concern that Google is shipping weaker models in AI Overviews, hurting perceived quality; some shrug this off as expected “enshittification.”
- Some now use search as an AI prompt interface despite poor accuracy.
- Geographic variation: in some regions there is no AI Overview at all, and this HN thread appears as the top result.
- Worry that AI systems will now train on this noisy thread itself, amplifying confusion (a kind of “generation loss”).
Technical Explanations and Mitigations
- One analysis: the yes/no framing, combined with training data drawn mostly from earlier years, biases models toward the wrong answer unless they do explicit pre-reasoning.
- Another view: the bug stems from conflicting signals between the model’s training-cutoff “world” and injected current-date/system prompts or search results.
- Suggestions (a sketch of the first two appears after this list):
  - Always inject the current date into the system prompt.
  - Offload arithmetic/date logic to deterministic tools and force LLMs to call them instead of improvising.
  - Avoid ambiguous language like “next Friday,” which even humans disagree on.
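
A minimal sketch of the first two suggestions, assuming an OpenAI-style chat-completions client; the model name, prompt wording, and tool schema are illustrative, not taken from the thread. The current date is injected into the system prompt, and the year comparison itself is delegated to a deterministic function exposed as a tool:

```python
import json
from datetime import date

from openai import OpenAI  # assumes the official `openai` Python package

client = OpenAI()


def is_next_year(year: int) -> bool:
    """Deterministic date logic -- never left to token prediction."""
    return year == date.today().year + 1


# Mitigation 1: inject the current date into the system prompt and ask for
# explicit pre-reasoning before the yes/no verdict.
system_prompt = (
    f"Today's date is {date.today().isoformat()}. "
    "Before answering any yes/no date question, first state the current year, "
    "and use the is_next_year tool for the comparison instead of guessing."
)

# Mitigation 2: expose the deterministic function as a tool the model can call.
tools = [{
    "type": "function",
    "function": {
        "name": "is_next_year",
        "description": "Return true if the given year is the year after today's year.",
        "parameters": {
            "type": "object",
            "properties": {"year": {"type": "integer"}},
            "required": ["year"],
        },
    },
}]

response = client.chat.completions.create(
    model="gpt-4o-mini",  # illustrative model name
    messages=[
        {"role": "system", "content": system_prompt},
        {"role": "user", "content": "Is 2026 next year?"},
    ],
    tools=tools,
)

# Answer any tool call deterministically; a full agent loop would feed the
# result back to the model so it can phrase the final reply.
for call in response.choices[0].message.tool_calls or []:
    if call.function.name == "is_next_year":
        year = json.loads(call.function.arguments)["year"]
        print(is_next_year(year))  # True while the current year is 2025
```

In a production setup the tool result would be returned to the model as a `tool` message so it can word the final answer, but the arithmetic itself never depends on next-token prediction.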
Broader Reflections
- Some compare LLM reasoning to purely mechanical processes: impressive but not indicative of true understanding or consciousness.
- Others ask how different this really is from human cognition, which may also be mechanistic, though humans add subjective experience.
- The episode reinforces that disclaimers (“AI responses may include mistakes”) are not enough when fallible AI is placed at the top of critical interfaces like search.