Is 2026 next year?

LLM and Google AI Failures on a Trivial Date Question

  • Multiple models (Google’s AI Overview, ChatGPT, Claude Haiku, and some open-source LLMs) give self-contradictory or flatly wrong answers to “Is 2026 next year?”, despite being given the correct current year in context.
  • Some models initially answer “no,” then immediately give reasoning that implies “yes,” or flip mid-answer.
  • Others answer correctly but only after an extra reasoning pass or with “extended thinking” enabled.

User Experience: Arguing with a Token Generator

  • Several comments describe the same pattern when correcting LLM errors:
    • First, the model confidently deflects or reframes the user’s correction.
    • Then, when the offending output is quoted back to it verbatim, it apologizes profusely and restates the user’s explanation at length.
  • This wastes context and tokens, makes the conversation unusable, and leaves users feeling they must find “magic words” just to get the model to fix the bug.
  • Some argue that asking an LLM to “explain why it was wrong” is misguided: it’s just generating new tokens, not introspecting on prior output.

Do LLMs ‘Think’ or Have Knowledge?

  • One side: LLMs lack critical thinking, logic, skepticism, self-reflection, common sense, and in-session learning; they are sophisticated text predictors, not reasoning agents.
  • Others counter that large models exhibit internal structures and behaviors suggestive of world knowledge and some form of intelligence, blurring the line between them and human cognition.
  • There is debate over definitions of “intelligence,” whether next-token prediction alone can solve novel problems, and how this differs from human mental models.

Reliability, Usefulness, and Scope

  • Some conclude these tools should not be trusted for anything that truly matters if they fail even basic date arithmetic.
  • Others (some sarcastically) claim AI is still an “industrial revolution” for productivity, though critics say its main safe use today is summarization or boilerplate generation.
  • Disagreement over whether LLMs genuinely solve novel problems (e.g., coding, math) versus just remixing common solutions.

Google Search, Brand, and Feedback Loop

  • Concern that Google is shipping weaker models in AI Overviews, harming perceived quality; some shrug this off as expected “enshittification.”
  • Some now use search as an AI prompt interface despite poor accuracy.
  • Geographic variation: some regions see no AI Overview at all, with only this HN thread as the top result.
  • Worry that AI systems will now train on this noisy thread itself, amplifying confusion (a kind of “generation loss”).

Technical Explanations and Mitigations

  • One analysis: the yes/no framing, combined with training data written mostly in earlier years, biases models toward the wrong answer unless they reason explicitly before answering.
  • Another view: the bug stems from conflicting signals between the model’s training-cutoff “world” and the current date injected via system prompts or search results.
  • Suggestions:
    • Always inject the current date into the system prompt (see the first sketch after this list).
    • Offload arithmetic and date logic to deterministic tools and force LLMs to call them instead of improvising (see the second sketch after this list).
    • Avoid ambiguous language like “next Friday,” which even humans disagree on.
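
  A minimal sketch of the first suggestion, in Python. The call_llm callable and the exact prompt wording are assumptions here, standing in for whatever chat-completion client and phrasing are actually used:

    # Resolve today's date at request time and put it in the system prompt,
    # so the model does not have to guess the year from its training data.
    from datetime import date

    def build_system_prompt() -> str:
        today = date.today()
        return (
            f"Today's date is {today.isoformat()}; the current year is {today.year}. "
            "When a question involves dates or years, state the relevant years "
            "explicitly before giving a yes/no answer."
        )

    def ask(question: str, call_llm) -> str:
        # call_llm is a placeholder for a real chat-completion call.
        messages = [
            {"role": "system", "content": build_system_prompt()},
            {"role": "user", "content": question},
        ]
        return call_llm(messages)

  The “state the relevant years explicitly” instruction doubles as the pre-reasoning nudge described in the first bullet above.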
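
  A sketch of the second suggestion: the yes/no answer comes from deterministic code, and the model only supplies tool arguments. The OpenAI-style JSON schema is one common way to describe such a tool; wiring it to a specific function-calling API is omitted:

    # Deterministic date helpers the model is forced to call instead of doing
    # year arithmetic in free text.
    from datetime import date

    def current_year() -> int:
        return date.today().year

    def is_next_year(year: int) -> bool:
        return year == current_year() + 1

    # Tool description handed to the model; it may only fill in the arguments,
    # never compute the result itself.
    IS_NEXT_YEAR_TOOL = {
        "name": "is_next_year",
        "description": "True if the given calendar year is the year after the current one.",
        "parameters": {
            "type": "object",
            "properties": {"year": {"type": "integer"}},
            "required": ["year"],
        },
    }

    assert is_next_year(current_year() + 1)  # sanity check: always true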

Broader Reflections

  • Some compare LLM reasoning to purely mechanical processes: impressive but not indicative of true understanding or consciousness.
  • Others ask how different this really is from human cognition, which may also be mechanistic, though humans add subjective experience.
  • The episode reinforces that disclaimers (“AI responses may include mistakes”) are not enough when fallible AI is placed at the top of critical interfaces like search.