A new Google model is nearly perfect on automated handwriting recognition

Historical & practical use cases

  • Several commenters are excited about strong handwriting recognition, especially for:
    • 16th–18th century archival material (Conquistador accounts, colonial Spanish files, ledgers, local town records).
    • Genealogy, Renaissance Neo-Latin texts, family diaries, and children’s handwriting.
  • People report that current LLMs (Gemini 2.5 Pro/Flash, Claude, o3) are already very useful for:
    • Transcribing handwritten notes and food logs with few errors.
    • Searching, summarizing, and translating scanned historical documents.
    • Acting as research assistants via custom tooling and agents (a minimal transcription-call sketch follows this list).
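
For context, this is the shape of the custom tooling commenters describe: a single vision-model call that sends a scanned page along with a strict transcription prompt. This is a minimal sketch, assuming the OpenAI Python SDK and a vision-capable model; the model name, prompt wording, and file path are illustrative placeholders, not details from the thread.

```python
# Minimal sketch of a handwriting-transcription helper of the kind commenters
# describe building. Assumes the OpenAI Python SDK and a vision-capable model;
# model name, prompt, and file path are placeholders.
import base64
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment


def transcribe_page(image_path: str) -> str:
    """Ask a vision model to transcribe a scanned page exactly as written."""
    with open(image_path, "rb") as f:
        image_b64 = base64.b64encode(f.read()).decode("utf-8")

    response = client.chat.completions.create(
        model="gpt-4o",  # placeholder; any vision-capable model
        messages=[
            {
                "role": "user",
                "content": [
                    {
                        "type": "text",
                        "text": (
                            "Transcribe this handwritten page exactly as written. "
                            "Do not expand abbreviations or guess at illegible "
                            "words; mark them as [illegible]."
                        ),
                    },
                    {
                        "type": "image_url",
                        "image_url": {"url": f"data:image/png;base64,{image_b64}"},
                    },
                ],
            }
        ],
    )
    return response.choices[0].message.content


print(transcribe_page("ledger_page.png"))  # hypothetical scan
```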

Skepticism about OS clones and “wild capabilities”

  • Many doubt claims that the model “codes full Windows/Apple OSes, 3D software, emulators” from one prompt:
    • Most likely outputs are web-based UI clones (HTML/CSS/JS) that resemble OS desktops, not kernels.
    • With abundant open-source OSes and emulators on GitHub, such results may be remixing or near-copying, not deep novelty.
  • Some see this as classic social-media hype and suspect astroturfing and engagement farming around new model launches.

Novelty, reasoning, and “stochastic parrots”

  • Long debate over whether LLMs:
    • Only interpolate from training data vs. genuinely extrapolate and create novel solutions.
    • Are “just next-token predictors” vs. systems that necessarily build internal world models to predict well.
  • Examples used on the “they reason” side:
    • Math Olympiad-style problem solving.
    • Material-physics intuitions (“can X cut through Y?”).
    • Multi-document code or research synthesis.
  • Critics respond that:
    • Impressive feats often align with dense training coverage (e.g., NES emulators, sugar loaves, ledgers).
    • There are no clear signs yet of breakthroughs comparable to relativity or the transistor.

Handwriting example and trust issues

  • The sugar-loaf ledger case that impressed the author is heavily debated:
    • Alternatives: the model may have simply seen the gap between the digits (“14 5”), recognized period-typical notation, or drawn on prior examples of typical loaf weights.
    • Regardless, it violated the explicit instruction to transcribe “exactly as written,” which some see as a reliability red flag.
  • Historians worry about:
    • Being subtly biased by AI “guesses” in ambiguous passages.
    • Using models on primary sources without strong provenance and error-characterization.

Concerns about hype, regressions, and evaluation

  • Many find the article hyperbolic, with marketing-style language about “emergent abstract reasoning.”
  • Several users report:
    • Earlier Gemini 2.5 Pro previews feeling stronger than later releases, possibly due to cost optimizations.
    • Models that once worked well for research later hallucinating sources or references.
  • There’s interest in standardized handwriting benchmarks; some are surprised none are widely cited (see the character-error-rate sketch below).
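
For reference, handwriting-transcription benchmarks typically report character error rate (CER): the edit distance between a model’s transcription and a ground-truth reference, normalized by reference length. Below is a minimal sketch of that metric; the example strings are invented, not taken from the thread.

```python
# Minimal sketch of the character error rate (CER) a standardized
# handwriting benchmark would typically report:
#   CER = Levenshtein distance / length of the reference transcription.
# The example strings below are invented.


def levenshtein(a: str, b: str) -> int:
    """Edit distance (insertions, deletions, substitutions) between two strings."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            curr.append(min(
                prev[j] + 1,                # deletion
                curr[j - 1] + 1,            # insertion
                prev[j - 1] + (ca != cb),   # substitution
            ))
        prev = curr
    return prev[-1]


def cer(reference: str, hypothesis: str) -> float:
    """Character error rate: edits needed per reference character."""
    return levenshtein(reference, hypothesis) / max(len(reference), 1)


reference = "14 loaves of sugar, 5 arrobas"   # invented ground truth
hypothesis = "145 loaves of sugar arrobas"    # invented model output
print(f"CER = {cer(reference, hypothesis):.3f}")
```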