A new Google model is nearly perfect on automated handwriting recognition
Historical & practical use cases
- Several commenters are excited about strong handwriting recognition, especially for:
  - 16th–18th century archival material (conquistador accounts, colonial Spanish files, ledgers, local town records).
  - Genealogy, Renaissance Neo-Latin texts, family diaries, and children’s handwriting.
- People describe current LLMs (Gemini 2.5 Pro/Flash, Claude, o3) as already very useful for:
  - Transcribing handwritten notes and food logs with few errors (see the sketch after this list).
  - Searching, summarizing, and translating scanned historical documents.
  - Acting as research assistants via custom tooling and agents.
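As a concrete illustration of the transcription workflow described above, here is a minimal sketch that sends a scanned page to a multimodal model and asks for a verbatim transcription. It assumes the google-genai Python SDK; the model name, file path, and prompt wording are illustrative rather than taken from the discussion.

```python
# Minimal sketch: send a scanned handwritten page to a multimodal model and
# request a verbatim transcription. Assumes the google-genai SDK
# (pip install google-genai); model name, file path, and prompt are illustrative.
from google import genai
from google.genai import types

client = genai.Client(api_key="YOUR_API_KEY")  # or rely on the GEMINI_API_KEY env var

with open("ledger_page.jpg", "rb") as f:       # hypothetical scan of an archival page
    image_bytes = f.read()

response = client.models.generate_content(
    model="gemini-2.5-pro",
    contents=[
        types.Part.from_bytes(data=image_bytes, mime_type="image/jpeg"),
        "Transcribe this handwritten page exactly as written. "
        "Do not normalize spelling, expand abbreviations, or guess at unclear words; "
        "mark illegible passages as [?].",
    ],
)
print(response.text)
```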
Skepticism about OS clones and “wild capabilities”
- Many doubt claims that the model “codes full Windows/Apple OSes, 3D software, emulators” from one prompt:
  - The most likely outputs are web-based UI clones (HTML/CSS/JS) that resemble OS desktops, not kernels.
  - With abundant open-source OSes and emulators on GitHub, such results may be remixing or near-copying, not deep novelty.
- Some see this as classic social-media hype and suspect astroturfing and engagement farming around new model launches.
Novelty, reasoning, and “stochastic parrots”
- Long debate over whether LLMs:
  - Only interpolate from training data vs. genuinely extrapolate and create novel solutions.
  - Are “just next-token predictors” vs. systems that necessarily build internal world models to predict well.
- Examples used on the “they reason” side:
  - Math Olympiad-style problem solving.
  - Material-physics intuitions (“can X cut through Y?”).
  - Multi-document code or research synthesis.
- Critics respond that:
  - Impressive feats often align with dense training coverage (e.g., NES emulators, sugar loaves, ledgers).
  - There are no clear signs yet of breakthroughs comparable to relativity or the transistor.
Handwriting example and trust issues
- The sugar-loaf ledger case that impressed the author is heavily debated:
  - Alternative explanations: the model may simply have seen the space (“14 5”), recognized period-typical notation, or drawn on prior examples of typical loaf weights.
  - Regardless, it violated the explicit instruction to transcribe “exactly as written,” which some see as a reliability red flag.
- Historians worry about:
  - Being subtly biased by AI “guesses” in ambiguous passages.
  - Using models on primary sources without strong provenance and error characterization.
Concerns about hype, regressions, and evaluation
- Many find the article hyperbolic, with marketing-style language about “emergent abstract reasoning.”
- Several users report:
  - Earlier Gemini 2.5 Pro previews feeling stronger than later releases, possibly due to cost optimizations.
  - Models that once worked well for research later hallucinating sources or references.
- There’s interest in standardized handwriting benchmarks; some are surprised none are widely cited.
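On the benchmark point: handwriting-recognition evaluations conventionally report character error rate (CER), the edit distance between a model’s transcription and a ground-truth transcription divided by the length of the ground truth. The sketch below shows that standard computation; the function name and example strings are illustrative, not drawn from the thread.

```python
def character_error_rate(reference: str, hypothesis: str) -> float:
    """Levenshtein edit distance between two transcriptions, normalized by reference length."""
    m, n = len(reference), len(hypothesis)
    prev = list(range(n + 1))                  # distances for the empty reference prefix
    for i in range(1, m + 1):
        curr = [i] + [0] * n
        for j in range(1, n + 1):
            cost = 0 if reference[i - 1] == hypothesis[j - 1] else 1
            curr[j] = min(prev[j] + 1,         # deletion from the reference
                          curr[j - 1] + 1,     # insertion into the reference
                          prev[j - 1] + cost)  # substitution (or match)
        prev = curr
    return prev[n] / max(m, 1)

# Hypothetical comparison: ground-truth line vs. a model transcription
print(character_error_rate("14 5 sugar loaves", "145 sugar loaves"))  # 1 edit / 17 chars ≈ 0.059
```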