2026-06-23

Unlimited OCR: One-shot long-horizon parsing

State of OCR Today

Commenters strongly disagree on whether OCR is “solved.”
Some argue traditional OCR is fast, cheap, and very reliable for many printed, simple documents.
Others say even in 2026 OCR “still sucks,” especially for complex layouts, tables, math, and messy real-world scans.
Non-Latin scripts (CJK, Arabic, Vietnamese, Thai) and cursive or historical handwriting are cited as especially challenging.

Traditional OCR vs LLM/VLM-based OCR

Traditional OCR:
- Good at character-level detection, hand-filled forms, fixed layouts, and offline CPU use.
- Often struggles with layout understanding (multi-columns, headers/footers, ads, irregular structures).
LLM/VLM-based OCR:
- Better at handling diverse scripts, cursive, mixed-language content, and complex layouts.
- Can leverage language priors to fix noisy input, but raises concerns about hallucinations and silent corrections.
- Commercial cloud offerings (e.g., major cloud OCR APIs) are rated at around “85%” accuracy, are described as expensive, and show differing failure modes.

Unlimited OCR & Long-Horizon Attention

KV cache growth is a major bottleneck for long documents; naïve approaches require per-page chunking.
Unlimited OCR uses Reference Sliding Window Attention:
- Global: always attends to the full document image.
- Local: only keeps a small moving window of its own generated text.
The aim is to keep document-wide context without O(N) memory growth in the length of the generated text, which is promising for long PDFs and local deployment.
Some question whether the local window is too small for very long or token-heavy inputs.
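
To make the memory argument concrete, here is a minimal sketch of the global-plus-local idea described above. It is not the Unlimited OCR implementation: the function names, the token counts, and the window size are all assumptions chosen for illustration. It only counts cache entries and lists which positions a generated token could attend to.

```python
from collections import deque


def kv_cache_sizes(num_image_tokens, num_generated, window):
    """Return (full, windowed) cache entry counts after each decode step.

    Assumption: image tokens are always cached; generated text is either
    cached in full or kept only in a sliding window of `window` tokens.
    """
    full, windowed = [], []
    local = deque(maxlen=window)  # oldest generated token is evicted automatically
    for step in range(num_generated):
        local.append(step)
        full.append(num_image_tokens + step + 1)        # grows with every token
        windowed.append(num_image_tokens + len(local))  # bounded by `window`
    return full, windowed


def visible_positions(step, num_image_tokens, window):
    """Positions the token generated at `step` may attend to.

    Global part: every image token. Local part: the last `window`
    generated tokens, including the current one.
    """
    image = list(range(num_image_tokens))
    start = max(0, step - window + 1)
    text = [num_image_tokens + t for t in range(start, step + 1)]
    return image + text


if __name__ == "__main__":
    full, windowed = kv_cache_sizes(num_image_tokens=100, num_generated=10, window=4)
    print(full[-1], windowed[-1])
    print(visible_positions(step=5, num_image_tokens=3, window=2))
```

In this toy model the windowed cache stops growing once the window is full, while the full cache keeps growing with each generated token. The open question raised above, whether the window is large enough, corresponds to the choice of `window` here.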

Use Cases, Tools, and Benchmarks

Users mention using a variety of tools (marker, Mistral OCR, Claude, cloud services, document-parsing frameworks) with mixed results.
For long, complex technical documents (standards, datasheets, scientific PDFs), structure-aware chunking and multi-page context are seen as crucial.
Comparisons to other SOTA OCR parsers and benchmarks are requested; current standing of Unlimited OCR vs leading models is unclear.
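
As an illustration of the structure-aware chunking mentioned above, the sketch below breaks parsed text at headings instead of at fixed offsets or page boundaries. It assumes the OCR output is Markdown-like, with headings starting with `#`; the function name `chunk_by_headings` and the `max_chars` budget are hypothetical and not taken from any of the tools named in the discussion.

```python
def chunk_by_headings(lines, max_chars=2000):
    """Group lines into chunks that break only at section headings.

    Assumption: a heading is any line starting with '#'. A section that
    is larger than `max_chars` on its own is kept whole, not split.
    """
    # Step 1: split the document into sections, one per heading.
    sections, current = [], []
    for line in lines:
        if line.startswith("#") and current:
            sections.append(current)
            current = []
        current.append(line)
    if current:
        sections.append(current)

    # Step 2: pack whole sections into chunks under the size budget.
    chunks, buf, size = [], [], 0
    for section in sections:
        sec_len = sum(len(text) + 1 for text in section)  # +1 for the newline
        if buf and size + sec_len > max_chars:
            chunks.append("\n".join(buf))
            buf, size = [], 0
        buf.extend(section)
        size += sec_len
    if buf:
        chunks.append("\n".join(buf))
    return chunks


if __name__ == "__main__":
    doc = ["# A", "aaaa", "# B", "bbbb", "# C", "cc"]
    for chunk in chunk_by_headings(doc, max_chars=20):
        print(repr(chunk))
```

Because the input is the text of the whole document rather than one page at a time, a section that spans a page break stays in one chunk, which is the multi-page context the commenters ask for.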

Reliability, Hallucinations, and Trust

One concern is that context-aware models “correct” text (names, foreign words) or translate it when they should only transcribe, which can be unacceptable for archival or legal use.
Some propose expensive verification loops (regenerating images from text and visually comparing) to approach near-100% reliability.
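
The verification loop proposed above could look roughly like the sketch below. It is a toy under strong assumptions: pages are binary bitmaps stored as lists of rows, the `render` callable is supplied by the caller and is purely hypothetical, and the `threshold` value is arbitrary. A real pipeline would also need to match fonts and layout and align the two images before comparing them.

```python
def pixel_agreement(a, b):
    """Fraction of matching pixels between two same-sized binary bitmaps."""
    if len(a) != len(b) or any(len(ra) != len(rb) for ra, rb in zip(a, b)):
        raise ValueError("bitmaps must have identical dimensions")
    total = sum(len(row) for row in a)
    same = sum(pa == pb for ra, rb in zip(a, b) for pa, pb in zip(ra, rb))
    return same / total if total else 1.0


def verify_transcription(page_bitmap, text, render, threshold=0.98):
    """Re-render `text` and accept it only if it visually matches the scan.

    `render` is a caller-supplied function mapping text to a bitmap; it is
    a placeholder for whatever rendering step an actual system would use.
    Returns (accepted, agreement_score).
    """
    score = pixel_agreement(page_bitmap, render(text))
    return score >= threshold, score


if __name__ == "__main__":
    # Toy two-pixel "glyphs" so the example runs without an imaging library.
    glyphs = {"a": [1, 0], "b": [0, 1]}

    def toy_render(text):
        return [[bit for ch in text for bit in glyphs[ch]]]

    scan = toy_render("ab")
    print(verify_transcription(scan, "ab", toy_render))
    print(verify_transcription(scan, "aa", toy_render))
```

A transcription that was silently “corrected” renders differently from the scan and falls below the threshold, at the cost of one extra render and comparison per page.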

Open Source and Strategy

Open-sourcing by large companies is seen as driven by a mix of ideals, reputation, hiring, ecosystem-building, and potential strategic impact on competitors.
Some view the broad release of strong OCR models from China as possibly weakening the revenue of Western AI labs.
