Unlimited OCR: One-shot long-horizon parsing

State of OCR Today

  • Strong disagreement on whether OCR is “solved.”
  • Some argue traditional OCR is fast, cheap, and very reliable for many printed, simple documents.
  • Others say even in 2026 OCR “still sucks,” especially for complex layouts, tables, math, and messy real-world scans.
  • Non-Latin scripts (CJK, Arabic, Vietnamese, Thai) and cursive or historical handwriting are cited as especially challenging.

Traditional OCR vs LLM/VLM-based OCR

  • Traditional OCR:
    • Good at character-level detection, hand-filled forms, fixed layouts, and offline CPU use.
    • Often struggles with layout understanding (multi-columns, headers/footers, ads, irregular structures).
  • LLM/VLM-based OCR:
    • Better at handling diverse scripts, cursive, mixed-language content, and complex layouts.
    • Can leverage language priors to fix noisy input, but raises concerns about hallucinations and silent corrections.
    • Commercial cloud OCR APIs are rated by some at around “85%” accuracy, and are described as expensive, each with its own failure modes.

Unlimited OCR & Long-Horizon Attention

  • KV cache growth is a major bottleneck for long documents; naïve approaches require per-page chunking.
  • Unlimited OCR uses Reference Sliding Window Attention:
    • Global: always attends to the full document image.
    • Local: only keeps a small moving window of its own generated text.
  • This aims to maintain document-wide context without the O(N) growth of the KV cache in the number of generated tokens, which makes it promising for long PDFs and local deployment.
  • Some question whether the local window is too small for very long or token-heavy inputs.
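
The global/local split described above can be illustrated with a small simulation. This is only a sketch of the idea as summarized in the thread, not the model's actual implementation: the function names, the token counts, and the window size of 256 are all assumptions chosen for illustration.

```python
from collections import deque


def simulate_cache(num_image_tokens, num_generated, window):
    """Compare cache sizes per decoding step: full attention vs. a sliding text window.

    All sizes are counted in tokens. The image tokens are always kept (global part);
    only the last `window` generated tokens are kept (local part).
    """
    full_sizes = []
    windowed_sizes = []
    local = deque(maxlen=window)  # oldest generated tokens fall out automatically
    for step in range(num_generated):
        local.append(step)
        full_sizes.append(num_image_tokens + step + 1)
        windowed_sizes.append(num_image_tokens + len(local))
    return full_sizes, windowed_sizes


def visible_positions(step, num_image_tokens, window):
    """Positions the token generated at `step` (0-indexed) may attend to.

    Layout assumption: image tokens occupy positions 0..num_image_tokens-1,
    generated tokens follow directly after them.
    """
    image_part = list(range(num_image_tokens))
    first_text_step = max(0, step - window + 1)
    text_part = [num_image_tokens + s for s in range(first_text_step, step + 1)]
    return image_part + text_part


if __name__ == "__main__":
    full, windowed = simulate_cache(num_image_tokens=1000, num_generated=5000, window=256)
    print("full cache at last step:    ", full[-1])
    print("windowed cache at last step:", windowed[-1])
```

Under these assumed numbers the full cache keeps growing with every generated token, while the windowed cache stops growing once the window is full. The concern raised in the thread maps onto the `window` parameter: anything the model wrote more than `window` tokens ago is no longer visible as text, only through the image.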

Use Cases, Tools, and Benchmarks

  • Users mention using a variety of tools (marker, Mistral OCR, Claude, cloud services, document-parsing frameworks) with mixed results.
  • For long, complex technical documents (standards, datasheets, scientific PDFs), structure-aware chunking and multi-page context are seen as crucial.
  • Comparisons to other SOTA OCR parsers and benchmarks are requested; current standing of Unlimited OCR vs leading models is unclear.
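
As a rough illustration of what "structure-aware chunking" can mean in practice, the sketch below splits parser output on headings and keeps the heading trail with every chunk. It assumes the OCR tool emits Markdown-style `#` headings, which the thread does not specify; `chunk_by_headings` and `max_chars` are hypothetical names.

```python
def chunk_by_headings(markdown_text, max_chars=2000):
    """Split text at Markdown-style headings, keeping the heading path per chunk.

    Returns a list of {"path": [...], "text": "..."} dicts. Sections longer than
    `max_chars` are cut into consecutive pieces that share the same path.
    Limitation: any line starting with '#' is treated as a heading, including
    comment lines inside code listings.
    """
    chunks = []
    path = []  # current heading trail, outermost heading first
    buf = []   # body lines collected under the current heading

    def flush():
        body = "\n".join(buf).strip()
        for i in range(0, len(body), max_chars):
            chunks.append({"path": list(path), "text": body[i:i + max_chars]})
        buf.clear()

    for line in markdown_text.splitlines():
        stripped = line.lstrip()
        if stripped.startswith("#"):
            level = len(stripped) - len(stripped.lstrip("#"))
            flush()                # close the section under the old path
            del path[level - 1:]   # drop headings at this level or deeper
            path.append(stripped[level:].strip())
        else:
            buf.append(line)
    flush()
    return chunks


if __name__ == "__main__":
    sample = "# Timing\nOverview.\n## Setup time\nSee table.\n# Pinout\nPin list."
    for chunk in chunk_by_headings(sample):
        print(" > ".join(chunk["path"]), "|", chunk["text"])
```

Carrying the heading path along means a chunk from deep inside a standard or datasheet still says which section it came from, which is one simple way to preserve some multi-page context.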

Reliability, Hallucinations, and Trust

  • Concern that context-aware models “correct” text (names, foreign words) or translate when they should just transcribe, which can be unacceptable for archival or legal use.
  • Some propose expensive verification loops (regenerating images from text and visually comparing) to approach near-100% reliability.
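
The proposed verification loop can be sketched as follows. This is a toy outline under stated assumptions, not a tested pipeline: `transcribe` and `render` are hypothetical callables supplied by the caller, images are modeled as equally sized binary bitmaps (lists of rows), and the 0.98 threshold is arbitrary. A real system would need a renderer that reproduces fonts and layout closely enough for a pixel comparison to be meaningful.

```python
def pixel_agreement(a, b):
    """Fraction of matching pixels between two equally sized binary bitmaps."""
    if len(a) != len(b) or any(len(ra) != len(rb) for ra, rb in zip(a, b)):
        raise ValueError("bitmaps must have the same shape")
    total = sum(len(row) for row in a)
    same = sum(pa == pb for ra, rb in zip(a, b) for pa, pb in zip(ra, rb))
    return same / total if total else 1.0


def verified_transcribe(image, transcribe, render, threshold=0.98, max_attempts=3):
    """Transcribe, re-render the text, and compare the rendering with the source image.

    transcribe(image, attempt) -> str   (hypothetical OCR call)
    render(text) -> bitmap              (hypothetical text-to-image renderer)
    Returns (best_text, best_score, passed).
    """
    best_text, best_score = None, -1.0
    for attempt in range(max_attempts):
        text = transcribe(image, attempt)
        score = pixel_agreement(image, render(text))
        if score > best_score:
            best_text, best_score = text, score
        if score >= threshold:
            break  # the rendering matches the source closely enough
    return best_text, best_score, best_score >= threshold
```

The cost concern from the thread is visible in the structure: every attempt adds one OCR call plus one render and comparison, and pages that never reach the threshold still need human review.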

Open Source and Strategy

  • Open-sourcing by large companies is seen as driven by a mix of ideals, reputation, hiring, ecosystem-building, and potential strategic impact on competitors.
  • Some view the broad release of strong OCR models from China as possibly weakening the revenue of Western AI labs.