Mistral OCR 4
Overall impressions of Mistral OCR 4
- Many commenters report strong real-world performance, especially on degraded or old documents, and compare it favorably to classic tools like ABBYY FineReader and to some modern VLMs.
- Others are skeptical due to earlier Mistral OCR versions underperforming relative to marketing claims; some say OCR 4 looks better but want independent benchmarks first.
- Some users praise Mistral specifically for OCR while criticizing its coding/general models as weaker than US/Chinese SOTA.
Benchmarks, accuracy & evaluation
- Commenters question the heavy reliance on internal benchmarks and the limited public metrics, recalling past claims characterized as “98% accurate on tiny internal sets.”
- External benchmarks like OlmOCRBench, OmniDocBench, ParseBench, and Arbitr leaderboards are referenced; one link suggests previous Mistral OCR wasn’t top-tier.
- Several complain about “chart crimes”: truncated y‑axes and presentation that may exaggerate gains.
- There’s interest in comparisons vs Baidu’s Unlimited-OCR, Llama Parse, Apple’s local models, Claude’s vision, Gemini, and Google Vision / Document AI, but data is incomplete or absent.
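The "chart crimes" complaint can be made concrete with a little arithmetic. The sketch below computes how much taller one bar is drawn than another when the y-axis starts above zero. The function name `visual_ratio` and the scores 96 and 92 are hypothetical illustrations, not figures from the thread or from any Mistral chart.

```python
def visual_ratio(a: float, b: float, axis_min: float = 0.0) -> float:
    """Ratio of drawn bar heights for values a and b when the y-axis starts at axis_min."""
    if axis_min >= min(a, b):
        raise ValueError("axis_min must be below both values")
    return (a - axis_min) / (b - axis_min)


if __name__ == "__main__":
    # Hypothetical scores, chosen only to illustrate the effect.
    ours, theirs = 96.0, 92.0
    print(f"axis from 0:  {visual_ratio(ours, theirs):.2f}x")
    print(f"axis from 90: {visual_ratio(ours, theirs, axis_min=90.0):.2f}x")
```

Under these assumed numbers, a gap of about 4% in the underlying scores is drawn as a bar three times as tall once the axis starts at 90.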
Pricing and competition
- $4 per 1,000 pages is seen as very cheap by some, but others note Google Vision OCR is cheaper for plain text ($1.50 per 1,000 pages) and that layout-aware Google/Azure offerings are closer in price.
- Some wonder how traditional OCR vendors can compete at these price points.
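To put the quoted rates side by side, here is a minimal sketch that scales them to a given page volume. It assumes flat per-1,000-page pricing with no free tier, volume discount, or minimum charge, which may not match any vendor's actual billing. The helper names `ocr_cost` and `compare` are illustrative.

```python
def ocr_cost(pages: int, price_per_1k: float) -> float:
    """Cost in USD for `pages` pages at a flat per-1,000-page rate."""
    return pages / 1000 * price_per_1k


# Rates as quoted in the discussion; real pricing tiers may differ.
RATES_PER_1K = {
    "Mistral OCR 4": 4.00,
    "Google Vision OCR (plain text)": 1.50,
}


def compare(pages: int) -> dict[str, float]:
    """Return the cost of processing `pages` pages under each quoted rate."""
    return {name: round(ocr_cost(pages, rate), 2) for name, rate in RATES_PER_1K.items()}


if __name__ == "__main__":
    for name, cost in compare(250_000).items():
        print(f"{name}: ${cost:,.2f}")
```

At 250,000 pages the two quoted rates come to $1,000 and $375 respectively, so the absolute difference stays small unless volumes are very large.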
Use cases, limitations & risks
- Reported good results on complex business docs, tables, forms, and magazines; one mention of automatic markdown + image cropping being particularly useful.
- Some real-world failures are noted (e.g., misrecognized dates on receipts, quotation mark style changes), highlighting risk for high‑stakes or formatting‑sensitive workflows.
- There is discussion of feeding OCR outputs into downstream decision systems, with concern that silent OCR errors could affect financial or other critical decisions.
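One common mitigation for silent errors such as misread receipt dates is a plausibility check before the value reaches a decision system. The sketch below is an assumption-laden illustration, not something proposed in the thread: `needs_review`, the ISO date format, and the accepted date window are all hypothetical choices.

```python
from datetime import date, datetime


def needs_review(raw_date: str, earliest: date, latest: date, fmt: str = "%Y-%m-%d") -> bool:
    """Return True if an OCR-extracted date should be routed to a human.

    A date is flagged when it does not parse in the expected format or when
    it falls outside the window of dates considered plausible.
    """
    try:
        parsed = datetime.strptime(raw_date.strip(), fmt).date()
    except ValueError:
        # Unparseable text, e.g. a letter "O" read in place of a zero.
        return True
    return not (earliest <= parsed <= latest)


if __name__ == "__main__":
    window = (date(2024, 1, 1), date(2024, 12, 31))
    for value in ["2024-03-15", "2042-03-15", "2024-O3-15"]:
        print(value, "-> review" if needs_review(value, *window) else "-> ok")
```

A check like this only catches implausible values. A misread that still lands inside the window (for example, a 15 read as 16) passes unnoticed, which is why commenters stress human review for high-stakes workflows.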
Handwriting, languages & edge cases
- Multiple comments confirm good handwriting recognition in practice (including on historical documents), though always with some residual human review.
- Other tools like Transkribus, Sarvam, Gemini Pro, and Qwen models are cited as strong for handwriting or Indic languages.
- One user reports language misclassification (Malayalam identified as Kannada); another sees the “rare/specialized languages” label (formerly “minor”) as revealing of training priorities.
- Some ask for benchmarks by language and on handwritten data; current public benchmarks are seen as skewed toward printed text.