Ask HN: How to transcribe 1000s of handwritten notes

Problem framing & constraints

  • OP has thousands of pages of highly idiosyncratic handwritten journals already scanned.
  • Most off‑the‑shelf handwriting recognition they tried (Google Vision/Document AI, Transkribus, Tesseract, EasyOCR, GPT‑4V, macOS/iOS built‑in text recognition, etc.) performs poorly on this handwriting.
  • Goals vary: searchable archive, autobiographical/psychiatric reflection, and possibly building a long‑term system that can read their handwriting.
  • Privacy is a major constraint for some; others have notes from deceased relatives, where the writer can no longer read the pages aloud, so speech‑based methods aren't an option.

Speech‑to‑text by reading notes aloud

  • Strongly advocated as the most practical “one‑off” solution: read each page into a recorder and use modern STT (Whisper, MacWhisper, Whisper.cpp, Otter, MS 365, Telegram bots, etc.).
  • Pros:
    • Very accurate under good recording conditions; keeps audio as a future asset.
    • Time is predictable: 1,000 minutes of read‑aloud audio (e.g. 1,000 pages at about a minute each) is roughly 16–17 hours of reading, which can be parallelized or spread over days.
    • Can insert spoken markers (“newline”, “highlight”, etc.) for later formatting.
  • Cons / objections:
    • Still time‑consuming and tiring, and STT errors need manual correction.
    • Some consider it lossy compared with keeping high‑resolution images for future, better OCR.
  • Counter: you can keep scans and audio while using STT as the working transcript.
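
The spoken‑marker idea needs only a small post‑processing pass over the transcript. A minimal sketch, assuming plain‑text STT output and three hypothetical markers: "new line", "new paragraph", and a "highlight" ... "end highlight" pair rendered as Markdown bold. Choose marker words that never occur naturally in your notes, because every match is rewritten.

```python
import re

# Hypothetical dictated markers. Each pattern also consumes surrounding spaces
# and one trailing punctuation mark, since STT engines often write "Newline." etc.
PARAGRAPH = re.compile(r"\s*\bnew paragraph\b[.,;:]?\s*", re.IGNORECASE)
LINE = re.compile(r"\s*\bnew ?line\b[.,;:]?\s*", re.IGNORECASE)
# "highlight <text> end highlight" -> **<text>** (shortest span, may cross lines)
HIGHLIGHT = re.compile(
    r"\bhighlight\b[.,;:]?\s*(.*?)\s*\bend highlight\b[.,;:]?",
    re.IGNORECASE | re.DOTALL,
)


def apply_spoken_markers(transcript: str) -> str:
    """Turn dictated formatting markers into line breaks and bold text."""
    text = HIGHLIGHT.sub(lambda m: f"**{m.group(1)}**", transcript)
    text = PARAGRAPH.sub("\n\n", text)
    text = LINE.sub("\n", text)
    return text.strip()
```

The patterns tolerate the capitalization and trailing punctuation that engines such as Whisper tend to add around a dictated word, so "Newline." and "new line," are both handled.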

Human transcription & crowdsourcing

  • Suggested options: hire typists via Upwork/Fiverr, use Mechanical Turk, or enlist friends or assistants; pages can be double‑ or triple‑assigned so discrepancies stand out.
  • For privacy, proposals include:
    • Splitting pages into shuffled word or short‑phrase fragments before labeling.
    • Fragment‑based “captcha‑style” crowdsourcing so no one sees coherent diary entries.
  • Trade‑off: money vs. time/energy; some argue “if it’s truly valuable, pay to type it.”
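
The fragment‑shuffling and multi‑assignment ideas combine naturally. A minimal sketch of the bookkeeping side, assuming pages have already been cut into word or phrase crops (cropping not shown) and each crop is labeled by two or three workers. `Fragment`, `make_tasks`, `majority_vote`, and `reassemble` are hypothetical names; no particular crowdsourcing platform's API is implied. Workers see only an opaque ID and an isolated crop, while the mapping back to pages stays on your machine.

```python
import random
import secrets
from collections import Counter
from dataclasses import dataclass


@dataclass(frozen=True)
class Fragment:
    page: str        # source scan, e.g. "journal-03/p017.png"
    index: int       # reading-order position of the crop on that page
    image_path: str  # cropped word/phrase image (cropping itself not shown)


def make_tasks(fragments, seed=None):
    """Give every fragment an opaque ID and shuffle across pages.

    Returns (task_ids, key). Upload each crop under its task ID only;
    `key` maps IDs back to pages and never leaves your machine.
    """
    key = {secrets.token_hex(8): frag for frag in fragments}
    task_ids = list(key)
    random.Random(seed).shuffle(task_ids)
    return task_ids, key


def majority_vote(labels):
    """Pick the most common label (whitespace-normalized) for one fragment.

    Returns (text, agreed); agreed is False when no strict majority exists,
    which marks the fragment for a second look.
    """
    normalized = [" ".join(label.split()) for label in labels]
    text, count = Counter(normalized).most_common(1)[0]
    return text, count > len(normalized) / 2


def reassemble(results, key):
    """results: task_id -> list of worker labels for that fragment.

    Returns ({page: text in reading order}, [fragments lacking a majority]).
    """
    pages, flagged = {}, []
    for task_id, labels in results.items():
        frag = key[task_id]
        text, agreed = majority_vote(labels)
        if not agreed:
            flagged.append(frag)
        pages.setdefault(frag.page, []).append((frag.index, text))
    joined = {page: " ".join(t for _, t in sorted(parts)) for page, parts in pages.items()}
    return joined, flagged
```

With triple assignment, a strict majority means at least two matching labels; with double assignment, any disagreement flags the fragment.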

OCR, LLMs, and handwriting‑specific tools

  • Classic OCR:
    • Tesseract is widely reported as poor on handwriting, and even on tricky print, without extensive training.
    • Some success reported with Google Vision handwriting, Amazon Textract, Yandex OCR, ABBYY FineReader, and Evernote’s legacy OCR, especially on less idiosyncratic writing.
  • Handwriting‑targeted services:
    • Tools like Transkribus, handwritingOCR.com, getsearchablepdf.com, and others are designed for difficult manuscripts; mixed reports from “pretty good” to “fails on my handwriting.”
    • Pricing models (subscription vs per‑page / scan packs) are debated.
  • Multimodal LLMs:
    • Some praise GPT‑4o, Gemini 1.5, LLaVA, etc. as better than traditional OCR, with reports that they "nailed my terrible handwriting."
    • Others note serious issues: plausible‑looking but wrong output (hallucinations), especially on numbers, dates, and names; performance can also degrade on very messy cursive.
    • OP reports that GPT‑4 did not handle their handwriting well.
  • Local / FOSS models:
    • Some commenters cite TrOCR as the best FOSS handwriting model they know of, though they still rate Textract higher.
    • Another suggestion is to fine‑tune Tesseract, TrOCR, or a similar model on a small, manually transcribed subset of pages.
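
Because the reported LLM failures cluster on numbers, dates, and names, one cheap safeguard is to flag those tokens in the model's output and check only them against the scan. A minimal sketch using simple heuristics (digit runs, plus capitalized words that don't start a sentence as a rough proxy for names); it narrows the review but does not detect hallucinations, and `flag_for_review` is a hypothetical helper.

```python
import re

# Numbers, times and numeric dates such as 45, 10:30, 3/14/98, 12.5
NUMBER = re.compile(r"\d+(?:[/.:-]\d+)*")
# Words, allowing internal apostrophes and hyphens (O'Brien, Jean-Luc)
WORD = re.compile(r"[A-Za-z][A-Za-z'-]*")


def flag_for_review(text):
    """Return tokens worth checking against the scan, in reading order:
    numbers/dates, plus capitalized words that don't start a sentence."""
    hits = [(m.start(), m.group()) for m in NUMBER.finditer(text)]
    for m in WORD.finditer(text):
        word = m.group()
        before = text[: m.start()].rstrip()
        starts_sentence = not before or before[-1] in ".!?"
        if word[0].isupper() and word != "I" and not starts_sentence:
            hits.append((m.start(), word))
    return [token for _, token in sorted(hits)]
```

For example, `flag_for_review("Met Anna at 10:30 on 3/14/98.")` returns `['Anna', '10:30', '3/14/98']`.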

Training a custom handwriting model

  • Multiple comments propose:
    • Manually transcribe a subset of pages.
    • Use this as labeled data to fine‑tune a handwriting model specific to one person’s script.
  • Variants:
    • Use STT transcripts + scanned pages as joint training data.
    • Pre‑segment text into words/short phrases and label via human‑in‑the‑loop services.
  • Consensus: technically feasible and increasingly accessible, but nontrivial in time/complexity; may make sense only if this is a long‑term “system” goal, not a one‑time batch.
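
The data‑preparation step for such fine‑tuning might look like this. A minimal sketch, assuming a hypothetical layout in which each hand‑transcribed image `X.png` sits next to a UTF‑8 transcript `X.txt`; it pairs them, skips unlabeled or empty ones, and writes a reproducible train/validation split as CSV manifests. The training code is not shown, and the manifest format is an assumption to adapt to whatever fine‑tuning script you use. Models such as TrOCR recognize single text lines, so page scans generally need to be cut into line images first.

```python
import csv
import random
from pathlib import Path


def build_manifest(folder, out_dir, val_fraction=0.1, seed=0):
    """Pair X.png with X.txt, then write train.csv / val.csv manifests.

    Images without a transcript (not labeled yet) or with an empty one are
    skipped. The fixed seed keeps the split reproducible between runs.
    """
    folder, out_dir = Path(folder), Path(out_dir)
    pairs = []
    for image in sorted(folder.glob("*.png")):
        transcript = image.with_suffix(".txt")
        if not transcript.exists():
            continue
        text = transcript.read_text(encoding="utf-8").strip()
        if text:
            pairs.append((image.name, text))

    random.Random(seed).shuffle(pairs)
    # Hold out at least one example for validation when there are two or more.
    n_val = max(1, int(len(pairs) * val_fraction)) if len(pairs) > 1 else 0
    splits = {"val": pairs[:n_val], "train": pairs[n_val:]}

    out_dir.mkdir(parents=True, exist_ok=True)
    for name, rows in splits.items():
        with open(out_dir / f"{name}.csv", "w", newline="", encoding="utf-8") as f:
            writer = csv.writer(f)
            writer.writerow(["file_name", "text"])
            writer.writerows(rows)
    return {name: len(rows) for name, rows in splits.items()}
```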

Manual retyping, summarizing, and “second brain” angle

  • Several argue that simply retyping (or summarizing) by hand is:
    • Ultimately faster than wrestling with half‑accurate OCR and cleanup.
    • Cognitively beneficial: re‑engages with old material, surfaces what’s still relevant.
  • Suggested workflows:
    • Re‑read and only type what is still useful; preserve dates and references.
    • Treat this as curation/rewriting toward a “second brain” or reusable knowledge base.
  • Counterpoint: OP and others note costs in time, attention, and physical strain; many still seek more automated methods.
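
For the curation route, one way to keep "preserve dates and references" from slipping is to store each retyped excerpt as its own file, with the original date and scan location in a small header. A minimal sketch, assuming Markdown notes with a YAML‑style front‑matter block; `write_note` and the field names are hypothetical.

```python
import json
import re
from pathlib import Path


def write_note(folder, title, body, written_on, scan_ref):
    """Save one curated excerpt as a Markdown note whose header records when
    the original entry was written (a datetime.date) and which scan it came from."""
    slug = re.sub(r"[^a-z0-9]+", "-", title.lower()).strip("-")
    path = Path(folder) / f"{written_on.isoformat()}-{slug}.md"
    header = (
        "---\n"
        # Quoted so that a colon in a title or path doesn't break the header.
        f"title: {json.dumps(title, ensure_ascii=False)}\n"
        f"date: {written_on.isoformat()}\n"
        f"source: {json.dumps(scan_ref, ensure_ascii=False)}\n"
        "---\n\n"
    )
    path.write_text(header + body.strip() + "\n", encoding="utf-8")
    return path
```

Naming files by date keeps them in chronological order in any file browser, and the `source` field points back to the scan whenever the retyped text needs checking.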

Hardware / workflow suggestions

  • Hardware:
    • Use document scanners with feeders or book scanners; 300–600 dpi is usually enough.
    • Smart pens / tablet note apps (Neo Smartpen, Nuwa pen, Samsung/Apple handwriting tools) can give near‑instant digital text for future notes, but don’t help existing archives.
  • Workflow tips:
    • Always keep original high‑resolution scans even if you rely on STT or LLM output.
    • Build searchable PDFs with text layers plus images; optionally add metadata/XML.
    • For STT with Whisper, shorter audio chunks (~20–30 seconds) can improve accuracy.
    • Consider building simple scripts or small apps to batch process images/audio, queue jobs, and store results.
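
A minimal sketch of the chunking tip using only Python's standard `wave` module, assuming uncompressed PCM WAV recordings (convert other formats first) and the thread's ~30‑second suggestion as the default. `split_wav` is a hypothetical helper; fixed‑length cuts can split a word at a boundary, and cutting at pauses would be better but is not shown.

```python
import wave
from pathlib import Path


def split_wav(path, out_dir, chunk_seconds=30):
    """Split a PCM WAV file into consecutive chunks of at most chunk_seconds.

    Chunks are written as <stem>_0000.wav, <stem>_0001.wav, ... so that
    sorting by name restores the original order.
    """
    path, out_dir = Path(path), Path(out_dir)
    out_dir.mkdir(parents=True, exist_ok=True)
    outputs = []
    # wave.open takes a filename string or a file object, hence str(...).
    with wave.open(str(path), "rb") as src:
        params = src.getparams()
        frames_per_chunk = int(params.framerate * chunk_seconds)
        if frames_per_chunk <= 0:
            raise ValueError("chunk_seconds must be positive")
        while True:
            frames = src.readframes(frames_per_chunk)
            if not frames:
                break
            out_path = out_dir / f"{path.stem}_{len(outputs):04d}.wav"
            with wave.open(str(out_path), "wb") as dst:
                # Same audio format; the header's frame count is updated to
                # match the frames actually written.
                dst.setparams(params)
                dst.writeframes(frames)
            outputs.append(out_path)
    return outputs
```

Each chunk can then be passed to whichever STT engine you use, and the chunk index in the filename keeps the transcripts in order for stitching back together.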

Meta: goals, trade‑offs, and open questions

  • A recurring theme: clarify why you want transcription.
    • If the goal is searchability, imperfect text plus the original images might be enough.
    • If the goal is publication or detailed analysis, higher accuracy (and maybe human effort) is needed.
  • Tension:
    • “Don’t build a factory for a one‑off” vs “this could become a reusable system for me/others.”
  • Unclear / open:
    • How well any given tool performs depends heavily on a specific person’s handwriting; several commenters report great success where others report total failure. Testing on a small sample is repeatedly recommended before committing.
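
One way to make "test on a small sample" concrete: hand‑check the transcripts of a few pages, run every candidate tool on the same scans, and compare character error rate (CER), the edit distance between a tool's output and your reference divided by the reference length. A dependency‑free sketch follows; `score_tools` and its dictionary layout are hypothetical, and whitespace is normalized so differing line breaks are not counted as errors.

```python
def levenshtein(a, b):
    """Minimum number of single-item insertions, deletions and substitutions
    turning sequence a into sequence b (two-row dynamic programming)."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            curr.append(min(
                prev[j] + 1,               # delete ca
                curr[j - 1] + 1,           # insert cb
                prev[j - 1] + (ca != cb),  # substitute (free if equal)
            ))
        prev = curr
    return prev[-1]


def normalize(text):
    """Collapse whitespace so line-break differences are not counted as errors."""
    return " ".join(text.split())


def score_tools(references, outputs):
    """Corpus-level character error rate (CER) per tool.

    references: {page_id: hand-checked transcript}
    outputs:    {tool_name: {page_id: that tool's transcript}}
    CER = total edits / total reference characters; a page missing from a
    tool's output counts as entirely wrong. Lower is better.
    """
    scores = {}
    for tool, pages in outputs.items():
        edits = chars = 0
        for page_id, reference in references.items():
            ref = normalize(reference)
            edits += levenshtein(ref, normalize(pages.get(page_id, "")))
            chars += len(ref)
        scores[tool] = edits / max(chars, 1)
    return scores
```

Because insertions count as edits, a tool that adds a lot of invented text can score a CER above 1.0, which is itself a useful warning sign for the hallucination problem noted above.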