2024-05-31

Ask HN: How to transcribe 1000s of handwritten notes

Problem framing & constraints

OP has thousands of pages of highly idiosyncratic handwritten journals already scanned.
Most off‑the‑shelf handwriting OCR tried (Google Vision/Document AI, Transkribus, Tesseract, EasyOCR, GPT‑4V, macOS/iOS text features, etc.) performs poorly on this handwriting.
Goals vary: searchable archive, autobiographical/psychiatric reflection, and possibly building a long‑term system that can read their handwriting.
Privacy is a major constraint for some; others have material from deceased relatives where speech‑based methods aren’t possible.

Speech‑to‑text by reading notes aloud

Strongly advocated as the most practical “one‑off” solution: read each page into a recorder and use modern STT (Whisper, MacWhisper, Whisper.cpp, Otter, MS 365, Telegram bots, etc.).
Pros:
- Very accurate under good recording conditions; keeps audio as a future asset.
- Time is predictable (1,000 minutes of reading aloud is ≈16.7 hours; can be parallelized or spread over days).
- Can insert spoken markers (“newline”, “highlight”, etc.) for later formatting.
Cons / objections:
- Still time‑consuming and tiring; STT errors still have to be edited afterwards.
- Some see information loss versus storing high‑resolution images for future, better OCR.
Counter: you can keep both the scans and the audio, and use the STT output as the working transcript.
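
The spoken-marker idea needs a small post-processing step on the STT output. A minimal sketch, assuming the transcript is plain text; the marker vocabulary and the function name `apply_spoken_markers` are hypothetical conventions chosen for illustration, not part of Whisper or any other tool named in the thread.

```python
import re

# Hypothetical marker vocabulary: spoken phrase -> text to insert instead.
MARKERS = {
    "newline": "\n",
    "new paragraph": "\n\n",
}

def apply_spoken_markers(transcript: str, markers: dict[str, str] = MARKERS) -> str:
    """Replace spoken marker phrases with their formatting equivalents."""
    # Handle longer phrases first so a short marker never eats part of a long one.
    for spoken in sorted(markers, key=len, reverse=True):
        # Whole-word, case-insensitive match; also swallow the spaces and the
        # trailing "." or "," that STT engines often put around the marker.
        pattern = re.compile(
            r"[ \t]*\b" + re.escape(spoken) + r"\b[.,]?[ \t]*", re.IGNORECASE
        )
        replacement = markers[spoken]
        transcript = pattern.sub(lambda _match, r=replacement: r, transcript)
    return transcript.strip()

print(apply_spoken_markers("Went to the lake. Newline. It rained all day."))
```

A marker word that also occurs naturally in the notes would be replaced too, so an unusual spoken phrase is probably a safer choice than a common word.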

Human transcription & crowdsourcing

Suggested options: hire typists via Upwork/Fiverr, use Mechanical Turk, or friends/assistants; can double‑ or triple‑assign pages to detect discrepancies.
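
Double-assigning a page only helps if the two results are compared. One way to do that, sketched with Python's standard `difflib`; the function name `find_discrepancies` is hypothetical, and the sketch assumes both typists return plain text for the same page.

```python
from difflib import SequenceMatcher

def find_discrepancies(transcript_a: str, transcript_b: str) -> list[tuple[str, str]]:
    """Return (words_from_a, words_from_b) for every span where the two differ."""
    words_a = transcript_a.split()
    words_b = transcript_b.split()
    # Compare word by word; autojunk is disabled so long pages are not
    # silently treated as containing "junk" words.
    matcher = SequenceMatcher(a=words_a, b=words_b, autojunk=False)
    disagreements = []
    for tag, a_start, a_end, b_start, b_end in matcher.get_opcodes():
        if tag != "equal":
            disagreements.append(
                (" ".join(words_a[a_start:a_end]), " ".join(words_b[b_start:b_end]))
            )
    return disagreements

for ours, theirs in find_discrepancies(
    "met Anna at the station on 12 May",
    "met Anne at the station on 12 May",
):
    print(f"typist A: {ours!r}  typist B: {theirs!r}")
```

Only the flagged spans then need a look at the original scan, which is the point of paying for the second pass.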
For privacy, proposals include:
- Splitting pages into shuffled word or short‑phrase fragments before labeling.
- Fragment‑based “captcha‑style” crowdsourcing so no one sees coherent diary entries.
Trade‑off: money vs. time/energy; some argue “if it’s truly valuable, pay to type it.”
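
The shuffled-fragment proposal can be sketched as follows, assuming the pages have already been cut into word or phrase images by some other step. All names here (`make_labeling_batch`, `reassemble`, the tuple layout) are hypothetical; the thread does not describe a concrete implementation.

```python
import random
import uuid

def make_labeling_batch(fragments, seed=None):
    """Prepare fragments for outside labelers.

    fragments: list of (page, position, image_path) tuples.
    Returns (batch, key). `batch` is a shuffled list of
    (fragment_id, image_path) that can be sent out; `key` maps
    fragment_id -> (page, position) and must stay private.
    """
    rng = random.Random(seed)
    key = {}
    batch = []
    for page, position, image_path in fragments:
        fragment_id = uuid.uuid4().hex  # opaque id, reveals no ordering
        key[fragment_id] = (page, position)
        batch.append((fragment_id, image_path))
    rng.shuffle(batch)
    return batch, key

def reassemble(labels, key):
    """labels: dict fragment_id -> typed text. Returns dict page -> full text."""
    by_page = {}
    for fragment_id, text in labels.items():
        page, position = key[fragment_id]
        by_page.setdefault(page, []).append((position, text))
    return {
        page: " ".join(text for _position, text in sorted(items))
        for page, items in by_page.items()
    }
```

The image files sent out should themselves be renamed to the opaque id, since a filename like `page12_word03.png` would undo the shuffling. Shuffling also only reduces what a single labeler can infer; names and other identifying words remain visible within a fragment.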

OCR, LLMs, and handwriting‑specific tools

Classic OCR:
- Tesseract is widely reported as poor on handwriting, and even on difficult print, unless heavily trained.
- Some success reported with Google Vision handwriting, Amazon Textract, Yandex OCR, ABBYY FineReader, and Evernote’s legacy OCR, especially on less idiosyncratic writing.
Handwriting‑targeted services:
- Tools like Transkribus, handwritingOCR.com, getsearchablepdf.com, and others are designed for difficult manuscripts; mixed reports from “pretty good” to “fails on my handwriting.”
- Pricing models (subscription vs per‑page / scan packs) are debated.
Multimodal LLMs:
- GPT‑4o, Gemini 1.5, LLaVA, etc. are praised by some commenters (“nailed my terrible handwriting”) and rated better than traditional OCR.
- Others note serious issues: plausible but wrong hallucinations, especially on numbers, dates, and names; performance can degrade on very messy cursive.
- OP reports that GPT‑4V did not handle their handwriting well.
Local / FOSS models:
- TrOCR is cited by some commenters as the best FOSS option they know for handwriting, though Amazon Textract is still rated higher.
- Ideas to fine‑tune Tesseract, TrOCR, or similar models on a small manually transcribed subset of pages.

Training a custom handwriting model

Multiple comments propose:
- Manually transcribe a subset of pages.
- Use this as labeled data to fine‑tune a handwriting model specific to one person’s script.
Variants:
- Use STT transcripts + scanned pages as joint training data.
- Pre‑segment text into words/short phrases and label via human‑in‑the‑loop services.
Consensus: technically feasible and increasingly accessible, but nontrivial in time/complexity; may make sense only if this is a long‑term “system” goal, not a one‑time batch.
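
Whatever model ends up being fine-tuned, the manually transcribed subset first has to be turned into labeled pairs with some pages held out for evaluation. A minimal sketch of that preparation step only, not of the fine-tuning itself; the function name, the record layout, and the 10% validation share are assumptions made for illustration.

```python
import random

def split_labeled_pages(labeled, val_fraction=0.1, seed=0):
    """Split manually transcribed pages into training and validation sets.

    labeled: dict mapping image path -> manually typed transcript.
    Returns {"train": [...], "val": [...]} with {"image", "text"} records.
    """
    items = sorted(labeled.items())  # sort first so the split is reproducible
    random.Random(seed).shuffle(items)
    # Hold out at least one page whenever there is more than one page.
    n_val = max(1, round(len(items) * val_fraction)) if len(items) > 1 else 0

    def to_records(chunk):
        return [{"image": path, "text": text} for path, text in chunk]

    return {"train": to_records(items[n_val:]), "val": to_records(items[:n_val])}

pages = {f"page{i:03}.png": f"transcript of page {i}" for i in range(20)}
split = split_labeled_pages(pages)
print(len(split["train"]), "training pages,", len(split["val"]), "held out")
```

Keeping the held-out pages untouched during training is what makes a later accuracy number meaningful for handwriting the model has not seen.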

Manual retyping, summarizing, and “second brain” angle

Several argue that simply retyping (or summarizing) by hand is:
- Ultimately faster than wrestling with half‑accurate OCR and cleanup.
- Cognitively beneficial: re‑engages with old material, surfaces what’s still relevant.
Suggested workflows:
- Re‑read and only type what is still useful; preserve dates and references.
- Treat this as curation/rewriting toward a “second brain” or reusable knowledge base.
Counterpoint: OP and others note costs in time, attention, and physical strain; many still seek more automated methods.

Hardware / workflow suggestions

Hardware:
- Use document scanners with feeders or book scanners; 300–600 dpi is usually enough.
- Smart pens / tablet note apps (Neo Smartpen, Nuwa pen, Samsung/Apple handwriting tools) can give near‑instant digital text for future notes, but don’t help existing archives.
Workflow tips:
- Always keep original high‑resolution scans even if you rely on STT or LLM output.
- Build searchable PDFs with text layers plus images; optionally add metadata/XML.
- For STT and Whisper, shorter audio chunks (~20–30 seconds) can improve accuracy.
- Consider building simple scripts or small apps to batch process images/audio, queue jobs, and store results.
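
The chunking tip reduces to computing cut points over a recording. A minimal sketch, assuming the total duration is known in seconds; the function name `chunk_spans` and the small overlap between chunks are assumptions added here, not something the thread specifies. The actual cutting and transcription would be done by whatever audio tool and STT engine are in use.

```python
def chunk_spans(total_seconds, chunk_seconds=30.0, overlap_seconds=1.0):
    """Return (start, end) spans in seconds that cover the whole recording."""
    if chunk_seconds <= overlap_seconds:
        raise ValueError("chunk_seconds must be larger than overlap_seconds")
    spans = []
    start = 0.0
    step = chunk_seconds - overlap_seconds  # how far each new chunk advances
    while start < total_seconds:
        end = min(start + chunk_seconds, total_seconds)
        spans.append((start, end))
        if end >= total_seconds:
            break  # the last chunk reached the end of the recording
        start += step
    return spans

for start, end in chunk_spans(70, chunk_seconds=30, overlap_seconds=0):
    print(f"{start:5.1f}s to {end:5.1f}s")
```

Fixed-length cuts can split a word in half; the overlap is one cheap mitigation, and cutting at pauses would likely be better where the tooling allows it.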

Meta: goals, trade‑offs, and open questions

A recurring theme: clarify why you want transcription.
- If the goal is searchability, imperfect text plus the original images might be enough.
- If the goal is publication or detailed analysis, higher accuracy (and maybe human effort) is needed.
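
For the searchability case, imperfect text can still point back to the right scan if the lookup tolerates misspellings. A rough sketch using only the standard library; the function names and the 0.8 similarity cutoff are assumptions for illustration, and a real archive would more likely use a proper full-text search engine.

```python
import re
from collections import defaultdict
from difflib import get_close_matches

def build_index(pages):
    """pages: dict image path -> (possibly noisy) transcript.
    Returns a dict mapping each lowercased word to the set of pages containing it."""
    index = defaultdict(set)
    for path, text in pages.items():
        for word in re.findall(r"\w+", text.lower()):
            index[word].add(path)
    return index

def search(index, query, cutoff=0.8):
    """Return pages containing the query word or a close spelling of it."""
    candidates = get_close_matches(query.lower(), list(index), n=5, cutoff=cutoff)
    hits = set()
    for word in candidates:
        hits |= index[word]
    return sorted(hits)

index = build_index({
    "p1.png": "Visited grandmther in Lisbon",  # transcription error left in on purpose
    "p2.png": "Rain all week",
})
print(search(index, "grandmother"))
```

The result is a list of page images to open, so the transcript only has to be good enough to find the page, not to replace it.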
Tension:
- “Don’t build a factory for a one‑off” vs “this could become a reusable system for me/others.”
Unclear / open:
- How well any given tool performs depends heavily on a specific person’s handwriting; several commenters report great success where others report total failure. Testing on a small sample is repeatedly recommended before committing.
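
Testing on a small sample is easier to act on with a number attached. A common choice is character error rate (CER): the edit distance between a tool's output and a hand-typed reference, divided by the reference length. A minimal sketch in plain Python; the function names are hypothetical, and the reference transcripts are assumed to be typed by hand for a handful of pages.

```python
def edit_distance(a: str, b: str) -> int:
    """Levenshtein distance: minimum insertions, deletions, substitutions."""
    previous = list(range(len(b) + 1))  # distances from "" to prefixes of b
    for i, char_a in enumerate(a, start=1):
        current = [i]  # distance from a[:i] to ""
        for j, char_b in enumerate(b, start=1):
            cost = 0 if char_a == char_b else 1
            current.append(min(
                previous[j] + 1,         # delete char_a
                current[j - 1] + 1,      # insert char_b
                previous[j - 1] + cost,  # substitute or keep
            ))
        previous = current
    return previous[-1]

def character_error_rate(reference: str, hypothesis: str) -> float:
    """Edit distance normalized by the length of the hand-typed reference."""
    if not reference:
        raise ValueError("reference transcript must not be empty")
    return edit_distance(reference, hypothesis) / len(reference)

print(character_error_rate("abcd", "abxd"))
```

Running the same few pages through each candidate tool and comparing CER gives a like-for-like ranking for one person's handwriting, which is what the conflicting anecdotes in the thread cannot provide.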
