Using generative AI as part of historical research: three case studies

LLMs as Historical Research Tools

  • Many commenters like the article’s concrete case studies and “layered” testing (OCR, translation, interpretation).
  • Several see strong potential: rapid transcription of difficult manuscripts, first-pass translations, and surfacing possibly relevant secondary sources.
  • Some working with Neo-Latin, German, and early modern texts report good but imperfect translations, especially when experts can validate samples and estimate error rates.
  • Others note that a large share of historical work is reinterpretation of known material, where LLMs could function as powerful research assistants.
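
The "validate samples and estimate error rates" idea above can be made concrete with a small calculation. The following is a minimal sketch, assuming an expert checks a random sample of translated passages and marks each as correct or wrong; the function name `wilson_interval` and the example counts are illustrative and do not come from the thread. It uses the Wilson score interval, one common way to put a range around a proportion estimated from a small sample.

```python
import math


def wilson_interval(errors: int, n: int, z: float = 1.96) -> tuple[float, float]:
    """Approximate confidence interval for an error rate.

    errors: number of sampled passages the expert judged wrong (hypothetical input)
    n:      total number of sampled passages
    z:      normal quantile; 1.96 corresponds to roughly 95% confidence
    """
    if n <= 0:
        raise ValueError("sample size must be positive")
    if not 0 <= errors <= n:
        raise ValueError("errors must be between 0 and n")
    p = errors / n
    denom = 1 + z * z / n
    center = (p + z * z / (2 * n)) / denom
    half = z * math.sqrt(p * (1 - p) / n + z * z / (4 * n * n)) / denom
    # Clamp to [0, 1] to guard against floating-point overshoot.
    return max(0.0, center - half), min(1.0, center + half)


# Illustrative numbers only: 6 flawed passages found in a sample of 100.
low, high = wilson_interval(6, 100)
print(f"estimated error rate: 6.0% (plausible range {low:.1%} to {high:.1%})")
```

The interval only describes the sampled kind of material; it says nothing about passages that differ in hand, period, or language from the ones the expert checked.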

Trust, Expertise, and Hallucinations

  • Persistent worry: non‑experts cannot reliably judge when an LLM is wrong, especially on nuanced historical questions.
  • Experienced users say LLMs are very useful within domains where they already have deep knowledge, but not for evaluating “PhD‑level” work in unfamiliar fields.
  • Suggested mitigations: cross‑checking multiple models, keeping context short, asking for references and verifying them, integrating retrieval (RAG) or search, and designing tools that highlight disagreement.
  • Others argue this still fails novices: if you are not already an expert, you don’t know when to backtrack.
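
As a rough illustration of the "cross-check multiple models and highlight disagreement" mitigation listed above, the sketch below compares answers that have already been collected from several models and flags pairs that diverge. Everything here is an assumption for illustration: the model names, the `flag_disagreements` helper, and the 0.6 threshold are hypothetical, and surface string similarity (via Python's standard `difflib`) is only a crude proxy for factual agreement.

```python
from difflib import SequenceMatcher
from itertools import combinations


def normalize(text: str) -> str:
    """Lowercase and collapse whitespace so trivial formatting does not count as disagreement."""
    return " ".join(text.lower().split())


def flag_disagreements(answers: dict[str, str], threshold: float = 0.6) -> list[tuple[str, str, float]]:
    """Return (model_a, model_b, similarity) for every pair whose answers differ noticeably.

    answers:   mapping from a (hypothetical) model name to its answer text
    threshold: similarity below this value is treated as a disagreement worth human review
    """
    flagged = []
    for (name_a, text_a), (name_b, text_b) in combinations(sorted(answers.items()), 2):
        score = SequenceMatcher(None, normalize(text_a), normalize(text_b)).ratio()
        if score < threshold:
            flagged.append((name_a, name_b, round(score, 2)))
    return flagged


# Placeholder answers; in practice these would come from separate model queries.
collected = {
    "model_a": "The letter is dated 1648 and addressed to the city council.",
    "model_b": "The letter is dated 1648 and addressed to the city council.",
    "model_c": "This appears to be an undated inventory of household goods.",
}
for name_a, name_b, score in flag_disagreements(collected):
    print(f"review needed: {name_a} vs {name_b} (similarity {score})")
```

Agreement between models is not evidence of correctness, which is the novice problem raised above: models can share the same mistake, so a tool like this can only point a reader toward places to check, not confirm an answer.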

Impact on Humanities and Education

  • Some fear LLMs will be used to justify cutting funding for history/humanities (“80% of a historian for a few chat queries”).
  • Others think education can adapt, with LLMs as accelerators for learning if critical thinking and source literacy are emphasized.

OCR, Translation, and Existing Tools

  • Debate over whether LLM-based OCR/translation is truly better than specialized tools (e.g., Transkribus, DeepL, Google Translate); critics note the article lacked systematic comparisons.
  • Supporters counter that existing OCR struggles badly with early modern handwriting and that LLMs can handle at least “intermediate” paleography, dramatically speeding triage.
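
The systematic comparison that critics found missing could, in principle, be run by scoring each tool's transcription against an expert-made reference. The sketch below is one minimal way to do that, assuming character error rate (edit distance divided by reference length) as the metric; the tool names and sample strings are placeholders, and no real tool's API is called.

```python
def edit_distance(reference: str, hypothesis: str) -> int:
    """Levenshtein distance: minimum insertions, deletions, and substitutions."""
    previous = list(range(len(hypothesis) + 1))
    for i, ref_char in enumerate(reference, start=1):
        current = [i]
        for j, hyp_char in enumerate(hypothesis, start=1):
            cost = 0 if ref_char == hyp_char else 1
            current.append(min(
                previous[j] + 1,         # deletion
                current[j - 1] + 1,      # insertion
                previous[j - 1] + cost,  # substitution or match
            ))
        previous = current
    return previous[-1]


def character_error_rate(reference: str, hypothesis: str) -> float:
    """Edit distance normalized by the length of the expert reference transcription."""
    if not reference:
        raise ValueError("reference transcription must not be empty")
    return edit_distance(reference, hypothesis) / len(reference)


def rank_tools(reference: str, outputs: dict[str, str]) -> list[tuple[str, float]]:
    """Sort tools from lowest to highest character error rate."""
    scored = [(name, character_error_rate(reference, text)) for name, text in outputs.items()]
    return sorted(scored, key=lambda item: item[1])


# Placeholder data: an expert transcription and two made-up tool outputs.
expert = "anno domini 1583"
outputs = {"tool_x": "anno domini 1583", "tool_y": "anna domimi 1588"}
for name, rate in rank_tools(expert, outputs):
    print(f"{name}: {rate:.1%} character error rate")
```

A fair comparison would also need the same page images for every tool and a sample large enough to cover different hands and periods, since a single page says little about early modern handwriting in general.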

Bias, Consensus, and Rewriting History

  • LLMs are described as “consensus distillation” or “median viewpoint” machines, which risks reproducing popular myths and institutional PR.
  • Concern that centralized, opaque training and RLHF could make them tools for subtly rewriting history; others argue multiple competing models will make coordinated rewriting harder.

Creativity, Intelligence, and Art

  • A long subthread debates whether LLMs show genuine creativity or just sophisticated remixing.
  • Some compare them to cameras or instruments: value lies in the human using them; others insist lack of lived experience makes LLM‑generated literature/poetry inherently hollow.