2025-01-22

Using generative AI as part of historical research: three case studies

LLMs as Historical Research Tools

Many commenters like the article’s concrete case studies and “layered” testing (OCR, translation, interpretation).
Several see strong potential: rapid transcription of difficult manuscripts, first-pass translations, and surfacing possibly relevant secondary sources.
Some working with Neo-Latin, German, and early modern texts report good but imperfect translations, especially when experts can validate samples and estimate error rates.
Others note that a large share of historical work is reinterpretation of known material, where LLMs could function as powerful research assistants.
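The sample-validation idea above (experts check a subset of translations and estimate an error rate) can be sketched as follows. This is a minimal illustration, not something any commenter posted. It assumes an expert has reviewed a random sample of translated passages and counted how many contain a material error, and it uses a Wilson score interval to show how uncertain the estimate is. The function name and the example counts are hypothetical.

```python
import math


def wilson_interval(errors: int, sample_size: int, z: float = 1.96) -> tuple[float, float]:
    """Return an approximate confidence interval for the true error rate.

    errors      -- passages the expert judged wrong (hypothetical count)
    sample_size -- passages the expert reviewed
    z           -- normal quantile; 1.96 corresponds to roughly 95% confidence
    """
    if sample_size <= 0:
        raise ValueError("sample_size must be positive")
    if not 0 <= errors <= sample_size:
        raise ValueError("errors must be between 0 and sample_size")

    p = errors / sample_size
    denom = 1 + z**2 / sample_size
    centre = (p + z**2 / (2 * sample_size)) / denom
    half_width = (
        z * math.sqrt(p * (1 - p) / sample_size + z**2 / (4 * sample_size**2)) / denom
    )
    # Clamp to [0, 1] to absorb floating-point rounding at the extremes.
    return max(0.0, centre - half_width), min(1.0, centre + half_width)


if __name__ == "__main__":
    # Hypothetical review: 12 flawed passages found in a sample of 200.
    low, high = wilson_interval(errors=12, sample_size=200)
    print(f"observed error rate: {12 / 200:.1%}")
    print(f"plausible range:     {low:.1%} to {high:.1%}")
```

The interval only describes the kind of text that was sampled; a rate measured on clean printed Neo-Latin would not carry over to difficult handwriting.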

Trust, Expertise, and Hallucinations

A persistent worry is that non-experts cannot reliably judge when an LLM is wrong, especially on nuanced historical questions.
Experienced users say LLMs are very useful within domains where they already have deep knowledge, but not for evaluating “PhD‑level” work in unfamiliar fields.
Suggested mitigations include cross-checking multiple models, keeping context short, asking for references and verifying them, integrating RAG or search, and designing tools that highlight disagreement.
Others argue this still fails novices: if you’re not already expert, you don’t know when to backtrack.
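As a rough sketch of the "cross-check multiple models and highlight disagreement" suggestion, the snippet below compares outputs that have already been collected from several models and flags the pairs that diverge. It makes no API calls; the model labels, sample strings, and the 0.8 threshold are all hypothetical, and word-level similarity from Python's standard `difflib` is only a crude stand-in for real semantic comparison. Agreement between models does not mean the answer is correct, which is the objection raised above.

```python
from difflib import SequenceMatcher
from itertools import combinations


def pairwise_agreement(outputs: dict[str, str]) -> dict[tuple[str, str], float]:
    """Score every pair of model outputs from 0.0 (no overlap) to 1.0 (identical).

    Similarity is computed over whitespace-separated words, so it captures
    wording overlap only, not meaning.
    """
    scores = {}
    for (name_a, text_a), (name_b, text_b) in combinations(sorted(outputs.items()), 2):
        matcher = SequenceMatcher(None, text_a.split(), text_b.split())
        scores[(name_a, name_b)] = matcher.ratio()
    return scores


def flag_disagreements(outputs: dict[str, str], threshold: float = 0.8) -> list[tuple[str, str]]:
    """Return the model pairs whose outputs fall below the agreement threshold."""
    return [pair for pair, score in pairwise_agreement(outputs).items() if score < threshold]


if __name__ == "__main__":
    # Hypothetical transcriptions of the same manuscript line from three models.
    transcriptions = {
        "model_a": "the king died in 1547",
        "model_b": "the king died in 1547",
        "model_c": "the queen fled to france",
    }
    for pair in flag_disagreements(transcriptions):
        print("needs human review:", pair)
```

A tool built on this idea would route flagged passages to a person with the relevant expertise rather than pick a winner automatically.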

Impact on Humanities and Education

Some fear LLMs will be used to justify cutting funding for history/humanities (“80% of a historian for a few chat queries”).
Others think education can adapt, with LLMs as accelerators for learning if critical thinking and source literacy are emphasized.

OCR, Translation, and Existing Tools

Commenters debate whether LLM-based OCR and translation are truly better than specialized tools (e.g., Transkribus, DeepL, Google Translate); critics note the article lacked systematic comparisons.
Supporters counter that existing OCR struggles badly with early modern handwriting and that LLMs can handle at least “intermediate” paleography, dramatically speeding triage.

Bias, Consensus, and Rewriting History

LLMs are described as “consensus distillation” or “median viewpoint” machines, which risks reproducing popular myths and institutional PR.
Some worry that centralized, opaque training and RLHF could make them tools for subtly rewriting history; others argue that multiple competing models will make coordinated rewriting harder.

Creativity, Intelligence, and Art

A long subthread debates whether LLMs show genuine creativity or just sophisticated remixing.
Some compare them to cameras or instruments: value lies in the human using them; others insist lack of lived experience makes LLM‑generated literature/poetry inherently hollow.