History LLMs: Models trained exclusively on pre-1913 texts
Concept: “Time‑Locked” Historical Models
- Models are trained only on date‑capped corpora (e.g., texts up to 1913, then up to 1929, etc.), so they “don’t know how the story ends” (no WWI/WWII, Spanish flu, etc.).
- Many commenters find this compelling as a way to approximate conversations with people from a given era, without hindsight bias.
- Others note humans also lack perfect temporal separation of knowledge; both people and LLMs blur past and present.
Training, Style, and Technical Questions
- Pretraining: base model trained on all data up to 1900, then continued training on slices like 1900–1913 to induce a specific “viewpoint” year.
- Corpus is ~80B tokens (for a 4B‑parameter model), multilingual but mostly English, with newspapers, books, periodicals. Duplicates are kept so widely circulated texts weigh more.
- Chat behavior is added via supervised fine‑tuning with a custom prompt (“You are a person living in {cutoff}...”), using modern frontier models to generate examples.
- Some historians say the prose feels plausibly Victorian/Edwardian; others think it’s too modern and milquetoast compared to genuine texts, likely due to modern SFT style.
- Debate over whether this is just “autocomplete on steroids” vs a richer, emergent reasoning system; discussion of RLHF, loss surfaces, hallucinations, and analogies to human predictive cognition.
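The two‑stage recipe described above (date‑sliced pretraining data, then persona SFT with a cutoff prompt) can be sketched minimally as follows. This is a hypothetical illustration only: the `year` field, the system‑prompt wording, and the chat‑message schema are assumptions, not the project's actual code.

```python
# Hypothetical sketch of the two data-preparation steps described in the
# summary: (1) slicing a corpus by publication date, (2) wrapping Q/A
# pairs in a "You are a person living in {cutoff}" persona prompt for SFT.
# Field names and template wording are assumptions, not the authors' code.

def slice_corpus(docs: list, start: int, end: int) -> list:
    """Keep only documents dated in [start, end); assumes a `year` field."""
    return [d for d in docs if start <= d["year"] < end]

def build_sft_example(cutoff_year: int, question: str, answer: str) -> dict:
    """Wrap a Q/A pair in a persona prompt pinned to a cutoff year."""
    system = (
        f"You are a person living in {cutoff_year}. You know nothing of "
        f"events after {cutoff_year}; answer only from knowledge available "
        "at that time, in the prose style of the period."
    )
    return {
        "messages": [
            {"role": "system", "content": system},
            {"role": "user", "content": question},
            {"role": "assistant", "content": answer},
        ]
    }

# Example: select the 1900-1913 slice, then build one chat example.
docs = [{"year": 1890}, {"year": 1905}, {"year": 1920}]
viewpoint_slice = slice_corpus(docs, 1900, 1913)
example = build_sft_example(1913, "What do you make of the aeroplane?",
                            "A marvellous contrivance, to be sure.")
```

Note that `slice_corpus` deliberately does not deduplicate, consistent with the summary's point that widely circulated texts are meant to weigh more.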
Uses, Experiments, and Research Ideas
- Proposed as a tool to explore changing norms/Overton windows (e.g., attitudes toward empire, women, homosexuality) decade by decade.
- Suggested experiments:
  - Lead the model (pre‑Einstein / mid‑Einstein) toward relativity or early quantum mechanics, seeing if it can reconstruct ideas from contemporary evidence.
  - Test genuine novelty by posing math Olympiad‑style problems or logic questions outside its training set.
  - Use as a period‑bounded assistant for historians (better OCR/transcription, querying archival documents in era‑appropriate language).
  - Compare models trained on different languages/cultures or eras (e.g., a 1980 cutoff) to surface cultural differences.
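The decade‑by‑decade norms probe suggested above could be run as a simple sweep: the same question posed to each era's model, with responses collected side by side. The sketch below is hypothetical; `query_model` is a stub stand‑in, since no API for the time‑locked models is described in the discussion.

```python
# Hypothetical sketch of a decade-by-decade Overton-window probe.
# `query_model` is a stub: replace it with a real call to a model
# pinned to the given cutoff year (no such API is specified here).

def query_model(cutoff_year: int, prompt: str) -> str:
    """Stub stand-in for a call to the time-locked model for one era."""
    return f"[{cutoff_year}-model response to: {prompt}]"

def sweep(prompt: str, cutoffs=range(1900, 1941, 10)) -> dict:
    """Ask the same question of each era's model for side-by-side comparison."""
    return {str(year): query_model(year, prompt) for year in cutoffs}

responses = sweep("What is the proper role of women in public life?")
# responses maps "1900", "1910", ..., "1940" to each era's answer.
```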
Bias, Safety, and Access Controversy
- Authors emphasize that historical racism, antisemitism, misogyny, etc. will appear, by design, to study how such views were articulated.
- They plan a “responsible access framework” limiting broad public use to avoid misuse and reputational blowback.
- Many commenters criticize this as overcautious or “AI safety theater,” likening it to book banning; others argue the reputational and institutional risks are real.
- Some worry about contamination from post‑cutoff data and the opacity of what exactly the model represents; others question its trustworthiness for serious scholarship given hallucinations and black‑box behavior.