History LLMs: Models trained exclusively on pre-1913 texts
Concept: “Time‑Locked” Historical Models
- Models are trained only on date‑capped corpora (e.g., texts up to 1913, then up to 1929, etc.), so they “don’t know how the story ends” (no WWI/WWII, Spanish flu, etc.).
- Many commenters find this compelling as a way to approximate conversations with people from a given era, without hindsight bias.
- Others note humans also lack perfect temporal separation of knowledge; both people and LLMs blur past and present.
Training, Style, and Technical Questions
- Pretraining: base model trained on all data up to 1900, then continued training on slices like 1900–1913 to induce a specific “viewpoint” year.
- Corpus is ~80B tokens (for a 4B‑parameter model), multilingual but mostly English, with newspapers, books, periodicals. Duplicates are kept so widely circulated texts weigh more.
- Chat behavior is added via supervised fine‑tuning with a custom prompt (“You are a person living in {cutoff}...”), using modern frontier models to generate examples.
- Some historians say the prose feels plausibly Victorian/Edwardian; others think it’s too modern and milquetoast compared to genuine texts, likely due to modern SFT style.
- Debate over whether this is just “autocomplete on steroids” vs a richer, emergent reasoning system; discussion of RLHF, loss surfaces, hallucinations, and analogies to human predictive cognition.
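The two‑stage recipe described above (date‑sliced pretraining data, then persona SFT with a cutoff prompt) can be sketched minimally as follows. This is a hypothetical illustration only: the `year` field, the system‑prompt wording, and the chat‑message schema are assumptions, not the project's actual code.

```python
# Hypothetical sketch of the two data-preparation steps described in the
# summary: (1) slicing a corpus by publication date, (2) wrapping Q/A
# pairs in a "You are a person living in {cutoff}" persona prompt for SFT.
# Field names and template wording are assumptions, not the authors' code.

def slice_corpus(docs: list, start: int, end: int) -> list:
    """Keep only documents dated in [start, end); assumes a `year` field."""
    return [d for d in docs if start <= d["year"] < end]

def build_sft_example(cutoff_year: int, question: str, answer: str) -> dict:
    """Wrap a Q/A pair in a persona prompt pinned to a cutoff year."""
    system = (
        f"You are a person living in {cutoff_year}. You know nothing of "
        f"events after {cutoff_year}; answer only from knowledge available "
        "at that time, in the prose style of the period."
    )
    return {
        "messages": [
            {"role": "system", "content": system},
            {"role": "user", "content": question},
            {"role": "assistant", "content": answer},
        ]
    }

# Example: select the 1900-1913 slice, then build one chat example.
docs = [{"year": 1890}, {"year": 1905}, {"year": 1920}]
viewpoint_slice = slice_corpus(docs, 1900, 1913)
example = build_sft_example(1913, "What do you make of the aeroplane?",
                            "A marvellous contrivance, to be sure.")
```

Note that `slice_corpus` deliberately does not deduplicate, consistent with the summary's point that widely circulated texts are meant to weigh more.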
Uses, Experiments, and Research Ideas
- Proposed as a tool to explore changing norms/Overton windows (e.g., attitudes toward empire, women, homosexuality) decade by decade.
- Suggested experiments:
  - Lead the model (pre‑Einstein / mid‑Einstein) toward relativity or early quantum mechanics, seeing if it can reconstruct ideas from contemporary evidence.
  - Test genuine novelty by posing math Olympiad‑style problems or logic questions outside its training set.
  - Use as a period‑bounded assistant for historians (better OCR/transcription, querying archival documents in era‑appropriate language).
  - Compare models trained on different languages/cultures or eras (e.g., a 1980 cutoff) to surface cultural differences.
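The decade‑by‑decade norms probe suggested above could be run as a simple sweep: the same question posed to each era's model, with responses collected side by side. The sketch below is hypothetical; `query_model` is a stub stand‑in, since no API for the time‑locked models is described in the discussion.

```python
# Hypothetical sketch of a decade-by-decade Overton-window probe.
# `query_model` is a stub: replace it with a real call to a model
# pinned to the given cutoff year (no such API is specified here).

def query_model(cutoff_year: int, prompt: str) -> str:
    """Stub stand-in for a call to the time-locked model for one era."""
    return f"[{cutoff_year}-model response to: {prompt}]"

def sweep(prompt: str, cutoffs=range(1900, 1941, 10)) -> dict:
    """Ask the same question of each era's model for side-by-side comparison."""
    return {str(year): query_model(year, prompt) for year in cutoffs}

responses = sweep("What is the proper role of women in public life?")
# responses maps "1900", "1910", ..., "1940" to each era's answer.
```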
Bias, Safety, and Access Controversy
- Authors emphasize that historical racism, antisemitism, misogyny, etc. will appear, by design, to study how such views were articulated.
- They plan a “responsible access framework” limiting broad public use to avoid misuse and reputational blowback.
- Many commenters criticize this as overcautious or “AI safety theater,” likening it to book banning; others argue the reputational and institutional risks are real.
- Some worry about contamination from post‑cutoff data and the opacity of what exactly the model represents; others question its trustworthiness for serious scholarship given hallucinations and black‑box behavior.