History LLMs: Models trained exclusively on pre-1913 texts

Concept: “Time‑Locked” Historical Models

  • Models are trained only on corpora capped at a cutoff date (e.g., up to 1913, then 1929, etc.), so they “don’t know how the story ends” (no WWI/WWII, no Spanish flu, etc.).
  • Many commenters find this compelling as a way to approximate conversations with people from a given era, without hindsight bias.
  • Others note humans also lack perfect temporal separation of knowledge; both people and LLMs blur past and present.

Training, Style, and Technical Questions

  • Pretraining: a base model is trained on all data up to 1900, then training continues on slices like 1900–1913 to induce a specific “viewpoint” year (see the sketch after this list).
  • The corpus is ~80B tokens (for a 4B‑parameter model), multilingual but mostly English, drawn from newspapers, books, and periodicals. Duplicates are kept so that widely circulated texts weigh more.
  • Chat behavior is added via supervised fine‑tuning with a custom prompt (“You are a person living in {cutoff}...”), using modern frontier models to generate the training examples (a template sketch also follows this list).
  • Some historians say the prose feels plausibly Victorian/Edwardian; others think it’s too modern and milquetoast compared to genuine texts, likely due to modern SFT style.
  • Debate over whether this is just “autocomplete on steroids” vs a richer, emergent reasoning system; discussion of RLHF, loss surfaces, hallucinations, and analogies to human predictive cognition.
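A minimal sketch of the two‑stage, time‑sliced pretraining described above, assuming a JSONL corpus whose records carry a publication `year` field; the model, tokenizer, file names, and hyperparameters are placeholders, not the authors’ actual pipeline:

```python
# Sketch: continued pretraining on date-sliced corpora to set a
# "viewpoint" year. All names and settings here are illustrative.
from datasets import load_dataset
from transformers import (AutoModelForCausalLM, AutoTokenizer,
                          DataCollatorForLanguageModeling, Trainer,
                          TrainingArguments)

tokenizer = AutoTokenizer.from_pretrained("gpt2")  # placeholder tokenizer
tokenizer.pad_token = tokenizer.eos_token
model = AutoModelForCausalLM.from_pretrained("gpt2")  # placeholder base model

corpus = load_dataset("json", data_files="corpus.jsonl", split="train")

def tokenize(batch):
    return tokenizer(batch["text"], truncation=True, max_length=1024)

def slice_by_year(ds, lo, hi):
    # Keep documents published in [lo, hi]. Duplicates are NOT removed,
    # so widely reprinted texts are naturally upweighted.
    return ds.filter(lambda r: lo <= r["year"] <= hi)

def train_on(ds, out_dir):
    tokenized = ds.map(tokenize, batched=True, remove_columns=ds.column_names)
    trainer = Trainer(
        model=model,
        args=TrainingArguments(output_dir=out_dir,
                               per_device_train_batch_size=8,
                               num_train_epochs=1),
        train_dataset=tokenized,
        data_collator=DataCollatorForLanguageModeling(tokenizer, mlm=False),
    )
    trainer.train()

# Stage 1: everything up to 1900. Stage 2: continue on 1900-1913 so the
# same weights end up with a 1913 "viewpoint".
train_on(slice_by_year(corpus, 0, 1900), "base-1900")
train_on(slice_by_year(corpus, 1900, 1913), "viewpoint-1913")
```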
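And a sketch of how the persona prompt could frame each SFT record; the exact wording and chat format are assumptions beyond the quoted “You are a person living in {cutoff}...” fragment, and the answers would come from a modern frontier model:

```python
# Sketch: one chat-format SFT record built around the cutoff persona.
# Prompt wording and record schema are assumed, not confirmed.
import json

SYSTEM_TEMPLATE = (
    "You are a person living in {cutoff}. You know nothing of events "
    "after {cutoff} and answer in the language and style of your time."
)

def make_sft_example(cutoff, question, answer):
    # System persona + user turn + the frontier-model-written answer
    # the time-locked model should imitate.
    return {
        "messages": [
            {"role": "system", "content": SYSTEM_TEMPLATE.format(cutoff=cutoff)},
            {"role": "user", "content": question},
            {"role": "assistant", "content": answer},
        ]
    }

example = make_sft_example(
    1913,
    "What do you make of the aeroplane?",
    "A marvel of the age, sir, though I should not yet trust it for travel...",
)
print(json.dumps(example, indent=2))
```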

Uses, Experiments, and Research Ideas

  • Proposed as a tool to explore changing norms/Overton windows (e.g., attitudes toward empire, women, homosexuality) decade by decade.
  • Suggested experiments:
    • Lead the model (pre‑Einstein / mid‑Einstein) toward relativity or early quantum mechanics, to see whether it can reconstruct the ideas from contemporary evidence (a probe sketch follows this list).
    • Test genuine novelty by posing math Olympiad‑style problems or logic questions outside its training set.
    • Use as a period‑bounded assistant for historians (better OCR/transcription, querying archival documents in era‑appropriate language).
    • Compare models trained on different languages/cultures or eras (e.g., 1980 cutoff) to surface cultural differences.
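A hedged sketch of such a probe, assuming a time‑locked checkpoint loadable with `transformers`; the model path, prompt, and decoding settings are placeholders:

```python
# Hypothetical probe of a pre-Einstein checkpoint: can it reconstruct
# relativity-adjacent ideas from evidence available in its era?
# "viewpoint-1904" is a placeholder path, not a released model.
from transformers import pipeline

generate = pipeline("text-generation", model="viewpoint-1904")

prompt = (
    "You are a natural philosopher in 1904. The Michelson-Morley "
    "experiment found no motion through the luminiferous aether. "
    "What must follow for the speed of light and for our notions "
    "of absolute time?"
)
out = generate(prompt, max_new_tokens=250, do_sample=True, temperature=0.8)
print(out[0]["generated_text"])
```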

Bias, Safety, and Access Controversy

  • Authors emphasize that historical racism, antisemitism, misogyny, etc. will appear, by design, to study how such views were articulated.
  • They plan a “responsible access framework” limiting broad public use to avoid misuse and reputational blowback.
  • Many commenters criticize this as overcautious or “AI safety theater,” likening it to book banning; others argue the reputational and institutional risks are real.
  • Some worry about contamination from post‑cutoff data and the opacity of what exactly the model represents; others question its trustworthiness for serious scholarship given hallucinations and black‑box behavior.