Britannica11.org – a structured edition of the 1911 Encyclopædia Britannica

Project & Goals

  • The creator rebuilt the 1911 Encyclopædia Britannica as a structured, navigable site: ~37k articles with section navigation, cross-references, a contributor index, page references, and links to the original scans.
  • Aim is to preserve the feel of the original while making it actually usable and searchable.

Licensing, Data Access, and Reuse

  • Underlying 1911 text is public domain.
  • The site’s structured reconstruction (parsing, linking, indexing) is new work; no formal license yet.
  • Casual/small-scale use is welcomed; for bulk use (datasets, training, redistribution), the creator prefers people get in touch.
  • Some commenters argue that “sweat of the brow” processing may not be copyrightable in the U.S., while others simply point to existing PD/CC sources (e.g., Gutenberg, Wikisource).

UX, Bugs, and Feature Requests

  • Reported issues: search box on article pages not working in some browsers (later fixed), escaping bugs (HTML entities), broken tables, glyphs unsupported by the font (℔), Zurich canton/city disambiguation bug, TOC encoding glitches.
  • Requested features:
    • EPUB export and/or bulk download or mirror.
    • Clearer entry points from the home page; logo/title linking to home.
    • Side-by-side text + scan view or thumbnails.
    • Wikipedia-style in-article links and “adjacent article” browsing.

Data Sources, Structure, and Fidelity

  • Creator did not OCR the whole work; started from Wikisource text and built a pipeline to clean, segment, and re-link to page images.
  • Some users note fidelity issues relative to Wikisource: math missing from at least one article, and footnotes attached to the wrong place.

Comparisons & Related Projects

  • Thread references Wikisource’s EB1911, Project Gutenberg’s text, other historical dictionaries/encyclopedias, and parallel efforts on earlier/later Britannica editions and other classic reference works.

Historical Value and Problematic Content

  • Many appreciate the distinctive, opinionated prose and pre–World War I worldview.
  • Users highlight both delightful passages (literary enthusiasm, early atomic/fusion speculation, cosmology debates) and disturbing ones (racist claims, sexist medical advice, torture descriptions).
  • Several emphasize the value of old works for understanding past beliefs, including those now seen as immoral or incorrect.

LLMs, Research, and Use Cases

  • Some want the dataset to train models to mimic 1911 Britannica style.
  • Others propose loading structured data into XML/DB tools for large-scale queries.
  • There is debate over using LLMs to summarize and “modernize” dense historical prose vs. reading directly for intellectual exercise.
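The thread does not name a schema or tool for the XML/DB suggestion, and the site has no bulk export yet. As a minimal sketch, assume the articles were available as records with title, volume, page, contributor, and body fields, plus a list of cross-reference pairs (all hypothetical names, not the site's format). Python's built-in sqlite3 module could then answer simple corpus-wide questions, such as articles per contributor or the most cross-referenced entries:

```python
import sqlite3

# Hypothetical schema: the site publishes no export format, so these table
# and column names are illustrative assumptions only.
SCHEMA = """
CREATE TABLE articles (
    title       TEXT PRIMARY KEY,
    volume      INTEGER,
    page        INTEGER,
    contributor TEXT,
    body        TEXT
);
CREATE TABLE cross_refs (
    source TEXT REFERENCES articles(title),
    target TEXT REFERENCES articles(title)
);
"""


def build_db(articles, refs):
    """Load article dicts and (source, target) cross-reference pairs into an in-memory DB."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(SCHEMA)
    conn.executemany(
        "INSERT INTO articles VALUES (:title, :volume, :page, :contributor, :body)",
        articles,
    )
    conn.executemany("INSERT INTO cross_refs VALUES (?, ?)", refs)
    return conn


def articles_by_contributor(conn):
    """Count articles per contributor, most prolific first (ties broken by name)."""
    return conn.execute(
        "SELECT contributor, COUNT(*) FROM articles "
        "GROUP BY contributor ORDER BY COUNT(*) DESC, contributor"
    ).fetchall()


def most_referenced(conn, limit=10):
    """Return the articles that other articles link to most often."""
    return conn.execute(
        "SELECT target, COUNT(*) AS n FROM cross_refs "
        "GROUP BY target ORDER BY n DESC, target LIMIT ?",
        (limit,),
    ).fetchall()


if __name__ == "__main__":
    # Placeholder records, not real encyclopedia data.
    sample = [
        {"title": "Article A", "volume": 1, "page": 10, "contributor": "X", "body": "..."},
        {"title": "Article B", "volume": 1, "page": 12, "contributor": "X", "body": "..."},
        {"title": "Article C", "volume": 2, "page": 5, "contributor": "Y", "body": "..."},
    ]
    links = [("Article A", "Article C"), ("Article B", "Article C"), ("Article C", "Article A")]
    db = build_db(sample, links)
    print(articles_by_contributor(db))  # [('X', 2), ('Y', 1)]
    print(most_referenced(db))          # [('Article C', 2), ('Article A', 1)]
```

SQLite is used here only because it ships with Python; an XML database or a larger SQL server would serve the same purpose once a bulk export exists.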