Local LLMs versus offline Wikipedia

Combining Local LLMs and Offline Wikipedia

  • Many see this as a clear “why not both”: use a small local LLM as the interface and an offline Wikipedia copy (Kiwix .zim, SQLite+FTS, or a vector DB) as the factual store; a minimal sketch follows this list.
  • Several mention RAG setups over Wikipedia, local vector indices, and tiny models (0.6–4B parameters) that run even on weak hardware or mobile.
  • Proposed workflow: the LLM interprets a vague question → returns a topic list / file links → the user reads the actual articles, avoiding hallucinations.
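
One way to make the “why not both” pipeline concrete is a minimal sketch in Python: SQLite FTS5 as the factual store and a small local model (via llama-cpp-python) as the interface. The `wiki.db` schema, model file, and prompt are illustrative assumptions, not details from the thread.

```python
import sqlite3
from llama_cpp import Llama  # pip install llama-cpp-python

# Assumption: wiki.db was built from a Wikipedia dump with
#   CREATE VIRTUAL TABLE articles USING fts5(title, body);
db = sqlite3.connect("wiki.db")
llm = Llama(model_path="qwen2.5-3b-instruct-q4_k_m.gguf", n_ctx=4096)

def fts_escape(q: str) -> str:
    # Quote each term so FTS5 query syntax ("?", "-", ...) can't break MATCH.
    return " OR ".join('"' + t.replace('"', "") + '"' for t in q.split())

def retrieve(query: str, k: int = 3) -> list[tuple[str, str]]:
    """BM25-ranked full-text search; returns (title, snippet) pairs."""
    rows = db.execute(
        "SELECT title, snippet(articles, 1, '', '', ' ... ', 48) "
        "FROM articles WHERE articles MATCH ? ORDER BY rank LIMIT ?",
        (fts_escape(query), k),
    )
    return rows.fetchall()

def answer(question: str) -> str:
    # Ground the generation in retrieved excerpts, with article titles
    # the user can open directly in Kiwix to verify.
    context = "\n\n".join(f"[{t}] {s}" for t, s in retrieve(question))
    prompt = (
        "Answer using only the excerpts below and cite article titles.\n\n"
        f"{context}\n\nQuestion: {question}\nAnswer:"
    )
    return llm(prompt, max_tokens=256)["choices"][0]["text"].strip()

print(answer("Why does bread rise?"))
```

Keeping the model pinned to retrieved excerpts, with titles the user can open themselves, is what makes the hybrid more trustworthy than the LLM alone.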

Hardware, Cost, and Access

  • Debate over “just buy a better laptop”: some argue professionals routinely invest thousands in tools; others counter that outside top US salaries, a good laptop can represent a large share of annual income.
  • There’s pushback on the idea that anyone posting on HN can trivially afford new hardware; affordability is framed as relative to local wages and equipment prices.

Offline / Doomsday Scenarios

  • The original “reboot society with a USB stick” line sparks discussion of sticks preloaded with Wikipedia, manuals, and risk literature; some point to existing devices and products.
  • Skeptics mock the idea that civilization collapses yet people still have laptops, solar panels, and time to browse USB archives; others note serious preppers already plan for EMP-shielded gear and off-grid power.
  • A government example: internal mirrors of Wikipedia/StackExchange on classified networks show that large-scale “offline web” mirroring is already practiced.
  • Several emphasize that in real survival situations, practiced skills matter more than giant archives.

LLMs vs Wikipedia: Comprehension, Reliability, Use

  • Pro-LLM side: the strength isn’t storage but “comprehension”: parsing vague questions, adapting explanations, offering language-agnostic access, and synthesizing across domains.
  • Critics: LLMs don’t truly “understand”; they guess, and can confidently give deadly or expensive advice (car repair, quasi-medical cases, the infamous Hitler answer).
  • Many argue Wikipedia (plus sources, talk pages, and cross-language comparison) remains more trustworthy for facts; LLMs work best as search-term translators, tutors, or frontends to real documents (see the sketch after this list).
  • There’s concern that people treat AI as an infallible oracle, likened to sci-fi episodes where computers become de facto gods.
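
The “search-term translator” role is easy to sketch: the model only proposes encyclopedia queries, and the user reads the matching articles themselves, so any hallucination is confined to the suggestions. The model file and prompt here are assumptions for illustration.

```python
from llama_cpp import Llama  # pip install llama-cpp-python

llm = Llama(model_path="qwen2.5-3b-instruct-q4_k_m.gguf", n_ctx=2048)

def suggest_queries(vague_question: str, n: int = 5) -> list[str]:
    """Turn a fuzzy question into short encyclopedia search queries."""
    prompt = (
        f"Rewrite this question as at most {n} short encyclopedia search "
        f"queries, one per line.\nQuestion: {vague_question}\nQueries:\n"
    )
    text = llm(prompt, max_tokens=96)["choices"][0]["text"]
    # Strip list markers the model may emit, drop blank lines.
    lines = (q.strip().lstrip("-•0123456789. ") for q in text.splitlines())
    return [q for q in lines if q][:n]

# The user feeds these into Kiwix/zim search and reads the real articles.
for q in suggest_queries("my sourdough never rises, what am I doing wrong?"):
    print(q)
```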

Compression, Data Scale, and Dumps

  • One commenter estimates all digitized papers and books compress to ~5.5 TB, or “three microSD cards” worth, making massive offline libraries feasible.
  • Specific Wikipedia dump sizes and Kiwix zim files are discussed; LLMs are noted as a kind of learned compression via next-token prediction (both points are made concrete after this list).
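
A quick sanity check on the microSD claim, assuming 2 TB cards (the thread doesn’t specify a capacity):

$$
\frac{5.5\ \text{TB}}{3\ \text{cards}} \approx 1.84\ \text{TB per card} < 2\ \text{TB}.
$$

And the learned-compression reading has a standard information-theoretic form: a model whose average per-token cross-entropy is $H$ bits can, paired with an arithmetic coder, losslessly compress $N$ tokens to roughly

$$
\text{size} \approx \frac{N \cdot H}{8}\ \text{bytes}, \qquad H = -\frac{1}{N}\sum_{i=1}^{N} \log_2 p_\theta(x_i \mid x_{<i}),
$$

a general bound rather than a figure from the thread.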

Curation, Encyclopedias, and Benchmarks

  • Some dream of a “Web minus spam/duplicates” super‑encyclopedia; others note curation effort is the hard part and liken it to reinventing Britannica or a library.
  • Talk pages and revision history are highlighted as crucial context, especially for controversial topics.
  • A few lament that LLM usefulness is mostly judged by anecdotes; they’d like more rigorous LLM-vs-traditional-search benchmarks (one possible harness shape is sketched below).
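
As a minimal sketch of what such a benchmark could look like: the same question set scored against any two systems by answer containment. The stub systems and the metric are assumptions; a real study would want graded or human evaluation.

```python
import re

def normalize(s: str) -> str:
    return re.sub(r"\W+", " ", s.lower()).strip()

def score(system, questions) -> float:
    """Fraction of questions whose gold answer appears in the system's output."""
    hits = sum(
        normalize(q["answer"]) in normalize(system(q["question"]))
        for q in questions
    )
    return hits / len(questions)

# Stubs: swap in a real local LLM and a real offline-search pipeline.
def llm_system(question: str) -> str:
    return "Paris is the capital of France."  # e.g. a llama-cpp completion

def search_system(question: str) -> str:
    return "Paris ..."  # e.g. concatenated FTS5 snippets

questions = [{"question": "What is the capital of France?", "answer": "Paris"}]
print("LLM:   ", score(llm_system, questions))
print("Search:", score(search_system, questions))
```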