Local LLMs versus offline Wikipedia
Combining Local LLMs and Offline Wikipedia
- Many see this as a clear “why not both”: use a small local LLM as the interface and Wikipedia (e.g., Kiwix/ZIM, SQLite+FTS, vector DBs) as the factual store.
- Several mention RAG setups over Wikipedia, local vector indices, and tiny models (0.6–4B parameters) that run even on weak hardware or mobile.
- Proposed workflow: the LLM interprets a vague question → returns a topic list / file links → the user reads the actual articles to avoid hallucinations (see the sketch below).
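A minimal sketch of that workflow, with the LLM confined to query interpretation. The specifics are illustrative assumptions, not from the thread: a pre-built SQLite FTS5 table `articles(title, body)` over an offline dump, and an Ollama server running a tiny model.

```python
# Minimal sketch: the local LLM only turns a vague question into search
# terms; the reader is handed real article titles from the offline dump.
# Assumptions (illustrative, not from the thread): an FTS5 table
# `articles(title, body)` in wikipedia.db and a local Ollama server.
import json
import sqlite3
import urllib.request

DB_PATH = "wikipedia.db"                      # hypothetical pre-built index
OLLAMA_URL = "http://localhost:11434/api/generate"

def extract_search_terms(question: str) -> list[str]:
    """Ask a small local model for encyclopedia search queries."""
    prompt = ("Rewrite this question as 3 short encyclopedia search "
              f"queries, one per line, no commentary:\n{question}")
    payload = json.dumps({"model": "qwen2.5:0.5b",   # tiny model, per thread
                          "prompt": prompt, "stream": False}).encode()
    req = urllib.request.Request(OLLAMA_URL, data=payload,
                                 headers={"Content-Type": "application/json"})
    with urllib.request.urlopen(req) as resp:
        text = json.loads(resp.read())["response"]
    return [line.strip() for line in text.splitlines() if line.strip()]

def find_articles(terms: list[str], limit: int = 5) -> list[str]:
    """Full-text search over the dump; returns deduplicated titles."""
    con = sqlite3.connect(DB_PATH)
    titles: list[str] = []
    for term in terms:
        quoted = '"' + term.replace('"', ' ') + '"'   # treat as a phrase
        rows = con.execute("SELECT title FROM articles WHERE articles "
                           "MATCH ? ORDER BY rank LIMIT ?", (quoted, limit))
        titles += [row[0] for row in rows]
    con.close()
    return list(dict.fromkeys(titles))

if __name__ == "__main__":
    question = "my sourdough starter smells weird, is it dead?"
    for title in find_articles(extract_search_terms(question)):
        print(title)     # the user opens these articles, not an LLM answer
```

Keeping the model out of the answer path is the point of this design: even a confabulating 0.6B model can do useful query expansion, because the worst failure mode is a bad search result rather than a confident falsehood.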
Hardware, Cost, and Access
- Debate over “just buy a better laptop”: some argue professionals routinely invest thousands in tools; others counter that outside top US salaries, a good laptop can represent a large share of annual income.
- There’s pushback on the idea that anyone posting on HN can trivially afford new hardware; affordability is framed as relative to local wages and equipment prices.
Offline / Doomsday Scenarios
- The original “reboot society with a USB stick” line sparks discussion of USB drives preloaded with Wikipedia, manuals, and risk literature; some point to existing devices and products.
- Skeptics mock the idea that civilization collapses yet people still have laptops, solar panels, and time to browse USB archives; others note that serious preppers already plan for EMP-shielded gear and off-grid power.
- A government example: internal mirrors of Wikipedia and Stack Exchange on classified networks show that a large-scale “offline web” is already practiced.
- Several emphasize that in real survival situations, practiced skills matter more than giant archives.
LLMs vs Wikipedia: Comprehension, Reliability, Use
- Pro-LLM side: strength isn’t storage but “comprehension” of vague questions, adapting explanations, language-agnostic access, and synthesizing across domains.
- Critics: LLMs don’t truly “understand”; they guess, and can confidently give deadly or expensive advice (car repair, medical-adjacent cases, the infamous Hitler answer).
- Many argue Wikipedia (plus sources, talk pages, and cross-language comparison) remains more trustworthy for facts; LLMs work best as search-term translators, tutors, or frontends to real documents.
- There’s concern that people treat AI as an infallible oracle, likened to sci‑fi episodes where computers become de facto gods.
Compression, Data Scale, and Dumps
- One commenter estimates that all digitized papers and books would compress to ~5.5 TB (“three micro SD cards” worth), making massive offline libraries feasible.
- Specific Wikipedia dump sizes and Kiwix ZIM files are discussed; LLMs are noted as a kind of learned compression via next-token prediction (see the toy calculation below).
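A toy calculation makes the learned-compression point concrete: under arithmetic coding, a model that assigns probability p_i to each observed token can encode the text in about −Σ log2(p_i) bits, so a better next-token predictor is literally a better compressor. The probabilities below are invented for illustration, and the 2 TB card size is my assumption, not a figure from the thread.

```python
# Toy illustration of LLM-as-compressor: with arithmetic coding, text
# whose tokens a model predicts with probabilities p_i costs about
# -sum(log2(p_i)) bits to store. Probabilities below are invented.
import math

token_probs = [0.42, 0.10, 0.73, 0.05, 0.61]   # p(token | context)
bits = -sum(math.log2(p) for p in token_probs)
print(f"{bits:.1f} bits for {len(token_probs)} tokens "
      f"({bits / len(token_probs):.2f} bits/token)")

# The "three micro SD cards" line checks out if each card holds 2 TB
# (an assumption, not stated in the thread):
archive_tb, card_tb = 5.5, 2.0
print(math.ceil(archive_tb / card_tb), "cards")   # -> 3
```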
Curation, Encyclopedias, and Benchmarks
- Some dream of a “Web minus spam/duplicates” super‑encyclopedia; others note curation effort is the hard part and liken it to reinventing Britannica or a library.
- Talk pages and revision history are highlighted as crucial context, especially for controversial topics.
- A few lament that LLM usefulness is mostly judged by anecdotes; they’d like more rigorous LLM‑vs‑traditional search benchmarks.