Big LLM weights are a piece of history
LLMs as Memory vs Intelligence
- Several commenters frame LLMs as “artificial memory” or a “JPEG for language”: lossy but compact, queryable representations of the web and text corpora.
- Main practical value seen in search, research, and “rubber duck” use, i.e., fast recall and synthesis rather than deep intelligence.
- Debate over “intelligence”:
  - One side stresses that intelligence is task-relative and should be defined operationally (e.g., goal achievement).
  - Others argue LLMs feel powerful but still lack something crucial humans have (often labeled “agency”).
  - Disagreement over whether trying to nail down a precise definition is productive or scholastic.
Historical Value & Lossy Compression
- Core idea: big LLM weights are historical artifacts, akin to a time-compressed snapshot of internet knowledge.
- Some are skeptical: LLM output is unattributed, often-inaccurate “hearsay,” and therefore of limited value as a primary source.
- Counterpoint: even as lossy compression, models could help future historians discover directions and themes to investigate with surviving sources.
- Analogy to pre-WW2 “low-background” steel (uncontaminated by fallout from later nuclear testing): pre-LLM models or datasets might later be prized as unpolluted by AI-generated content.
Training Data, Weights, and Archiving
- Several argue the training data is closer to an Internet Archive; the weights are a distilled, interactive map of it.
- Concern that much training data (web pages, scientific papers) will vanish due to dead sites, paywalls, or publisher decisions.
- Worry that closed providers’ 10–20T token datasets aren’t publicly archived; hope they at least preserve them internally.
- Mozilla’s “llamafile” is highlighted as a way to freeze models (weights + deterministic runtime) for decades.
- Some note LLMs might be easier to port than legacy CUDA software, since they’re just “bags of numbers” plus math; others stress you still need precise metadata and implementation details (see the sketch after this list).
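For scale, a 15T-token corpus at roughly four bytes of text per token is on the order of 60 TB of raw text, large but well within archival reach. The metadata point can be made concrete: raw weight bytes are unreadable without dtype, shape, byte order, and the architectural details that give them meaning. Below is a minimal sketch of a self-describing archive, assuming an invented manifest format and illustrative hyperparameters; nothing here is any real model’s layout.

```python
import hashlib
import json

import numpy as np

# Hypothetical archive format: each tensor is dumped as raw bytes, and a
# manifest records everything needed to reinterpret those bytes later.

def archive_tensor(name: str, tensor: np.ndarray, manifest: dict) -> None:
    """Write a tensor as raw bytes and record how to reload it."""
    raw = tensor.tobytes()
    path = f"{name}.bin"
    with open(path, "wb") as f:
        f.write(raw)
    manifest["tensors"][name] = {
        "file": path,
        "dtype": str(tensor.dtype),   # without this, the bytes are just noise
        "shape": list(tensor.shape),
        "byte_order": "little",
        "sha256": hashlib.sha256(raw).hexdigest(),  # detect bit rot
    }

manifest = {
    # Illustrative architecture notes; exact small details (norm epsilon,
    # positional-encoding constants, tokenizer merges) are what future
    # re-implementers would need to reproduce the math.
    "architecture": "decoder-only transformer (assumed)",
    "n_layers": 32, "n_heads": 32, "d_model": 4096,
    "norm": "rmsnorm, eps=1e-5",
    "positional_encoding": "rope, theta=10000",
    "tokenizer": "bpe; vocab and merges archived separately",
    "tensors": {},
}

# Tiny stand-in tensor; a real checkpoint would have thousands of these.
archive_tensor("layers.0.attn.wq", np.zeros((8, 8), dtype=np.float16), manifest)

with open("manifest.json", "w") as f:
    json.dump(manifest, f, indent=2)
```

Something like Mozilla’s llamafile goes further by freezing the runtime itself alongside the weights, so nothing has to be reconstructed from a manifest at all.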
Internet Archive & Preservation Debates
- Strong praise for the Internet Archive as a “sacred” institution preserving web history, with parallel efforts in Europe and Canada.
- Also harsh criticism: poor physical siting (near a refinery, seismic risk), technical flaws (search, broken captures, corrupted data), and controversial legal/operational choices.
- Discussion about funding, governance, and whether they should narrow scope to long-term, clearly legal archiving.
Small vs Big Models, Use Expectations
- Jokes and semi-serious proposals for size nomenclature: tiny/smol/mid/biggg/yuuge; clothing sizes; Starbucks-style; XXLLM→XSLM; etc.
- Observations that on-device “tiny” models (e.g., smartphone assistants) currently perform noticeably worse; users bring deterministic expectations to inherently stochastic tools (see the sampling sketch after this list).
- Some stress LLMs should be treated as “advanced power tools,” not autonomous agents; failures like ambiguous, risky summaries (“Drunk and crashed”) are about poor application design, not mere non-determinism.
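The determinism point is easy to demonstrate: decoding draws tokens from a probability distribution, so identical prompts can produce different outputs unless sampling is made greedy or seeded. A minimal sketch with a made-up four-word vocabulary and invented logits:

```python
import numpy as np

rng = np.random.default_rng()
vocab = ["sober", "drunk", "tired", "late"]   # toy vocabulary
logits = np.array([2.0, 1.6, 0.5, 0.1])       # hypothetical next-token scores

def sample(logits: np.ndarray, temperature: float) -> str:
    """Pick the next token: greedy at temperature 0, stochastic otherwise."""
    if temperature == 0.0:
        return vocab[int(np.argmax(logits))]   # same answer every run
    probs = np.exp(logits / temperature)       # softmax with temperature
    probs /= probs.sum()
    return vocab[rng.choice(len(vocab), p=probs)]  # varies run to run

print([sample(logits, 0.0) for _ in range(5)])  # ['sober', 'sober', ...]
print([sample(logits, 1.0) for _ in range(5)])  # a mixture, different each run
```

Pinning temperature to zero (or seeding the RNG) restores determinism; whether an application does so is exactly the kind of design choice the “power tools” framing points at.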
Ethical and Cultural Concerns
- Fear that LLM-driven interfaces will replace direct web visits, letting corporations monetize scraped content and manipulate what users see.
- Speculation about fully algorithmic comment sections tuned to sell products or push propaganda, including fake consoling replies and political manipulation.
- Some find it depressing that “AI slop” could become the main surviving trace of today’s online creativity.
- Others are comfortable with selective loss: not everything should or can be archived; curation is inevitable.
Technical and Research Ideas
- Mentions of brain-inspired “memory architectures” and content-addressable schemes to separate or enhance memory in ML systems (see the toy sketch after this list).
- Curiosity about whether overlapping LLMs could be used to reconstruct approximate training corpora.
- Suggestions to mine the Internet Archive to train specialized models (e.g., for 6502 machine code and vintage games).
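One way to read the content-addressable idea: items are retrieved by similarity to a query vector rather than by address, so a partial or noisy cue still pulls up the nearest stored memory. A toy sketch under that assumption, where the dimensionality, cosine similarity, and stored strings are all illustrative:

```python
import numpy as np

class ContentAddressableMemory:
    """Toy memory: read by similarity to a key, not by location."""

    def __init__(self, dim: int):
        self.keys = np.empty((0, dim))
        self.values: list[str] = []

    def write(self, key: np.ndarray, value: str) -> None:
        self.keys = np.vstack([self.keys, key])
        self.values.append(value)

    def read(self, query: np.ndarray) -> str:
        # Cosine similarity between the query and every stored key.
        sims = self.keys @ query / (
            np.linalg.norm(self.keys, axis=1) * np.linalg.norm(query) + 1e-9
        )
        return self.values[int(np.argmax(sims))]

mem = ContentAddressableMemory(dim=3)
mem.write(np.array([1.0, 0.0, 0.0]), "note on 6502 opcodes")
mem.write(np.array([0.0, 1.0, 0.0]), "note on steel production")
print(mem.read(np.array([0.9, 0.1, 0.0])))  # nearest key wins: the 6502 note
```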
Meta & Misc
- Several users emphasize preference for original sources and curated repositories (encyclopedias, PDFs) over generated summaries.
- Some enjoy the idea of future historians “interviewing” 2025-era models about our culture and “vibes,” hallucinations and all.