I fed 24 years of my blog posts to a Markov model

Markov Models vs LLMs

  • A large part of the thread argues over whether LLMs “are” Markov chains.
  • One side: in the strict mathematical sense, any process whose next output depends only on the current state is Markov; define the state as the “entire current token sequence” and an LLM fits (see the formal statement after this list). The implementation (lookup table vs. transformer) doesn’t matter.
  • The other side: that definition is vacuous. Classic Markov chains in NLP have a fixed, low order k (e.g., n‑grams) and stationary transition probabilities, whereas LLMs:
    • Condition on long, variable-length prefixes within a window.
    • Use content-dependent attention, not a fixed k-context.
    • Generalize to unseen sequences via shared parameters, unlike lookup tables.
  • A distinction is drawn between a “Markov chain” (fixed finite order, visible state, stationary transitions) and more general “Markov models” (richer, possibly hidden state, RNN‑like).
  • Some argue that calling LLMs “Markov” in the broadest sense makes the term useless, since nearly any sequential system could then qualify.
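
The disagreement largely reduces to how the state is defined. A minimal formalization of the two readings (my notation, not the article’s): a classic order‑k chain conditions on a fixed‑length window with stationary transition probabilities, while the “LLMs are Markov” reading takes the state to be the entire prefix, which makes the Markov property hold trivially for any autoregressive model.

```latex
% Classic order-k Markov chain over visible tokens, stationary transitions:
\[
  P(x_t \mid x_1, \dots, x_{t-1}) \;=\; P(x_t \mid x_{t-k}, \dots, x_{t-1})
\]
% "LLM is Markov" reading: define the state as the whole prefix,
\[
  s_{t-1} = (x_1, \dots, x_{t-1}), \qquad
  P(x_t \mid s_1, \dots, s_{t-1}) \;=\; P(x_t \mid s_{t-1}),
\]
% which any autoregressive model satisfies -- the sense in which critics
% call the claim vacuous.
```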

Limits of Markov Text Generation

  • Multiple people confirm the original article’s observations:
    • Low‑order models (character‑level or word bigram/trigram) produce incoherent text.
    • Higher orders quickly degenerate into copying large chunks verbatim, because most long n‑grams occur only once in the corpus.
  • BPE‑token Markov experiments show that an order‑2 chain over the full BPE vocabulary reproduces the training text deterministically; limiting the vocabulary size reintroduces variability.
  • Suggestions to avoid verbatim “valleys” (sketched in code after this list):
    • Variable/dynamic n‑gram order: fall back to a lower order whenever the current context has only a single continuation.
    • Mix orders and backtrack when the chain gets stuck in long deterministic runs.
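
A minimal sketch of the dynamic‑order idea, assuming a plain word‑level tokenizer and a hypothetical corpus file; this illustrates the suggestion above, not code from the article or the thread:

```python
import random
from collections import defaultdict

def build_models(tokens, max_order=3):
    """Build transition tables for every order from 1 to max_order."""
    models = {}
    for k in range(1, max_order + 1):
        table = defaultdict(list)
        for i in range(len(tokens) - k):
            context = tuple(tokens[i:i + k])
            table[context].append(tokens[i + k])
        models[k] = table
    return models

def next_token(models, history, max_order=3):
    """Pick the next token, dropping to a lower order whenever the current
    context is deterministic (only one distinct continuation), which is
    what produces long verbatim runs at high orders."""
    for k in range(max_order, 0, -1):
        context = tuple(history[-k:])
        continuations = models[k].get(context, [])
        if len(set(continuations)) > 1:
            return random.choice(continuations)
    # Every order was deterministic or empty: fall back to order 1 anyway.
    continuations = models[1].get(tuple(history[-1:]), [])
    return random.choice(continuations) if continuations else None

def generate(tokens, length=100, max_order=3):
    models = build_models(tokens, max_order)
    history = list(tokens[:max_order])
    out = list(history)
    for _ in range(length):
        tok = next_token(models, history, max_order)
        if tok is None:
            break
        out.append(tok)
        history.append(tok)
    return " ".join(out)

if __name__ == "__main__":
    corpus = open("blog_posts.txt").read().split()  # hypothetical corpus file
    print(generate(corpus))
```

At order 3, most contexts in a blog‑sized corpus have a single continuation, so naive sampling walks through the source verbatim; dropping to a lower order at those points trades verbatim copying for the looser coherence of low‑order chains.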

Tools, Experiments, and History

  • Many reminisce about IRC and chatroom Markov bots and tools like MegaHAL, Hailo (Perl), Babble (MS‑DOS), and modern web/CLI generators.
  • People describe using personal corpora (blogs, fiction, tweets, Trump tweets) for bots or creative “dream wells” to spark ideas, not to generate standalone prose.
  • References are shared to n‑gram work (e.g., the very large Google n‑grams), CS50’s Markov demo, and classic neural language‑modeling papers explaining sparsity and distributional representations.

Personalization and Digital Doppelgängers

  • Some speculate about training models on a lifetime of writings to create a “low‑resolution mirror” of one’s personality for descendants.
  • Others ask how to achieve this today with LLMs (prompt stuffing, vector DBs, fine‑tuning/LoRA, commercial “custom model” tools) and how far it can go (phone/Discord agents, naturalness, domain limits); a minimal retrieval sketch follows this list.
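
A rough sketch of the “prompt stuffing plus vector DB” option, assuming you already have an embedding model and an LLM endpoint; `embed()` and `ask_llm()` are hypothetical placeholders for whatever services you use, and the prompt wording is purely illustrative:

```python
import numpy as np

def embed(text: str) -> np.ndarray:
    """Placeholder: call your embedding model of choice here."""
    raise NotImplementedError

def ask_llm(prompt: str) -> str:
    """Placeholder: call your LLM API of choice here."""
    raise NotImplementedError

def build_index(posts: list[str]) -> np.ndarray:
    """Embed every blog post once; rows are L2-normalized vectors."""
    vectors = np.stack([embed(p) for p in posts])
    return vectors / np.linalg.norm(vectors, axis=1, keepdims=True)

def answer_in_my_voice(question: str, posts: list[str],
                       index: np.ndarray, k: int = 5) -> str:
    """Retrieve the k most similar posts and stuff them into the prompt."""
    q = embed(question)
    q = q / np.linalg.norm(q)
    scores = index @ q                      # cosine similarity
    top = np.argsort(scores)[::-1][:k]
    context = "\n\n---\n\n".join(posts[i] for i in top)
    prompt = (
        "Below are excerpts from my own blog. Answer the question in my "
        "voice, drawing only on these excerpts.\n\n"
        f"{context}\n\nQuestion: {question}"
    )
    return ask_llm(prompt)
```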

Community Norms Around LLM Content

  • There is pushback against pasting, or offering to paste, ChatGPT transcripts into discussions; it is seen as low‑effort and redundant, since anyone can query the models themselves.
  • A few commenters lament a perceived decline in civility around LLM-related posts.