I fed 24 years of my blog posts to a Markov model
Markov Models vs LLMs
- A large part of the thread argues over whether LLMs “are” Markov chains.
- One side: in the strict mathematical sense, any process whose next output depends only on the current state is Markov; if you define the state as the “entire current token sequence,” an LLM fits (see the formulation sketched after this list). Implementation (lookup table vs. transformer) doesn’t matter.
- The other side: that definition is vacuous. Classic Markov chains in NLP have fixed, low order k (e.g., n‑grams) and stationary transition probabilities. LLMs:
- Condition on long, variable-length prefixes within a window.
- Use content-dependent attention, not a fixed k-context.
- Generalize to unseen sequences via shared parameters, unlike lookup tables.
- A distinction is drawn between “Markov chain” (fixed finite order, visible state, stationary transitions) and more general “Markov models” (state can be richer, possibly hidden, RNN-like).
- Some argue that calling LLMs “Markov” in the broadest sense makes the term useless, since nearly any sequential system could then qualify.
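For reference (my notation, not the thread’s), the disagreement comes down to which state the Markov property is asserted over: a fixed order‑k context, or the entire visible prefix.

```latex
% Classic order-k (n-gram) Markov assumption over tokens x_1, x_2, ...
P(x_{t+1} \mid x_1,\dots,x_t) = P(x_{t+1} \mid x_{t-k+1},\dots,x_t)

% The "LLMs are trivially Markov" move: take the state to be the whole
% visible prefix, truncated to the context window W, so the property
% holds by construction rather than as a modelling assumption.
s_t = (x_{t-W+1},\dots,x_t), \qquad
P(x_{t+1} \mid s_1,\dots,s_t) = P(x_{t+1} \mid s_t)
```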
Limits of Markov Text Generation
- Multiple people confirm the original article’s observations:
- Low-order models (character-level, or word bigrams/trigrams) produce incoherent text.
- Higher orders quickly degenerate into copying large chunks verbatim, because most longer n‑grams occur only once in the corpus.
- BPE-token experiments show that an order‑2 chain over the full BPE vocabulary deterministically reproduces the training text; limiting the vocabulary size reintroduces variability.
- Suggestions to avoid verbatim “valleys”:
- Variable/dynamic n-gram order: fall back to a lower order when the current context has only a single continuation (see the sketch after this list).
- Use mixed orders and backtracking when the chain gets stuck in long deterministic runs.
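None of the following comes from the article or the thread; it is a minimal word-level sketch (BPE-token experiments behave analogously) that first measures how many contexts are deterministic, then generates text with the lower-order fallback suggested above. The helper names and the random-restart behaviour are my own choices.

```python
import random
from collections import defaultdict

def build_tables(tokens, max_order=3):
    """Count observed continuations for every context of length 1..max_order."""
    tables = {k: defaultdict(list) for k in range(1, max_order + 1)}
    for k in range(1, max_order + 1):
        for i in range(len(tokens) - k):
            tables[k][tuple(tokens[i:i + k])].append(tokens[i + k])
    return tables

def deterministic_fraction(tables, order):
    """Share of order-k contexts with exactly one distinct continuation;
    high values mean generation mostly replays the training text verbatim."""
    contexts = tables[order]
    stuck = sum(1 for conts in contexts.values() if len(set(conts)) == 1)
    return stuck / len(contexts)

def generate(tokens, length=50, max_order=3, seed=None):
    """Variable-order generation: drop to a lower order whenever the current
    context offers only one continuation, to avoid verbatim 'valleys'."""
    rng = random.Random(seed)
    tables = build_tables(tokens, max_order)
    out = list(tokens[:max_order])                 # seed with the corpus opening
    for _ in range(length):
        next_token = None
        for k in range(max_order, 0, -1):          # try the highest order first
            conts = tables[k].get(tuple(out[-k:]), [])
            if len(set(conts)) > 1 or (k == 1 and conts):
                next_token = rng.choice(conts)
                break
        if next_token is None:                     # unseen even as a unigram context
            next_token = rng.choice(tokens)        # arbitrary restart (my choice)
        out.append(next_token)
    return " ".join(out)

# Toy corpus; a real run would use the blog archive instead.
corpus = ("the cat sat on the mat and the dog sat on the rug "
          "and the cat slept on the rug while the dog sat on the mat").split()
tables = build_tables(corpus)
print(f"deterministic order-3 contexts: {deterministic_fraction(tables, 3):.0%}")
print(generate(corpus, length=30, max_order=3, seed=1))
```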
Tools, Experiments, and History
- Many reminisce about IRC and chatroom Markov bots and tools like MegaHAL, Hailo (Perl), Babble (MS‑DOS), and modern web/CLI generators.
- People describe using personal corpora (blogs, fiction, tweets, Trump tweets) for bots or creative “dream wells” to spark ideas, not to generate standalone prose.
- References are shared to n‑gram work (e.g., the very large Google n‑grams corpus), CS50’s Markov demo, and classic neural language‑modeling papers on sparsity and distributed representations.
Personalization and Digital Doppelgängers
- Some speculate about training models on a lifetime of writings to create a “low‑resolution mirror” of one’s personality for descendants.
- Others ask how to achieve this today with LLMs (prompt stuffing, vector DBs, fine-tuning/LoRA, commercial “custom model” tools; a toy retrieval sketch follows this list) and how far it can go (phone/Discord agents, naturalness, domain limits).
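The thread leaves these as open questions. As a toy, hedged illustration of the “prompt stuffing with retrieval” route only (none of this is from the thread: bag-of-words overlap stands in for a real embedding model and vector DB, and the helper names and sample posts are invented):

```python
from collections import Counter
import math

def score(query, passage):
    """Cosine similarity over raw word counts -- a stand-in for embeddings."""
    q, p = Counter(query.lower().split()), Counter(passage.lower().split())
    dot = sum(q[w] * p[w] for w in q)
    norm = (math.sqrt(sum(v * v for v in q.values()))
            * math.sqrt(sum(v * v for v in p.values())))
    return dot / norm if norm else 0.0

def build_prompt(question, corpus, top_k=3):
    """Pick the top_k most similar passages and stuff them into the prompt."""
    hits = sorted(corpus, key=lambda passage: score(question, passage), reverse=True)[:top_k]
    return (
        "You are answering in the voice of the author of these excerpts.\n\n"
        "Excerpts:\n" + "\n\n".join(hits) + "\n\n"
        f"Question: {question}\nAnswer:"
    )

# Hypothetical personal corpus; in practice this would be the blog archive,
# chunked, embedded, and stored in a vector database.
blog_posts = [
    "In 2003 I wrote mostly about Perl scripts and IRC bots.",
    "My later posts drifted toward woodworking and parenting.",
    "I have always preferred static site generators to big CMSes.",
]
print(build_prompt("What did you blog about early on?", blog_posts, top_k=2))
```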
Community Norms Around LLM Content
- There is pushback against pasting, or offering to paste, ChatGPT transcripts into discussions; commenters see this as low-effort and redundant, since anyone can query the models themselves.
- A few commenters lament a perceived decline in civility around LLM-related posts.