I fed 24 years of my blog posts to a Markov model

Markov Models vs LLMs

  • A large part of the thread argues over whether LLMs “are” Markov chains.
  • One side: in the strict mathematical sense, any process whose next output depends only on the current state is Markov; define the state as the “entire current token sequence” and an LLM fits (see the formal statement after this list). The implementation (lookup table vs. transformer) doesn’t matter.
  • The other side: that definition is vacuous. Classic Markov chains in NLP have a fixed, low order k (e.g., n‑grams) and stationary transition probabilities, whereas LLMs:
    • Condition on long, variable-length prefixes within a window.
    • Use content-dependent attention, not a fixed k-context.
    • Generalize to unseen sequences via shared parameters, unlike lookup tables.
  • A distinction is drawn between a “Markov chain” (fixed finite order, visible state, stationary transitions) and more general “Markov models” (richer, possibly hidden state, RNN‑like).
  • Some argue that calling LLMs “Markov” in the broadest sense makes the term useless, since nearly any sequential system could then qualify.
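
The disagreement largely reduces to how the state is defined. A minimal formalization of the two readings (my notation, not the article’s): a classic order‑k chain conditions on a fixed‑length window with stationary transition probabilities, while the “LLMs are Markov” reading takes the state to be the entire prefix, which makes the Markov property hold trivially for any autoregressive model.

```latex
% Classic order-k Markov chain over visible tokens, stationary transitions:
\[
  P(x_t \mid x_1, \dots, x_{t-1}) \;=\; P(x_t \mid x_{t-k}, \dots, x_{t-1})
\]
% "LLM is Markov" reading: define the state as the whole prefix,
\[
  s_{t-1} = (x_1, \dots, x_{t-1}), \qquad
  P(x_t \mid s_1, \dots, s_{t-1}) \;=\; P(x_t \mid s_{t-1}),
\]
% which any autoregressive model satisfies -- the sense in which critics
% call the claim vacuous.
```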

Limits of Markov Text Generation

  • Multiple people confirm the original article’s observations:
    • Low‑order models (character‑level or word bigram/trigram) produce incoherent text.
    • Higher orders quickly degenerate into copying large chunks verbatim, because most long n‑grams occur only once in the corpus.
  • BPE‑token Markov experiments show that an order‑2 chain over the full BPE vocabulary reproduces the training text deterministically; limiting the vocabulary size reintroduces variability.
  • Suggestions to avoid verbatim “valleys” (sketched in code after this list):
    • Variable/dynamic n‑gram order: fall back to a lower order whenever the current context has only a single continuation.
    • Mix orders and backtrack when the chain gets stuck in long deterministic runs.
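
A minimal sketch of the dynamic‑order idea, assuming a plain word‑level tokenizer and a hypothetical corpus file; this illustrates the suggestion above, not code from the article or the thread:

```python
import random
from collections import defaultdict

def build_models(tokens, max_order=3):
    """Build transition tables for every order from 1 to max_order."""
    models = {}
    for k in range(1, max_order + 1):
        table = defaultdict(list)
        for i in range(len(tokens) - k):
            context = tuple(tokens[i:i + k])
            table[context].append(tokens[i + k])
        models[k] = table
    return models

def next_token(models, history, max_order=3):
    """Pick the next token, dropping to a lower order whenever the current
    context is deterministic (only one distinct continuation), which is
    what produces long verbatim runs at high orders."""
    for k in range(max_order, 0, -1):
        context = tuple(history[-k:])
        continuations = models[k].get(context, [])
        if len(set(continuations)) > 1:
            return random.choice(continuations)
    # Every order was deterministic or empty: fall back to order 1 anyway.
    continuations = models[1].get(tuple(history[-1:]), [])
    return random.choice(continuations) if continuations else None

def generate(tokens, length=100, max_order=3):
    models = build_models(tokens, max_order)
    history = list(tokens[:max_order])
    out = list(history)
    for _ in range(length):
        tok = next_token(models, history, max_order)
        if tok is None:
            break
        out.append(tok)
        history.append(tok)
    return " ".join(out)

if __name__ == "__main__":
    corpus = open("blog_posts.txt").read().split()  # hypothetical corpus file
    print(generate(corpus))
```

At order 3, most contexts in a blog‑sized corpus have a single continuation, so naive sampling walks through the source verbatim; dropping to a lower order at those points trades verbatim copying for the looser coherence of low‑order chains.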

Tools, Experiments, and History

  • Many reminisce about IRC and chatroom Markov bots and tools like MegaHAL, Hailo (Perl), Babble (MS‑DOS), and modern web/CLI generators.
  • People describe using personal corpora (blogs, fiction, tweets, Trump tweets) for bots or creative “dream wells” to spark ideas, not to generate standalone prose.
  • References are shared to n‑gram work (e.g., the very large Google n‑grams), CS50’s Markov demo, and classic neural language‑modeling papers explaining sparsity and distributional representations.

Personalization and Digital Doppelgängers

  • Some speculate about training models on a lifetime of writings to create a “low‑resolution mirror” of one’s personality for descendants.
  • Others ask how to achieve this today with LLMs (prompt stuffing, vector DBs, fine‑tuning/LoRA, commercial “custom model” tools) and how far it can go (phone/Discord agents, naturalness, domain limits); a minimal retrieval sketch follows this list.
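
A rough sketch of the “prompt stuffing plus vector DB” option, assuming you already have an embedding model and an LLM endpoint; `embed()` and `ask_llm()` are hypothetical placeholders for whatever services you use, and the prompt wording is purely illustrative:

```python
import numpy as np

def embed(text: str) -> np.ndarray:
    """Placeholder: call your embedding model of choice here."""
    raise NotImplementedError

def ask_llm(prompt: str) -> str:
    """Placeholder: call your LLM API of choice here."""
    raise NotImplementedError

def build_index(posts: list[str]) -> np.ndarray:
    """Embed every blog post once; rows are L2-normalized vectors."""
    vectors = np.stack([embed(p) for p in posts])
    return vectors / np.linalg.norm(vectors, axis=1, keepdims=True)

def answer_in_my_voice(question: str, posts: list[str],
                       index: np.ndarray, k: int = 5) -> str:
    """Retrieve the k most similar posts and stuff them into the prompt."""
    q = embed(question)
    q = q / np.linalg.norm(q)
    scores = index @ q                      # cosine similarity
    top = np.argsort(scores)[::-1][:k]
    context = "\n\n---\n\n".join(posts[i] for i in top)
    prompt = (
        "Below are excerpts from my own blog. Answer the question in my "
        "voice, drawing only on these excerpts.\n\n"
        f"{context}\n\nQuestion: {question}"
    )
    return ask_llm(prompt)
```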

Community Norms Around LLM Content

  • There is pushback against pasting, or offering to paste, ChatGPT transcripts into discussions; it is seen as low‑effort and redundant, since anyone can query the models themselves.
  • A few commenters lament a perceived decline in civility around LLM-related posts.