LLM-Based Agents as Dungeon Masters

Use Cases and Appeal

  • Many are excited about LLMs as DMs or co-DMs, especially for:
    • Solo play, or play for people without a local group or a willing human DM.
    • Parents running games for kids, or “perpetual DMs” who rarely get to be players.
    • Exploring unusual settings (e.g., Renaissance Venice, niche homebrew worlds) without lots of prep.
  • LLMs work well for:
    • NPC dialogue, location descriptions, improvisation, and atmospheric text.
    • Generating monsters, encounters, puzzles, scenery art, and music prompts.
    • Rapid campaign building, character/scene “first drafts,” and session summaries and recaps.

Skepticism and Limitations

  • Many argue the core of tabletop RPGs is human social interaction and tailored storytelling; an AI DM feels “meaningless” or hollow by comparison.
  • Fully AI-run DMs are often described as:
    • Overly cheerful, conflict-averse, and “sycophantic,” making danger and failure feel fake.
    • Too pliable, letting players succeed at everything or co-author the story with no resistance.
    • Generic, tropey, and quickly boring compared to a strong human DM.
  • Some report campaigns collapsing over continuity issues (e.g., how many maps existed, what items were obtained).

Technical Challenges and Proposed Architectures

  • Biggest problems:
    • Long-term memory: persistent facts about PCs, NPCs, locations, items, and past events.
    • Rules fidelity and combat mechanics: correctly applying, or consistently house-ruling, systems like D&D.
    • Maintaining consistent tone, personas, pacing, stakes, and constraints.
  • Suggestions:
    • External “world state”/ontology with CRUD for stats, inventory, history, and locations.
    • RAG systems or iterative summarization to stay within context limits.
    • Treat the LLM as a component: “system 1” intuition atop traditional “system 2” logic, storage, and dice.
    • Use fine-tuned open models for darker, higher-stakes play.
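
  The external world state and "LLM as a component" suggestions above can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not an implementation from the discussion: WorldState, Character, skill_check, and build_prompt are hypothetical names, and the LLM call itself is left out. The idea is that ordinary code owns the facts and the dice, and the model only receives the already-decided outcome to narrate.

```python
import random
from dataclasses import dataclass, field


@dataclass
class Character:
    name: str
    hp: int
    inventory: list[str] = field(default_factory=list)


class WorldState:
    """Hypothetical external store: facts live here, not in the LLM context."""

    def __init__(self) -> None:
        self.characters: dict[str, Character] = {}
        self.history: list[str] = []  # append-only log of past events

    # Create
    def create_character(self, name: str, hp: int) -> Character:
        if name in self.characters:
            raise ValueError(f"{name} already exists")
        self.characters[name] = Character(name, hp)
        return self.characters[name]

    # Read
    def read_character(self, name: str) -> Character:
        return self.characters[name]

    # Update
    def add_item(self, name: str, item: str) -> None:
        self.characters[name].inventory.append(item)
        self.history.append(f"{name} obtained {item}")

    def apply_damage(self, name: str, amount: int) -> None:
        character = self.characters[name]
        character.hp = max(0, character.hp - amount)
        self.history.append(f"{name} took {amount} damage (hp now {character.hp})")

    # Delete
    def delete_character(self, name: str) -> None:
        del self.characters[name]


def skill_check(modifier: int, dc: int, rng: random.Random) -> tuple[int, bool]:
    """Roll a d20 in code, so the model cannot fudge the result."""
    roll = rng.randint(1, 20)
    return roll, roll + modifier >= dc


def build_prompt(state: WorldState, action: str, roll: int, success: bool) -> str:
    """Assemble the text a (not shown) LLM call would receive for narration."""
    # Taking the last few log entries is a stand-in for retrieval or summarization.
    facts = "; ".join(state.history[-5:])
    outcome = "SUCCESS" if success else "FAILURE"
    return (
        f"Known facts: {facts}\n"
        f"Player action: {action}\n"
        f"Dice result: {roll} ({outcome})\n"
        "Narrate this outcome; do not change it."
    )


if __name__ == "__main__":
    state = WorldState()
    state.create_character("Mira", hp=12)
    state.add_item("Mira", "brass key")
    roll, success = skill_check(modifier=3, dc=15, rng=random.Random())
    print(build_prompt(state, "pick the lock", roll, success))
```

  Because the inventory and event log are ordinary data, continuity questions such as which items were obtained are answered by a lookup rather than by the model's recollection, and a failed roll stays failed regardless of how agreeable the narrator is.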

Human–AI Hybrid DMing and Research Notes

  • Strong support for LLMs as assistants, not replacements:
    • Co-DM to handle lore, side NPCs, bookkeeping, and “what if” simulations.
    • NPC party companions that interject in chat/voice without replacing the human GM.
  • The thesis is seen as an interesting early study but criticized for:
    • Small control group, opaque methodology, and using an older GPT-3.5-based model.
    • Lack of published gameplay transcripts, making results hard to interpret or reproduce.