LLM-based agents as Dungeon Masters
Use Cases and Appeal
- Many commenters are excited about LLMs as DMs or co-DMs, especially for:
  - Solo play, or play for people without a local group or a willing human DM.
  - Parents running games for kids, or “perpetual DMs” who rarely get to be players.
  - Exploring unusual settings (e.g., Renaissance Venice, niche homebrew worlds) without lots of prep.
- LLMs are reported to work well for:
  - NPC dialogue, location descriptions, improvisation, and atmospheric text.
  - Generating monsters, encounters, puzzles, scenery art, and music prompts.
  - Rapid campaign building, “first drafts” of characters and scenes, and session summaries and recaps.
Skepticism and Limitations
- Many argue that the core of tabletop RPGs is human social interaction and tailored storytelling, so an AI DM feels “meaningless” or hollow by comparison.
- Fully AI-run DMs are often described as:
  - Overly cheerful, conflict-averse, and “sycophantic,” making danger and failure feel fake.
  - Too pliable, letting players succeed at everything or co-author the story with no resistance.
  - Generic, tropey, and quickly boring compared to a strong human DM.
- Some report campaigns collapsing over continuity errors (e.g., the AI losing track of how many maps existed or which items had been obtained).
Technical Challenges and Proposed Architectures
- Biggest problems:
  - Long-term memory: keeping persistent facts about PCs, NPCs, locations, items, and past events.
  - Rules fidelity and combat mechanics: correctly applying, or consistently house-ruling, systems like D&D.
  - Maintaining consistent tone, personas, pacing, stakes, and constraints.
- Suggestions:
  - Keep an external “world state”/ontology with CRUD operations for stats, inventory, history, and locations.
  - Use RAG systems or iterative summarization to stay within context limits.
  - Treat the LLM as one component: “system 1” intuition layered atop traditional “system 2” logic, storage, and dice.
  - Use fine-tuned open models for darker, higher-stakes play.
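
The suggestions above could be sketched roughly as follows. This is a minimal illustration rather than anything proposed verbatim in the discussion. All names here (`Entity`, `WorldState`, `skill_check`, `build_context`) are hypothetical. The d20-plus-modifier check against a difficulty class is assumed as a D&D-style example. `build_context` is a simple stand-in for retrieval, not a real RAG pipeline. The idea is that facts and dice live in ordinary code, and the LLM would only be handed a small, current excerpt to narrate from.

```python
import random
from dataclasses import dataclass, field


@dataclass
class Entity:
    """One tracked thing in the world: a PC, NPC, location, or item."""
    name: str
    kind: str  # e.g. "pc", "npc", "location", "item"
    stats: dict = field(default_factory=dict)
    inventory: list = field(default_factory=list)


class WorldState:
    """Authoritative store ("system 2"); the LLM reads from it, code writes to it."""

    def __init__(self):
        self.entities = {}
        self.history = []  # append-only event log

    def create(self, entity):
        if entity.name in self.entities:
            raise KeyError(f"{entity.name} already exists")
        self.entities[entity.name] = entity
        self.history.append(f"created {entity.kind} {entity.name}")

    def read(self, name):
        return self.entities[name]

    def update(self, name, **stats):
        self.entities[name].stats.update(stats)
        self.history.append(f"updated {name}: {stats}")

    def delete(self, name):
        del self.entities[name]
        self.history.append(f"deleted {name}")

    def give_item(self, name, item):
        self.entities[name].inventory.append(item)
        self.history.append(f"{name} obtained {item}")


def skill_check(modifier, dc, rng):
    """Roll d20 + modifier against a difficulty class with real dice, not the LLM."""
    roll = rng.randint(1, 20)
    return roll, roll + modifier >= dc


def build_context(world, names, recent=5):
    """Collect only the named entities and the latest events for the next prompt."""
    lines = []
    for name in names:
        e = world.read(name)
        lines.append(f"{e.kind} {e.name}: stats={e.stats} inventory={e.inventory}")
    lines.append("Recent events:")
    lines.extend(world.history[-recent:])
    return "\n".join(lines)


if __name__ == "__main__":
    world = WorldState()
    world.create(Entity("Mira", "pc", {"hp": 12}))
    world.give_item("Mira", "harbor map")
    roll, success = skill_check(modifier=2, dc=15, rng=random.Random())
    if not success:
        world.update("Mira", hp=9)  # failure has a recorded, persistent cost
    print(build_context(world, ["Mira"]))
```

Because the item count and hit points are stored outside the model, a continuity question such as "how many maps do we have?" can be answered from `WorldState` rather than from the LLM's recollection, and a failed roll stays failed regardless of how agreeable the narrator is.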
Human–AI Hybrid DMing and Research Notes
- Strong support for LLMs as assistants, not replacements:
  - A co-DM that handles lore, side NPCs, bookkeeping, and “what if” simulations.
  - NPC party companions that interject in chat or voice without replacing the human GM.
- The thesis under discussion is seen as an interesting early study, but is criticized for:
  - A small control group, opaque methodology, and reliance on an older GPT-3.5-based model.
  - A lack of published gameplay transcripts, which makes the results hard to interpret or reproduce.