2024-07-09

Multi-agent chatbot murder mystery

Game concept & implementation

Web-based, multi-agent AI murder mystery where players interrogate suspects to deduce the killer, method, and secrets.
Each suspect has a hidden secret; clues about one character are embedded in others’ context windows, encouraging cross-interrogation.
System uses a “Critique & Revision” pipeline:
- A “violation bot” checks each suspect response against global and character-specific “Principles” (e.g., no direct confessions).
- A “refinement bot” rewrites responses that violate rules.
Behavior is shaped by distinct personality, secret, and violation contexts for each suspect, plus a consistent “Detective Sheerluck” roleplay framing.
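The critique-and-revision flow described above can be sketched roughly as follows. This is a minimal illustration, not the project's actual code: the `ModelFn` type stands in for whatever LLM call the app makes, and `Suspect`, `GLOBAL_PRINCIPLES`, `find_violation`, `refine`, and `critique_and_revise` are all hypothetical names, as are the prompt wordings.

```python
from dataclasses import dataclass
from typing import Callable, List, Optional

# Hypothetical stand-in for an LLM call: takes a prompt, returns text.
ModelFn = Callable[[str], str]

# Assumed example of a global principle; the real list is not given in the source.
GLOBAL_PRINCIPLES = ["Never confess directly to the murder."]


@dataclass
class Suspect:
    name: str
    principles: List[str]  # character-specific rules


def find_violation(model: ModelFn, suspect: Suspect, reply: str) -> Optional[str]:
    """Ask the 'violation bot' whether the reply breaks any principle."""
    rules = GLOBAL_PRINCIPLES + suspect.principles
    prompt = (
        "Principles:\n"
        + "\n".join(f"- {rule}" for rule in rules)
        + f"\n\nReply from {suspect.name}:\n{reply}\n\n"
        + "Answer NONE if no principle is violated; otherwise quote the violated principle."
    )
    verdict = model(prompt).strip()
    return None if verdict.upper() == "NONE" else verdict


def refine(model: ModelFn, suspect: Suspect, reply: str, violation: str) -> str:
    """Ask the 'refinement bot' to rewrite a reply that broke a principle."""
    prompt = (
        f"Rewrite this reply from {suspect.name} so it no longer violates "
        f"the principle '{violation}', keeping the character's voice.\n\n{reply}"
    )
    return model(prompt).strip()


def critique_and_revise(
    violation_bot: ModelFn,
    refinement_bot: ModelFn,
    suspect: Suspect,
    reply: str,
    max_rounds: int = 2,
) -> str:
    """Check a draft reply and rewrite it until it passes or rounds run out."""
    for _ in range(max_rounds):
        violation = find_violation(violation_bot, suspect, reply)
        if violation is None:
            return reply
        reply = refine(refinement_bot, suspect, reply, violation)
    # The last rewrite is returned without a further check.
    return reply
```

Bounding the loop with `max_rounds` is a design assumption here: each round costs two extra model calls, which matters given the latency complaints in the thread.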
Code and full story JSON are open source; players can modify characters or run locally with their own Anthropic API key.
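To illustrate how clues about one character can live in another character's context, here is a small sketch. The JSON layout, the field names (`suspects`, `secret`, `knows`, `about`, `clue`), the character names, and both helper functions are assumptions made for illustration; the project's real story schema may look quite different.

```python
import json
from typing import Any, Dict, List

# Hypothetical story layout; the real project's JSON schema may differ.
STORY_JSON = """
{
  "suspects": [
    {"name": "Violet", "secret": "Forged the victim's will.",
     "knows": [{"about": "Amos", "clue": "Amos was seen burning letters."}]},
    {"name": "Amos", "secret": "Was being blackmailed by the victim.",
     "knows": [{"about": "Violet", "clue": "Violet practiced the victim's signature."}]}
  ]
}
"""


def build_context(story: Dict[str, Any], name: str) -> str:
    """Assemble one suspect's context: own secret plus clues about others."""
    suspect = next(s for s in story["suspects"] if s["name"] == name)
    lines = [
        f"You are {suspect['name']}.",
        f"Your secret (never reveal it directly): {suspect['secret']}",
    ]
    for item in suspect["knows"]:
        lines.append(f"You know this about {item['about']}: {item['clue']}")
    return "\n".join(lines)


def who_knows_about(story: Dict[str, Any], target: str) -> List[str]:
    """List the suspects whose context holds a clue about `target`."""
    return [
        s["name"]
        for s in story["suspects"]
        if any(item["about"] == target for item in s["knows"])
    ]


story = json.loads(STORY_JSON)
print(build_context(story, "Amos"))
print(who_knows_about(story, "Amos"))
```

In this layout, a player who wants to learn about Amos has to question Violet, which is the cross-interrogation incentive the summary describes. Editing the JSON is also how a player could swap in their own characters.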

Performance and deployment

The app repeatedly became unresponsive or extremely slow under “hug of death” traffic from Hacker News (HN).
The author upgraded the server and increased the worker count, but many users still reported long response times and timeouts.
Recommended workaround: clone the repo, add an API key, and run locally for speed and reliability.

Gameplay experience & difficulty

Some players enjoyed the concept and narrative, comparing it to other mystery games and jailbreaking challenges.
Others quickly “broke” the game:
- Forcing suspects or the officer to confess in a handful of prompts.
- Using meta-prompts to reveal full solutions or internal prompts.
A few players noted responses that cut off mid-sentence and unclear game feedback, which made it hard to know whether an action “worked.”

Safety, censorship & jailbreaks

One user triggered an overzealous safety response to a benign “overview” question, highlighting guardrail brittleness.
Several commenters describe jailbreak-style prompts as their default “gameplay,” noting that this turns all such games into exploit puzzles.
Debate over safety vs. over-censorship:
- Some want cheaper, less-filtered model APIs to avoid moralizing refusals.
- Others emphasize risks of misinformation, scams, and harmful uses, arguing constraints are necessary.

Quality, polish & future potential

Mixed views on polish: some praise it as a fun hackathon project; others criticize React defaults and performance as “low effort” or “shovelware.”
Suggestions include:
- Letting users author their own mysteries via the existing JSON structure.
- Supporting local/open-source models (e.g., small quantized models, browser-based).
- Caching frequent prompts to reduce API calls and latency, possibly using a cheaper model for similarity checks.
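The caching suggestion above could look something like the sketch below. It is only an illustration under stated assumptions: `PromptCache` and `ask` are hypothetical names, `model` stands in for the real API call, and the similarity check uses Python's standard `difflib.SequenceMatcher` as a cheap placeholder for the "cheaper model" commenters proposed.

```python
from difflib import SequenceMatcher
from typing import Callable, Dict, Optional, Tuple


class PromptCache:
    """Per-suspect answer cache with a fuzzy match on the question text."""

    def __init__(self, threshold: float = 0.9) -> None:
        self.threshold = threshold  # minimum similarity ratio to count as a hit
        self._entries: Dict[Tuple[str, str], str] = {}

    @staticmethod
    def _normalize(question: str) -> str:
        # Lowercase and collapse whitespace so trivial variations match.
        return " ".join(question.lower().split())

    def lookup(self, suspect: str, question: str) -> Optional[str]:
        q = self._normalize(question)
        for (name, cached_q), answer in self._entries.items():
            if name != suspect:
                continue
            if SequenceMatcher(None, q, cached_q).ratio() >= self.threshold:
                return answer
        return None

    def store(self, suspect: str, question: str, answer: str) -> None:
        self._entries[(suspect, self._normalize(question))] = answer


def ask(cache: PromptCache, model: Callable[[str], str], suspect: str, question: str) -> str:
    """Return a cached answer when a similar question was seen; else call the model."""
    cached = cache.lookup(suspect, question)
    if cached is not None:
        return cached
    answer = model(question)
    cache.store(suspect, question, answer)
    return answer
```

One trade-off to keep in mind: a cached answer ignores the conversation history, so this approach fits best for common opening questions rather than follow-ups that depend on earlier replies.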
The author mentions logging thousands of interactions for potential fine-tuning; one commenter flags the need for explicit user consent.
