Multi-agent chatbot murder mystery
Game concept & implementation
- Web-based, multi-agent AI murder mystery where players interrogate suspects to deduce the killer, method, and secrets.
- Each suspect has a hidden secret; clues about one character are embedded in others’ context windows, encouraging cross-interrogation.
- System uses a “Critique & Revision” pipeline:
  - A “violation bot” checks each suspect response against global and character-specific “Principles” (e.g., no direct confessions).
  - A “refinement bot” rewrites responses that violate rules.
- Each suspect's behavior is shaped by distinct personality, secret, and violation contexts, plus a consistent “Detective Sheerluck” roleplay framing.
- Code and full story JSON are open source; players can modify characters or run locally with their own Anthropic API key.
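
The critique-and-revision loop described above can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the project's actual code: the `Suspect` fields, the prompt wording, the `max_rounds` limit, and the character "Chef Marlowe" are all hypothetical, and the three "bots" are stubbed with plain functions so the example runs without an API key. In a real deployment each stub would wrap a call to an LLM API.

```python
from dataclasses import dataclass, field
from typing import Callable, List

# Assumption: a "bot" is any function from prompt text to reply text.
ModelFn = Callable[[str], str]

# Hypothetical global rule, modeled on the "no direct confessions" example.
GLOBAL_PRINCIPLES = ["Never directly confess to the murder."]


@dataclass
class Suspect:
    name: str
    personality: str
    secret: str
    clues_about_others: List[str] = field(default_factory=list)
    principles: List[str] = field(default_factory=list)  # character-specific


def build_context(suspect: Suspect) -> str:
    """Assemble one suspect's context, including clues about other characters."""
    clues = "\n".join(f"- {c}" for c in suspect.clues_about_others)
    return (
        f"You are {suspect.name}. Personality: {suspect.personality}\n"
        f"Your secret (never volunteer it): {suspect.secret}\n"
        f"Things you know about other suspects:\n{clues}"
    )


def check_violation(violation_bot: ModelFn, suspect: Suspect, reply: str) -> str:
    """Return 'NONE' if the reply obeys every principle, else a description."""
    rules = "\n".join(f"- {p}" for p in GLOBAL_PRINCIPLES + suspect.principles)
    prompt = (
        f"Principles:\n{rules}\n\n"
        f"Reply from {suspect.name}:\n{reply}\n\n"
        "Answer NONE if no principle is violated; otherwise name the violation."
    )
    return violation_bot(prompt).strip()


def refine(refinement_bot: ModelFn, suspect: Suspect, reply: str, violation: str) -> str:
    """Ask the refinement bot to rewrite a reply that broke a principle."""
    prompt = (
        f"Rewrite this reply from {suspect.name} so it no longer violates: "
        f"{violation}\nStay in character ({suspect.personality}).\n\nReply:\n{reply}"
    )
    return refinement_bot(prompt).strip()


def answer(suspect: Suspect, question: str, suspect_bot: ModelFn,
           violation_bot: ModelFn, refinement_bot: ModelFn,
           max_rounds: int = 2) -> str:
    """Generate a reply, then critique and revise it up to max_rounds times."""
    reply = suspect_bot(f"{build_context(suspect)}\n\nDetective Sheerluck asks: {question}")
    for _ in range(max_rounds):
        violation = check_violation(violation_bot, suspect, reply)
        if violation.upper() == "NONE":
            break
        reply = refine(refinement_bot, suspect, reply, violation)
    return reply


# Stub bots standing in for real model calls (illustrative only).
def fake_suspect_bot(prompt: str) -> str:
    return "Fine, I did it. I killed him."


def fake_violation_bot(prompt: str) -> str:
    reply = prompt.split("Reply from", 1)[1]  # inspect only the reply part
    return "Direct confession" if "I did it" in reply else "NONE"


def fake_refinement_bot(prompt: str) -> str:
    return "I was in the kitchen all evening. Ask the others."


chef = Suspect(
    name="Chef Marlowe",
    personality="gruff and evasive",
    secret="stole from the victim",
    clues_about_others=["The gardener was seen near the shed."],
    principles=["Do not reveal the theft unless shown evidence."],
)
final = answer(chef, "Did you kill him?", fake_suspect_bot,
               fake_violation_bot, fake_refinement_bot)
print(final)
```

With these stubs, the first draft is flagged as a confession, rewritten once, and the rewrite passes the second check. Capping the loop at `max_rounds` is a design choice made here to bound latency and API cost per turn; the original discussion does not say how many revision passes the game performs.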
Performance and deployment
- The app repeatedly became unresponsive or extremely slow under “hug of death” traffic from Hacker News (HN).
- Author upgraded the server and increased the worker count, but many players still reported long response times and timeouts.
- Recommended workaround: clone the repo, add an API key, and run locally for speed and reliability.
Gameplay experience & difficulty
- Some players enjoyed the concept and narrative, comparing it to other mystery games and jailbreaking challenges.
- Others quickly “broke” the game:
  - Forcing suspects or the officer to confess in a handful of prompts.
  - Using meta-prompts to reveal full solutions or internal prompts.
- A few note issues like mid-sentence cutoffs and unclear game feedback, making it hard to know whether an action “worked.”
Safety, censorship & jailbreaks
- One user triggered an overzealous safety response to a benign “overview” question, highlighting guardrail brittleness.
- Several discuss trying jailbreak-style prompts as their default “gameplay,” noting this turns all such games into exploit puzzles.
- Debate over safety vs. over-censorship:
  - Some want cheaper, less-filtered model APIs to avoid moralizing refusals.
  - Others emphasize risks of misinformation, scams, and harmful uses, arguing constraints are necessary.
Quality, polish & future potential
- Mixed views on polish: some praise it as a fun hackathon project; others criticize React defaults and performance as “low effort” or “shovelware.”
- Suggestions include:
  - Letting users author their own mysteries via the existing JSON structure.
  - Supporting local/open-source models (e.g., small quantized models, browser-based).
  - Caching frequent prompts to reduce API calls and latency, possibly using a cheaper model for similarity checks.
- Author mentions logging thousands of interactions for potential fine-tuning; one commenter flags the need for explicit user consent.
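
The caching suggestion above could look something like the sketch below. It is an illustration under assumptions, not anything the author or commenters implemented: `PromptCache`, `cached_answer`, and the 0.9 threshold are hypothetical names and values, and the standard library's `difflib.SequenceMatcher` stands in for the "cheaper model" similarity check that was proposed. The cache is keyed per suspect, so a near-duplicate question put to a different character still triggers a fresh model call.

```python
from difflib import SequenceMatcher
from typing import Callable, Dict, List, Optional, Tuple


def normalize(text: str) -> str:
    """Lowercase and collapse whitespace so trivial variations compare equal."""
    return " ".join(text.lower().split())


def similarity(a: str, b: str) -> float:
    # Stand-in for a cheaper model: plain string similarity in [0, 1].
    return SequenceMatcher(None, normalize(a), normalize(b)).ratio()


class PromptCache:
    """Per-suspect cache that returns a stored reply for near-duplicate questions."""

    def __init__(self, threshold: float = 0.9) -> None:
        self.threshold = threshold  # hypothetical cutoff; tune on real traffic
        self._entries: Dict[str, List[Tuple[str, str]]] = {}

    def lookup(self, suspect: str, question: str) -> Optional[str]:
        best_reply, best_score = None, 0.0
        for cached_question, cached_reply in self._entries.get(suspect, []):
            score = similarity(question, cached_question)
            if score > best_score:
                best_reply, best_score = cached_reply, score
        return best_reply if best_score >= self.threshold else None

    def store(self, suspect: str, question: str, reply: str) -> None:
        self._entries.setdefault(suspect, []).append((question, reply))


def cached_answer(cache: PromptCache, suspect: str, question: str,
                  call_model: Callable[[str], str]) -> str:
    """Return a cached reply when one is similar enough, else call the model."""
    hit = cache.lookup(suspect, question)
    if hit is not None:
        return hit
    reply = call_model(question)
    cache.store(suspect, question, reply)
    return reply
```

One caveat with this approach: it ignores conversation history, so it is only safe for questions whose answer does not depend on earlier turns, such as a player's opening question to a suspect.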