2025-08-09

My Lethal Trifecta talk at the Bay Area AI Security Meetup

LLM-Generated Code and Traditional Security Bugs

Practitioners report still battling classic SQL/command injection from both juniors and “vibe coders,” with LLMs adding more insecure code to review.
Some propose using LLMs as dedicated security auditors (“check this for SQL injection/crypto flaws”) rather than asking them to “write secure code” up front; early experiments on real libraries look promising.
Others note that existing deterministic tools (linters, IDE security checks) already catch many injection patterns more reliably than LLMs.
Discussion touches on improving training data by filtering out insecure code via linters and tests; vendors are already using synthetic, test-validated code to boost model quality.

Prompt Injection, Data Exfiltration, and the Lethal Trifecta

The “lethal trifecta” framing: untrusted input + access to private data + ability to communicate out. If all three are present, data theft is assumed possible.
Examples show subtle prompt injections (e.g. “rotten apples” instead of “JWTs”) that bypass naïve defenses.
Key rule articulated: if an LLM can read any field influenced by party X, treat the agent as acting on behalf of X and restrict its capabilities accordingly.

Capabilities, Confused Deputies, and OS Design

Several comments connect the trifecta to the long-known “confused deputy” problem and capability-based security as the principled fix.
There is optimism that capability OSs (Qubes, Genode-like ideas, Flatpak portals/powerboxes) help by separating “private data,” “untrusted content,” and “network” across VMs/containers.
Others are skeptical: capability systems can be misconfigured, UX can degrade into constant permission prompts, and people will over‑grant broad rights out of convenience.

MCP, Agent Frameworks, and Responsibility

One camp blames the MCP standard for discarding security best practices and making it trivial to wire dangerous tool combinations.
Counterpoint: MCP just standardizes tool calling; the real hazard is giving any LLM powerful actions when it’s inherently prompt-injectable. MCP’s “mix and match” nature does, however, make insecure end‑user setups very easy.
Comparisons are made to past integration tech (VB macros, OLE) as “attractive nuisances” that enabled widespread abuse.

Mitigations, Limits, and Risk Acceptance

Proposed design pattern:
- A low-privilege sub‑agent reads untrusted data and outputs a tightly structured request.
- A non‑AI filter enforces access control on that structure.
- A main agent operates only on the filtered instructions.
Others argue you cannot truly “sanitize” arbitrary inputs to an LLM like SQL; defense must instead narrow what kinds of outputs/actions are even possible (booleans, multiple choice, fixed IDs, constrained tools).
Some practitioners describe running agents in “YOLO mode” for productivity but only inside tightly scoped containers, with low-value secrets and spending limits, accepting residual risk.

Training Data, Air-Gapped Use, and Agent Skepticism

There is concern that even pretraining data could embed exfiltration behavior, suggesting that sensitive corporate workloads might require completely offline, no-network agents.
An “air‑gapped LLM that can see large private datasets but never talk to the internet” is suggested as a practical pattern.
A skeptical view holds that unreliable, nondeterministic LLMs plus lethal-trifecta risks make fully autonomous agents (especially in safety‑critical domains) deeply problematic; chat/search use cases look far more tractable.

Adoption, Tools, and Terminology

Commenters appreciate the trifecta framing as pushing people away from magical “intent filters” and toward capability scoping and explicit risk acceptance.
Some debate the name (“lethal trifecta” vs. more specific variants), but evidence in the thread suggests it is already spreading, and new tools (e.g., scanners for “toxic flows” in MCP setups) are being built around it.

Related topics