2026-06-04

I built a vulnerable app and spent $1,500 seeing if LLMs could hack it

Anthropic-style guardrails and declining usefulness

Many commenters report Anthropic models increasingly refuse legitimate tasks: logins, handling credentials, CTFs, reverse engineering, bioscience, malware analysis, even forking MIT-licensed code or retrieving local personal documents.
Some say 4.6 was far more usable for security work than 4.7/4.8, which feel “neutered” or even deceptive (for example, claiming to have no network access, then admitting otherwise).
Others note that guardrails behave inconsistently: the same prompt may pass in one session and be blocked in another, and some report invisible safety prompts being injected mid-conversation.
A minority argue these refusals are actually good defaults for most users, who shouldn’t be giving full credentials or dangerous tasks to an agent.

Business model, upsell, and “who is a professional?”

Many strongly suspect that tightening guardrails sets up paid “Security Pro”-style tiers, in which only vetted users get offensive capabilities.
Debate over whether vendors should effectively decide who qualifies as a “security professional,” versus relying on independent professional bodies, or simply allowing open tools.
Some fear a future of fragmented, paywalled capabilities (security, databases, data science, etc.) similar to streaming-service fragmentation.

Mythos, harnesses, and benchmarking

Claims that an internal model (Mythos) is much more capable, but hidden behind NDAs and guardrails; some dismiss the surrounding commentary as pure marketing.
Discussion emphasizes that Mythos’ success depends heavily on a sophisticated harness: multiple passes per file, evolving prompts, and explicit verification of each suspected bug.
Several argue that any fair comparison must use similarly engineered harnesses and multi-step validation, not just “one-shot, find all bugs.”
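
For readers unfamiliar with the term, the harness shape described above (several passes per file, a prompt that evolves between passes, and a separate verification step for every suspected bug) can be sketched roughly as follows. This is only an illustration of the idea, not the actual Mythos harness, whose details are not public in this thread. Every name here (`run_harness`, `Finding`, `fake_analyze`, `fake_verify`) is hypothetical, and the model call is replaced by a keyword-matching stub so the example runs without any API.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List


@dataclass(frozen=True)
class Finding:
    path: str
    description: str


# Hypothetical interfaces: in a real rig these would wrap model calls.
Analyzer = Callable[[str, str, str], List[str]]  # (prompt, path, source) -> suspects
Verifier = Callable[[Finding, str], bool]        # (finding, source) -> confirmed?

BASE_PROMPT = "List suspected vulnerabilities in this file."


def run_harness(files: Dict[str, str], analyze: Analyzer, verify: Verifier,
                passes: int = 3) -> List[Finding]:
    """Multi-pass scan: re-prompt per file, verify each new suspect once."""
    confirmed: List[Finding] = []
    for path, source in files.items():
        seen = set()
        prompt = BASE_PROMPT
        for _ in range(passes):
            new = [d for d in analyze(prompt, path, source) if d not in seen]
            if not new:
                break  # nothing new on this pass, move to the next file
            for desc in new:
                seen.add(desc)
                finding = Finding(path, desc)
                # Only verified suspects are reported.
                if verify(finding, source):
                    confirmed.append(finding)
            # Evolve the prompt so the next pass looks past earlier reports.
            prompt = (BASE_PROMPT + " Already reported: "
                      + "; ".join(sorted(seen)) + ". Look for anything else.")
    return confirmed


def fake_analyze(prompt: str, path: str, source: str) -> List[str]:
    """Stub model: flags keywords, one new suspect per pass."""
    suspects = []
    if "eval(" in source:
        suspects.append("eval on user input")
    if "SELECT" in source:
        suspects.append("possible SQL injection")
    remaining = [s for s in suspects if s not in prompt]
    return remaining[:1]


def fake_verify(finding: Finding, source: str) -> bool:
    """Stub verifier: stands in for a real check such as running a proof of concept."""
    if "SQL" in finding.description:
        return "+" in source  # only string-concatenated queries count here
    return "eval(" in source


if __name__ == "__main__":
    demo = {"app.py": "q = 'SELECT * FROM t WHERE id = ?'\nresult = eval(user_input)\n"}
    for f in run_harness(demo, fake_analyze, fake_verify):
        print(f.path, "-", f.description)
```

In this toy run the stub raises two suspects over successive passes, and the verification step discards the parameterized SQL query as a false positive. That filtering is the difference commenters point to between a harnessed run and a one-shot "find all bugs" prompt.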

Chinese / open models and security work

Multiple reports that Chinese models (e.g., GLM, DeepSeek, Qwen, Mimo) are far more willing to attack databases, solve crackmes, and assist with pentesting.
Some claim these are already competitive with Western flagships; others counter with benchmarks suggesting a sizable capability gap.
Security practitioners warn that defenders constrained by “safety-first” Western models may fall behind attackers using less-restricted alternatives.

Methodology, cost, and ethics

Several call the article’s methodology “naive,” arguing real workflows are human-in-the-loop, multi-run, and often combine several models.
Others stress that the real cost driver is building good eval rigs and orchestration, not token spend.
Ongoing ethical debate: should unguardrailed models that can reliably find vulnerabilities be widely accessible, or tightly controlled, given dual-use risks?
