2026-06-09

What it feels like to work with Mythos

Overall sentiment

The thread is sharply split between enthusiasm and skepticism.
Supporters see Mythos/Fable as a genuine capability jump for complex coding and analysis.
Critics see overhyped marketing, “vibes-based” evaluation, and thin technical evidence from non-engineers.

Capabilities & use cases

Some users report clear wins vs earlier models (Opus 4.8, GPT‑5.5, Qwen, DeepSeek) on:
- Deep code review and refactors in large projects.
- Complex performance work (e.g., a Rust Lua interpreter).
- Building substantial web apps and tools from specs.
- Systematizing prompt guidelines and “agent” configurations.
Others find it only incrementally better, with familiar problems: hallucinations, excessive verbosity, ignored constraints, and getting stuck in loops.

Long-running agents & harness

The 9.5‑hour “Concord” build provokes debate:
- Pro: no human dev could deliver that much from a 19‑page spec in a day.
- Con: industry wants latency in seconds; long agent runs often drift and need rollback.
Several commenters argue that most of the “magic” comes from the harness (teams of sub‑agents, tooling, and good project structure), not just the base model.

Code quality, correctness, and maintainability

Many engineers focus on details they find missing: tests, security, architecture, extensibility, and the cost of future changes.
Reported issues:
- The isochrone map has serious factual and UI errors.
- Games and demos are buggy or break after a few steps.
- The example repo's code is called “slop” and “unmaintainable.”
There is strong concern about the article’s hand‑waving claim that “a software engineer will iron out the remaining bugs.”
Ongoing debate:
- One side: if the behavior is good and models can continually refactor, internal code quality matters less.
- Other side: complexity, silent corruption, and compounding “oopsies” still make this unsustainable without strong human design and verification.

Safety, guardrails, and censorship

Fable’s aggressive cybersecurity/bio guardrails frequently block exactly the code‑review work people want, forcing a fallback to weaker models.
Some report “gaslighting” and silent self‑corruption when the model decides a task is unsafe.

Economics, ROI, and access

Users note high token burn, with single sessions consuming large chunks of weekly quotas, and fear being “priced out” once promo periods end.
There is debate over whether automation is really cheaper than humans at current prices and quality, with calls for concrete cost‑per‑deliverable numbers, which the article omits.

Impact on developers & work

Some devs feel 2–3× more productive and see strong ROI; others have already reduced LLM usage due to quality and outage risks.
Broad agreement that:
- Models are powerful for low‑stakes, short‑lived or side projects.
- High‑stakes, long‑lived systems still need significant human architecture, domain understanding, and review.
