What it feels like to work with Mythos
Overall sentiment
- Thread is sharply split between enthusiasm and skepticism.
- Supporters see Mythos/Fable as a genuine capability jump for complex coding and analysis.
- Critics see overhyped marketing, “vibes-based” evaluation, and thin technical evidence from non-engineers.
Capabilities & use cases
- Some users report clear wins vs earlier models (Opus 4.8, GPT‑5.5, Qwen, DeepSeek) on:
  - Deep code review and refactors in large projects.
  - Complex performance work (e.g., a Rust Lua interpreter).
  - Building substantial web apps and tools from specs.
  - Systematizing prompt guidelines and “agent” configurations.
- Others find it only incrementally better, with familiar issues: hallucinations, overtalking, ignoring constraints, getting stuck in loops.
Long-running agents & harness
- The 9.5‑hour “Concord” build provokes debate:
  - Pro: no human dev could deliver that much from a 19‑page spec in a day.
  - Con: industry wants latency in seconds; long agent runs often drift and need rollback.
- Several argue most “magic” comes from the harness: teams of sub‑agents, tooling, and good project structure, not just the base model.
Code quality, correctness, and maintainability
- Many engineers focus on missing details: tests, security, architecture, extensibility, and cost of future changes.
- Reported issues:
  - Isochrone map has serious factual and UI errors.
  - Games and demos are buggy or break after a few steps.
  - Example repo code called “slop” / “unmaintainable.”
- Strong concern about the article’s hand‑wave that “a software engineer will iron out the remaining bugs.”
- Ongoing debate:
  - One side: if the behavior is good and models can continually refactor, internal code quality matters less.
  - Other side: complexity, silent corruption, and compounding “oopsies” still make this unsustainable without strong human design and verification.
Safety, guardrails, and censorship
- Fable’s aggressive cybersecurity/bio guardrails frequently block exactly the code‑review work people want, forcing a fallback to weaker models.
- Some report “gaslighting” and silent self‑corruption when the model decides a task is unsafe.
Economics, ROI, and access
- Users note high token burn: single sessions consuming large chunks of weekly quotas; fears of being “priced out” after promo periods.
- Debate over whether automation is really cheaper than humans at current prices and quality; calls for concrete cost‑per‑deliverable numbers, which the article omits.
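The cost‑per‑deliverable figure commenters ask for can be sketched as simple arithmetic. The version below is only an illustration: the `AgentRun` fields, the function name, and every price passed in are hypothetical placeholders, not numbers from the article or the thread. It assumes total spend is token cost plus human review time, divided by the number of runs whose output was actually accepted.

```python
from dataclasses import dataclass


@dataclass
class AgentRun:
    # All fields are hypothetical bookkeeping for one agent session.
    input_tokens: int
    output_tokens: int
    review_hours: float  # human time spent checking and fixing the output
    accepted: bool       # whether the run produced a deliverable that shipped


def cost_per_deliverable(runs, usd_per_m_input, usd_per_m_output, reviewer_usd_per_hour):
    """Total spend (tokens + human review) divided by accepted deliverables.

    Failed or rolled-back runs still count toward spend, which is why the
    result can be much higher than the cost of a single successful run.
    Returns None when nothing was accepted.
    """
    total = 0.0
    for run in runs:
        total += run.input_tokens / 1_000_000 * usd_per_m_input
        total += run.output_tokens / 1_000_000 * usd_per_m_output
        total += run.review_hours * reviewer_usd_per_hour
    accepted = sum(1 for run in runs if run.accepted)
    if accepted == 0:
        return None
    return total / accepted


if __name__ == "__main__":
    # Placeholder inputs: one accepted run, one run that was rolled back.
    runs = [
        AgentRun(input_tokens=2_000_000, output_tokens=500_000, review_hours=1.0, accepted=True),
        AgentRun(input_tokens=1_000_000, output_tokens=500_000, review_hours=0.5, accepted=False),
    ]
    print(cost_per_deliverable(runs, 10.0, 50.0, 100.0))
```

Counting the rolled‑back run in the numerator is the design choice that matters here: it lets the skeptics' point about drift and rollback show up in the number instead of being averaged away.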
Impact on developers & work
- Some devs feel 2–3× more productive and see strong ROI; others have already reduced LLM usage due to quality and outage risks.
- Broad agreement that:
  - Models are powerful for low‑stakes, short‑lived or side projects.
  - High‑stakes, long‑lived systems still need significant human architecture, domain understanding, and review.