What it feels like to work with Mythos

Overall sentiment

  • The thread is sharply split between enthusiasm and skepticism.
  • Supporters see Mythos/Fable as a genuine capability jump for complex coding and analysis.
  • Critics see overhyped marketing, “vibes-based” evaluation, and thin technical evidence from non-engineers.

Capabilities & use cases

  • Some users report clear wins vs earlier models (Opus 4.8, GPT‑5.5, Qwen, DeepSeek) on:
    • Deep code review and refactors in large projects.
    • Complex performance work (e.g., a Rust Lua interpreter).
    • Building substantial web apps and tools from specs.
    • Systematizing prompt guidelines and “agent” configurations.
  • Others find it only incrementally better, with familiar issues: hallucinations, overtalking, ignoring constraints, getting stuck in loops.

Long-running agents & harness

  • The 9.5‑hour “Concord” build provokes debate:
    • Pro: no human dev could deliver that much from a 19‑page spec in a day.
    • Con: industry wants latency in seconds; long agent runs often drift and need rollback.
  • Several argue most “magic” comes from the harness: teams of sub‑agents, tooling, and good project structure, not just the base model.

Code quality, correctness, and maintainability

  • Many engineers focus on missing details: tests, security, architecture, extensibility, and cost of future changes.
  • Reported issues:
    • The isochrone map has serious factual and UI errors.
    • Games and demos are buggy or break after a few steps.
    • Code in the example repo is called “slop” and “unmaintainable.”
  • Strong concern about the article’s hand‑wave that “a software engineer will iron out the remaining bugs.”
  • Ongoing debate:
    • One side: if the behavior is good and models can continually refactor, internal code quality matters less.
    • Other side: complexity, silent corruption, and compounding “oopsies” still make this unsustainable without strong human design and verification.

Safety, guardrails, and censorship

  • Fable’s aggressive cybersecurity/bio guardrails frequently block exactly the code‑review work people want, forcing a fallback to weaker models.
  • Some report “gaslighting” and silent self‑corruption when the model decides a task is unsafe.

Economics, ROI, and access

  • Users report high token burn, with single sessions consuming large chunks of weekly quotas, and fear being “priced out” once promo periods end.
  • Debate over whether automation is really cheaper than humans at current prices and quality; calls for concrete cost‑per‑deliverable numbers, which the article omits.
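
One way to make the cost‑per‑deliverable question concrete is sketched below. This is only an illustration of the arithmetic commenters are asking for, not a method from the article or the thread: the Run record, its fields, and every price passed in are hypothetical placeholders that a team would replace with its own measurements. The sketch assumes that the cost of rejected runs and of human review time should both count against the deliverables that were actually accepted.

```python
from dataclasses import dataclass


@dataclass
class Run:
    """One agent session (hypothetical record; all fields are assumptions)."""
    input_tokens: int
    output_tokens: int
    review_hours: float  # human time spent reviewing or fixing the output
    accepted: bool       # whether the result was actually shipped


def cost_per_deliverable(runs, usd_per_m_input, usd_per_m_output, usd_per_review_hour):
    """Total spend (tokens plus human review) divided by accepted deliverables.

    Rejected runs still add to the total, so drift and rollbacks raise the
    cost of each deliverable that does ship.
    """
    total = 0.0
    for r in runs:
        total += r.input_tokens / 1_000_000 * usd_per_m_input
        total += r.output_tokens / 1_000_000 * usd_per_m_output
        total += r.review_hours * usd_per_review_hour
    accepted = sum(1 for r in runs if r.accepted)
    if accepted == 0:
        return float("inf")  # money spent, nothing shipped
    return total / accepted
```

Comparing this figure with the cost of the same deliverable built by hand would need the same accounting on the human side (hours, rework, review), which is the comparison the thread says is missing.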

Impact on developers & work

  • Some devs feel 2–3× more productive and see strong ROI; others have already reduced LLM usage due to quality and outage risks.
  • Broad agreement that:
    • Models are powerful for low‑stakes, short‑lived or side projects.
    • High‑stakes, long‑lived systems still need significant human architecture, domain understanding, and review.