Building a Personal AI Factory

Clarity of the “AI factory” workflow

  • Several readers say the article is too high-level: they can’t tell what concrete outputs this setup is producing, or how models “talk to each other” in practice.
  • People ask for example sessions, prompts, and code, not just architecture diagrams and claims. Without those examples, they find it hard to evaluate whether this is more than “dream workflow” marketing.

How developers are actually using LLMs

  • Many self-described heavy users mostly rely on LLMs for:
    • Planning, design discussions, and “rubber-duck” reasoning
    • Small features, boilerplate, tests, config, and unfamiliar stacks
  • For complex or production systems, they stay tightly in the loop: reading every diff, adjusting design, and using AI as a speedup rather than an autonomous builder.
  • Some find they now let AI write less of their code than a year ago, because they value architecture, consistency, and maintainability over raw output volume.

Multi-agent setups: promise and fragility

  • Multi-agent + MCP workflows (Goose, Zen MCP, OpenRouter, repomix, etc.) excite some: they report substantial speedups, cross-model “second opinions,” and parallel worktrees.
  • Others find them extremely brittle: malformed JSON breaks agent chains, tools aren’t invoked reliably, and small changes in prompts or models can flip a “humming” system into chaos.
  • A recurring problem: different agents invent incompatible schemas, APIs, and UI patterns, forcing huge instruction files to enforce consistency (a minimal guard against this is sketched after this list).
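
One mitigation several commenters converge on is moving the shared contract out of prose instructions and into code: every agent must emit JSON matching a single schema, and anything that fails to parse or validate is rejected before it enters the chain. The sketch below is a minimal illustration in Python (standard library only); the field names, the AgentResult type, and the retry policy are assumptions for illustration, not the API of Goose, Zen MCP, or any other tool mentioned.

```python
import json
from dataclasses import dataclass

# Hypothetical shared contract that every agent must emit; keeping it in one
# importable module replaces the "huge instruction files" with code.
@dataclass
class AgentResult:
    task_id: str
    summary: str
    files_changed: list

REQUIRED_FIELDS = {"task_id": str, "summary": str, "files_changed": list}

def parse_agent_output(raw: str):
    """Return a validated AgentResult, or None if the output is unusable."""
    try:
        data = json.loads(raw)
    except json.JSONDecodeError:
        return None  # malformed JSON: the classic chain-breaker
    if not isinstance(data, dict):
        return None
    for field, expected_type in REQUIRED_FIELDS.items():
        if not isinstance(data.get(field), expected_type):
            return None  # agent invented its own shape; reject rather than propagate
    return AgentResult(**{k: data[k] for k in REQUIRED_FIELDS})

def run_step(call_agent, prompt: str, retries: int = 2) -> AgentResult:
    """Call an agent (any callable returning text) and retry on invalid output."""
    for _ in range(retries + 1):
        result = parse_agent_output(call_agent(prompt))
        if result is not None:
            return result
    raise RuntimeError("agent never produced output matching the shared schema")
```

The specific fields matter less than where the check sits: validating at every hop means one malformed response fails loudly instead of silently poisoning the downstream agents’ context.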

Code quality, correctness, and responsibility

  • Strong skepticism toward “correct by construction” claims for stochastic systems. Critics see this as “rolling the dice” and worry about discarding working code just to re-generate it.
  • Multiple commenters report that unsupervised agents ship bugs that reach end users; they now treat AI output as fully their own responsibility, especially in finance or security-sensitive domains.
  • Consensus: LLMs shine on well-defined, easily verified tasks; they struggle with “hard code,” complex legacy systems, and subtle architectural issues (see the verification sketch after this list).
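
That consensus points to a simple discipline: only keep generated changes that an automated check can vouch for. A minimal sketch, assuming a project whose suite runs under pytest; tests_pass and accept_if_verified are hypothetical helpers for illustration, not anyone’s published tooling.

```python
import subprocess

def tests_pass(repo_dir: str) -> bool:
    """Run the project's test suite and report whether it passed."""
    proc = subprocess.run(["python", "-m", "pytest", "-q"], cwd=repo_dir)
    return proc.returncode == 0

def accept_if_verified(repo_dir: str, apply_patch, revert_patch) -> bool:
    """Apply an AI-generated patch and keep it only if the tests still pass."""
    apply_patch()          # caller supplies this, e.g. applying the model's diff
    if tests_pass(repo_dir):
        return True        # well-defined and easily verified: safe to keep
    revert_patch()         # anything the suite can't vouch for goes back to a human
    return False
```

Anything the suite cannot verify falls back to human review, which matches how the finance- and security-minded commenters say they already work.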

Cost and scale

  • Claude Max at $200/month is seen as a good deal for heavy use; Pro hits limits quickly in multi-agent scenarios.
  • Tools like ccusage suggest that heavy users’ token consumption would cost far more at API prices, i.e. they are likely being heavily subsidized, which raises doubts about the long-term economics (the arithmetic is sketched below).
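
The subsidy claim is plain arithmetic: take a month’s token totals (the kind of numbers ccusage reports), price them at published per-token API rates, and compare with the flat subscription. A minimal sketch; every number below is an illustrative placeholder, not actual usage or current pricing.

```python
# Illustrative placeholders only; substitute your own ccusage totals and
# the currently published API rates.
INPUT_TOKENS = 300_000_000   # tokens sent during a heavy multi-agent month
OUTPUT_TOKENS = 60_000_000   # tokens generated
RATE_IN_PER_M = 3.00         # assumed $ per million input tokens
RATE_OUT_PER_M = 15.00       # assumed $ per million output tokens
SUBSCRIPTION = 200.00        # flat monthly plan

api_equivalent = (INPUT_TOKENS / 1e6) * RATE_IN_PER_M + (OUTPUT_TOKENS / 1e6) * RATE_OUT_PER_M
print(f"API-equivalent spend: ${api_equivalent:,.2f}")            # $1,800.00 with these numbers
print(f"Effective subsidy:    ${api_equivalent - SUBSCRIPTION:,.2f}")
```

With any plausibly heavy multi-agent month the API-equivalent figure dwarfs the flat fee, which is what drives the doubt about whether current subscription pricing can last.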

Vibe coding, hype, and trajectory

  • Some sense growing disillusionment with “vibe coding”: it is expensive, draining, yields messy results, and involves too much “arguing with your computer.”
  • Others report the opposite: a wave of converts who can now spin up trivial or moderately complex apps in an hour and regard this as a qualitative shift from “fancy autocomplete.”
  • Emerging middle ground: AI factories are powerful for greenfield, low-stakes, or repetitive work; true robustness still depends on human design, review, and rigorous tests.