AI agents: Less capability, more reliability, please

Human-in-the-loop and workflows vs “general agents”

  • Strong support for narrow, deterministic workflows with clear steps and human checkpoints, rather than open-ended autonomous agents.
  • Many argue the right pattern is: define explicit processes, insert LLM calls as tools inside them, and focus on reversibility (git, undo, containers, CRDTs) instead of pretending errors won’t happen.
  • Some see “agents” as emergent from stitched workflows, not a fundamentally separate thing.
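The "explicit process with LLM steps plus reversibility" pattern can be made concrete. A minimal sketch, assuming a hypothetical `Workflow` wrapper: every state change is paired with an undo action, so an LLM-driven step can be rolled back at a human checkpoint instead of being trusted blindly.

```python
from dataclasses import dataclass, field
from typing import Callable, List


@dataclass
class Workflow:
    """Deterministic steps with an undo log, so any step (including an
    LLM-backed one) can be reverted -- design for reversibility, not
    for the absence of errors."""
    undo_log: List[Callable[[], None]] = field(default_factory=list)

    def apply(self, do: Callable[[], None], undo: Callable[[], None]) -> None:
        do()                        # run the step (may wrap an LLM call)
        self.undo_log.append(undo)  # record how to reverse it

    def rollback(self) -> None:
        while self.undo_log:        # reverse steps in LIFO order
            self.undo_log.pop()()


# Usage: treat an LLM-drafted change like any other reversible step.
files: dict = {}
wf = Workflow()
wf.apply(do=lambda: files.update({"draft.txt": "llm output"}),
         undo=lambda: files.pop("draft.txt"))
# A human checkpoint would inspect `files` here; on rejection:
wf.rollback()
assert files == {}
```

The same idea scales up with heavier machinery (git commits, container snapshots, CRDTs); the undo log here is just the smallest version of it.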

Agent UX vs existing tools

  • Repeated complaint: “book a flight” or “order groceries” agents are worse than mature UIs like Google Flights, Uber, or a basic web search.
  • Natural-language interfaces are unbounded and hard to test; they also obscure what the system can and can’t do, making trust harder.
  • Comparisons to 2016-era chatbots and touchscreens in cars: shiny, but often a UX regression.
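The testability point can be made concrete: a structured query has an enumerable input space, while a free-text request does not. A sketch, with the `FlightQuery` type and its fields invented for illustration:

```python
from dataclasses import dataclass
from datetime import date


@dataclass(frozen=True)
class FlightQuery:
    """A bounded interface: every field has a known type and range, so
    invalid states can be rejected up front and tests can enumerate them."""
    origin: str   # IATA code, e.g. "SFO"
    dest: str
    depart: date

    def validate(self) -> bool:
        return (len(self.origin) == 3 and self.origin.isalpha()
                and len(self.dest) == 3 and self.dest.isalpha()
                and self.origin != self.dest)


# The structured query is trivially testable:
assert FlightQuery("SFO", "JFK", date(2025, 6, 1)).validate()
assert not FlightQuery("SFO", "SFO", date(2025, 6, 1)).validate()
# A natural-language request ("get me to New York cheap-ish, ideally
# Tuesday") has no such validator: the space of inputs -- and of intents
# behind them -- is unbounded, which is what makes trust hard.
```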

Coding assistants and “vibe coding”

  • Many developers only accept AI-written code they can understand, or at least cover with solid tests.
  • Others are comfortable with “vibe coding” for disposable/one-off analysis and scripts, treating LLMs like a black-box library, as long as behavior passes tests.
  • The Cursor/git wipeout anecdote splits opinion: some blame users’ lack of version control basics; others say tools must embed guardrails for catastrophic operations.
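The "black-box library, as long as behavior passes tests" stance implies a concrete acceptance gate. A sketch of one, using `sorted` as a stand-in trusted oracle (the function names are illustrative, not from any real tool):

```python
import random


def accept_if_passes(candidate, cases, trials=200):
    """Treat `candidate` (e.g. LLM-written code) as a black box: accept
    it only if it matches a trusted oracle on fixed cases plus a batch
    of randomly generated ones."""
    oracle = sorted  # trusted reference implementation
    for xs in cases:
        if candidate(xs) != oracle(xs):
            return False
    for _ in range(trials):
        xs = [random.randint(-100, 100) for _ in range(random.randint(0, 20))]
        if candidate(xs) != oracle(xs):
            return False
    return True


# Hypothetical LLM-generated candidates:
llm_sort = lambda xs: sorted(xs)   # matches the oracle, gets accepted
buggy_sort = lambda xs: list(xs)   # fails on any unsorted input

assert accept_if_passes(llm_sort, [[3, 1, 2], []])
assert not accept_if_passes(buggy_sort, [[3, 1, 2]])
```

The catch, of course, is that this only works where a trusted oracle or strong property suite exists; for novel code, writing the tests is the part you still can't outsource.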

Flight booking as a case study

  • Users describe highly personal heuristics: price vs layover length, airport quality, loyalty points, weird award tricks, “hidden city” routes.
  • Many doubt a generic agent can capture these shifting, tacit tradeoffs; they’d accept suggestions, but not blind booking.
  • A few argue that exactly these complex, multi-step optimizations are where agents could shine—if they ever become reliably better than doing it yourself.
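To see why these heuristics resist generic automation, it helps to write one down. A sketch of a single traveler's tradeoff as an explicit cost function; the weights and fields are invented for illustration, and the whole point is that they differ per person and per trip:

```python
from dataclasses import dataclass


@dataclass
class Flight:
    price: float       # fare in USD
    layover_min: int   # total layover minutes
    award_miles: int   # loyalty miles earned
    red_eye: bool


def score(f: Flight, *, mile_value=0.012, layover_cost_per_hr=15.0,
          red_eye_penalty=80.0) -> float:
    """One traveler's heuristic as a cost (lower is better). The weights
    are personal and shift by trip -- exactly the tacit knowledge a
    generic agent would have to learn per user before booking blindly."""
    cost = f.price
    cost += layover_cost_per_hr * f.layover_min / 60  # time is money
    cost -= mile_value * f.award_miles                # points have value
    cost += red_eye_penalty if f.red_eye else 0.0
    return cost


cheap_long = Flight(price=320, layover_min=300, award_miles=2000, red_eye=True)
pricier_direct = Flight(price=420, layover_min=0, award_miles=2500, red_eye=False)
# With these weights the direct flight wins despite the higher fare:
assert score(pricier_direct) < score(cheap_long)
```

An agent that only optimizes the visible field (price) gets this wrong; one that learns the hidden weights would be doing the "complex multi-step optimization" the optimists describe.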

Reliability, evaluation, and error handling

  • Several builders emphasize rigorous, task-specific evals instead of benchmark chasing; “real-world” tests often differ drastically from academic ones.
  • Ideas: retry loops modeled after TCP, specialized sub-agents for “is this irreversible?”, and designing systems around detection and rollback rather than perfection.
  • Skeptics note that hallucinations and probabilistic behavior seem inherent; for high-stakes domains (medicine, finance, travel booking), partial reliability is not acceptable.
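The "TCP-style retry plus irreversibility check" ideas combine naturally. A minimal sketch, assuming a hypothetical action registry: transient failures get exponential backoff, but anything flagged irreversible is refused outright and escalated rather than blindly retried.

```python
import time

# Hypothetical registry: actions an agent must never auto-retry or
# auto-execute, because no rollback exists.
IRREVERSIBLE = {"charge_card", "send_email", "delete_account"}


def run_with_retries(action: str, attempt_fn, max_tries=5, base_delay=0.5):
    """TCP-inspired retry: exponential backoff on transient failure, but
    refuse irreversible actions -- those need detection and human review,
    not repetition."""
    if action in IRREVERSIBLE:
        raise PermissionError(f"{action!r} is irreversible; escalate to a human")
    delay = base_delay
    for attempt in range(max_tries):
        try:
            return attempt_fn()
        except Exception:
            if attempt == max_tries - 1:
                raise  # exhausted: surface the error instead of hiding it
            time.sleep(delay)
            delay *= 2  # exponential backoff, like TCP retransmission timers
```

Usage: `run_with_retries("search_fares", fetch)` retries a flaky lookup, while `run_with_retries("charge_card", pay)` raises immediately. The specialized "is this irreversible?" sub-agent some commenters propose would replace the static set with a learned classifier, which is where the reliability question reappears.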

Incentives, hype, and platform dynamics

  • Widespread criticism of overpromised “AI engineer” / “MCP for everything” products and capability-first demos that barely work in practice.
  • Concern that agent platforms will become new rent-seeking intermediaries, similar to ad-powered search, app stores, or private APIs—especially if businesses differentiate between human and AI “users.”