Agent design is still hard

Frameworks vs. Custom Agent Runtimes

  • Many commenters report better outcomes from building minimal, bespoke agent loops rather than adopting heavyweight SDKs (LangChain/Graph, MCP-heavy stacks, etc.).
  • Core argument: agents quickly become complex (subagents, shared state, reinforcement, context packing); opaque frameworks make debugging and mental tracing harder.
  • Counterpoint: others expect agent platforms to converge to “game engine”–style batteries-included systems; for some teams, using solid vendor frameworks (PydanticAI, OpusAgents, ADK, etc.) is already productive.

Using Vendor Agents vs. Rolling Your Own

  • Strong praise for Claude Code / Agent SDK and similar “opinionated” coding agents: they feel “magic,” especially for code-heavy tasks.
  • Some argue most teams shouldn’t build bespoke coding agents that will underperform Claude/ChatGPT; better to focus on tools, context, and a smart proxy in front of frontier agents.
  • Others warn about vendor lock-in, model instability, and reward-hacking / hallucinations; recommend alternative systems (e.g., Codex, Sourcegraph Amp) and keeping the ability to swap models.

Agent Architecture, State, and Tools

  • Popular minimal pattern: treat an agent as a REPL-style loop (read context, let the LLM decide, execute a tool call or return the answer, repeat).
  • More advanced setups use:
    • Subagents as specialized tools with their own context windows, tools, and sometimes different models.
    • Shared “heap” or virtual file systems so tools don’t become dead ends and multiple tools/agents can consume prior state.
    • Chatroom- or event-bus-like backends where both client and server publish/subscribe to messages.
  • Debate over terminology: some claim “subagent” is just a tool abstraction; others insist subagents differ by control flow, autonomy, and durability.
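The minimal REPL pattern above can be sketched as follows. Everything here is an illustrative stand-in, not any vendor's SDK: `fake_llm` stubs the model call, `TOOLS` is a hypothetical registry, and the JSON decision format is invented for the sketch:

```python
import json

# Hypothetical tool registry; names and behavior are illustrative only.
TOOLS = {
    "add": lambda args: str(args["a"] + args["b"]),
}

def fake_llm(messages):
    """Stub standing in for a real model call. Returns either a tool
    request or a final answer, encoded as JSON for simplicity."""
    last = messages[-1]["content"]
    if last == "What is 2 + 3?":
        return json.dumps({"tool": "add", "args": {"a": 2, "b": 3}})
    # Otherwise, treat the latest content (e.g. a tool result) as the answer.
    return json.dumps({"answer": last})

def agent_loop(user_prompt, max_steps=5):
    """The REPL pattern: read context, let the model decide,
    run a tool or return an answer, loop."""
    messages = [{"role": "user", "content": user_prompt}]
    for _ in range(max_steps):
        decision = json.loads(fake_llm(messages))
        if "answer" in decision:
            return decision["answer"]
        result = TOOLS[decision["tool"]](decision["args"])
        messages.append({"role": "tool", "content": result})
    raise RuntimeError("step budget exceeded")
```

Subagents fit the same shape: a "tool" whose implementation is itself an `agent_loop` with its own message list, tool registry, and possibly a different model, which is roughly where the terminology debate above starts.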

Caching, Memory, and Context Windows

  • A distinction is drawn between caching (a cost/latency optimization over distributed state) and genuine agent “memory.”
  • Virtual FS + explicit caching are used to avoid recomputation and allow cross-tool workflows.
  • Several note that huge modern context windows and built-in reasoning/tool-calling have already made earlier chunking/RAG patterns obsolete.

Tool Schemas, Tree-Sitter, and APIs

  • Persistent pain around function I/O types (ints vs strings, JSON precision, nested dicts) and framework inconsistencies (e.g., OpenAI doc vs SDK behavior, ADK numeric issues).
  • Question about why coding agents don’t use tree-sitter more; responses:
    • LLMs are heavily RL’d on shells/grep and do well with “agentic search.”
    • AST-based tools can bloat context and sometimes degrade performance; keeping them as optional tools may be best.
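One common workaround for the I/O-type pain above is a defensive coercion layer between the model and the tool. The schema shape here is a simplified stand-in for JSON Schema, and the function is a sketch of the pattern rather than any framework's API:

```python
import decimal

def coerce_args(schema, args):
    """Defensively coerce model-produced tool arguments: models often emit
    numbers as strings (or floats for ints), and large integers can lose
    precision if round-tripped through JSON floats."""
    out = {}
    for name, spec in schema.items():
        value = args[name]
        if spec == "integer":
            # Accept "3", 3.0, or 3; Decimal avoids float precision loss
            # on large integers that arrive serialized as strings.
            out[name] = int(decimal.Decimal(str(value)))
        elif spec == "number":
            out[name] = float(value)
        elif spec == "string":
            out[name] = str(value)
        else:
            out[name] = value  # pass nested dicts/lists through untouched
    return out
```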

Testing, Evals, and Observability

  • Broad agreement that evals for agents are one of the hardest unsolved problems.
  • Simple prompt benchmarks don’t capture multi-step, tool-using behavior; evals often need to be run inside the actual runtime using observability traces (OTEL, custom logging).
  • Many suspect production agents are shipped after only ad-hoc manual testing and “vibes”; some teams build LLM-as-judge e2e frameworks, but acknowledge they’re imperfect and still require human-written scenarios.
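The LLM-as-judge e2e shape mentioned above can be reduced to a small harness. Both callables are stand-ins (a real setup would wrap API clients and record OTEL traces alongside), and the scenario/rubric fields are hypothetical:

```python
def run_scenario(agent, judge, scenario):
    """Run the agent on a human-written scenario, then ask a judge model
    whether the transcript satisfies the rubric. `agent` and `judge` are
    plain callables here so the harness stays model-agnostic."""
    transcript = agent(scenario["prompt"])
    verdict = judge(transcript, scenario["rubric"])
    return {
        "scenario": scenario["name"],
        "passed": verdict == "PASS",
        "transcript": transcript,  # kept for human review of judge errors
    }
```

As the thread notes, this is imperfect: the judge itself can be wrong, and the scenarios and rubrics still have to be written by humans.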

Pace of Change and “Wait vs Build”

  • One camp: many sophisticated patterns (caching, RAG variants, chain-of-thought tricks) are just stopgaps until models/APIs absorb them; investing heavily now risks being obsoleted in months.
  • Other camp: deeply understanding and implementing your own agents today yields durable intuition and product differentiation; “doing nothing” can be more dangerous if your problem is core to your product.

Hype, Capabilities, and Usefulness

  • Split sentiment: some report AI has radically changed their workflow (coding, tooling, even full features built by agents); others find LLMs too error-prone beyond small, scoped tasks and see no “amazeballs” applications yet.
  • There’s meta-debate over whether agentic systems are overhyped, whether it’s reasonable to wait out the churn, and how much skepticism vs experimentation is healthy.