Agent design is still hard
Frameworks vs. Custom Agent Runtimes
- Many commenters report better outcomes from building minimal, bespoke agent loops rather than adopting heavyweight SDKs (LangChain/LangGraph, MCP-heavy stacks, etc.); a minimal loop of this kind is sketched after this list.
- Core argument: agents quickly become complex (subagents, shared state, reinforcement, context packing), and opaque framework abstractions make debugging and mentally tracing execution harder.
- Counterpoint: others expect agent platforms to converge to “game engine”–style batteries-included systems; for some teams, using solid vendor frameworks (PydanticAI, OpusAgents, ADK, etc.) is already productive.
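To make the “minimal, bespoke loop” argument concrete, here is a rough sketch of the pattern commenters describe, assuming a hypothetical `call_model` function and a plain dict of tool callables rather than any specific vendor SDK; the message shapes and field names are illustrative.

```python
# Minimal agent loop sketch. `call_model` and the message/tool-call shapes are
# hypothetical stand-ins, not a real vendor API; the point is that the whole
# "runtime" fits in one readable function.
import json

def run_agent(call_model, tools, system_prompt, user_goal, max_steps=20):
    """call_model(messages) -> {"content": str, "tool_call": {"name", "args"} or None}."""
    messages = [
        {"role": "system", "content": system_prompt},
        {"role": "user", "content": user_goal},
    ]
    for _ in range(max_steps):
        reply = call_model(messages)          # the LLM decides: answer or call a tool
        tool_call = reply.get("tool_call")
        if tool_call is None:
            return reply["content"]           # final answer; loop ends
        name, args = tool_call["name"], tool_call["args"]
        try:
            result = tools[name](**args)      # tools are plain Python functions
        except Exception as exc:              # surface failures to the model, don't crash
            result = f"tool error: {exc}"
        messages.append({"role": "assistant", "content": reply["content"], "tool_call": tool_call})
        messages.append({"role": "tool", "name": name, "content": json.dumps(result, default=str)})
    return "stopped: step budget exhausted"
```

Everything the debugging complaints are about (what went into the prompt, which tool ran, what came back) stays visible in `messages`, which is exactly the visibility commenters say opaque frameworks take away.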
Using Vendor Agents vs. Rolling Your Own
- Strong praise for Claude Code / Agent SDK and similar “opinionated” coding agents: they feel “magic,” especially for code-heavy tasks.
- Some argue most teams shouldn’t build bespoke coding agents that will underperform Claude/ChatGPT; better to invest in tools, context, and a smart proxy around frontier agents.
- Others warn about vendor lock-in, model instability, and reward-hacking / hallucinations; they recommend alternative systems (e.g., Codex, Sourcegraph Amp) and keeping the ability to swap models (a thin provider-adapter sketch follows below).
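One hedge against lock-in that commenters allude to is keeping the model behind a thin interface so providers can be swapped without rewriting agent logic. A sketch under that assumption; the adapter classes are placeholders and the real vendor request/response translation is omitted:

```python
# Thin provider abstraction sketch so agent code never imports a vendor SDK directly.
# The adapters are stubs; the real translation to each vendor's API would go inside them.
from typing import Protocol

class ChatModel(Protocol):
    def complete(self, messages: list[dict], tools: list[dict] | None = None) -> dict:
        """Return {"content": str, "tool_call": dict | None} regardless of vendor."""
        ...

class AnthropicAdapter:
    def __init__(self, client, model: str):
        self.client, self.model = client, model
    def complete(self, messages, tools=None):
        raise NotImplementedError("translate to/from the Anthropic Messages API here")

class OpenAIAdapter:
    def __init__(self, client, model: str):
        self.client, self.model = client, model
    def complete(self, messages, tools=None):
        raise NotImplementedError("translate to/from the OpenAI Chat Completions API here")

def build_model(provider: str, client, model: str) -> ChatModel:
    # Swapping providers becomes a config change, not an agent rewrite.
    adapters = {"anthropic": AnthropicAdapter, "openai": OpenAIAdapter}
    return adapters[provider](client, model)
```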
Agent Architecture, State, and Tools
- Popular minimal pattern: treat an agent as a REPL-style loop (read context, let the LLM decide, execute a tool call or return an answer, repeat), essentially the loop sketched above.
- More advanced setups use:
- Subagents as specialized tools with their own context windows, tools, and sometimes different models.
- Shared “heap” or virtual file systems so tools don’t become dead ends and multiple tools/agents can consume prior state.
- Chatroom- or event-bus-like backends where both client and server publish/subscribe to messages.
- Debate over terminology: some claim “subagent” is just a tool abstraction; others insist subagents differ by control flow, autonomy, and durability.
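A rough sketch of the “subagent as a specialized tool” idea from the list above, reusing the hypothetical `run_agent` loop sketched earlier: the parent only sees a callable, while the subagent gets its own system prompt, narrower tool set, fresh message history, and possibly a cheaper model.

```python
# Subagent-as-a-tool sketch: to the parent this is just another entry in its tool
# registry; internally it is a full agent loop with its own context window.
def make_research_subagent(call_cheap_model, subagent_tools):
    def research(question: str) -> str:
        # Fresh context: the parent's conversation is NOT passed down;
        # only the task goes in and only the final answer comes back.
        return run_agent(
            call_model=call_cheap_model,   # may be a different/cheaper model
            tools=subagent_tools,          # narrower tool set than the parent's
            system_prompt="You are a research assistant. Answer concisely and cite sources.",
            user_goal=question,
            max_steps=10,
        )
    return research

# Illustrative wiring (names are made up):
# parent_tools["research"] = make_research_subagent(cheap_model_fn, {"web_search": web_search})
```

The terminology debate largely turns on what else wraps this callable: give it its own budget, persistence, and the ability to run asynchronously, and people start calling it a subagent rather than a tool.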
Caching, Memory, and Context Windows
- A distinction is drawn between caching (a cost/latency optimization over distributed state) and “memory.”
- A virtual FS plus explicit caching is used to avoid recomputation and enable cross-tool workflows (see the sketch after this list).
- Several note that huge modern context windows and built-in reasoning/tool-calling have already obsoleted earlier chunking/RAG patterns.
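A sketch of the virtual FS plus explicit caching idea mentioned above, with made-up names: tool outputs land at paths in a shared in-memory store, the transcript only carries short handles, and repeated identical calls hit the store instead of recomputing.

```python
# Virtual file system / explicit cache sketch: tools write results to paths and the
# model sees only small "vfs://" handles plus whatever it explicitly reads back.
import hashlib, json

class VirtualFS:
    def __init__(self):
        self._files: dict[str, str] = {}

    def exists(self, path: str) -> bool:
        return path in self._files

    def write(self, path: str, content: str) -> str:
        self._files[path] = content
        return f"vfs://{path} ({len(content)} bytes)"   # short handle for the transcript

    def read(self, path: str, start: int = 0, length: int = 4000) -> str:
        return self._files[path][start:start + length]  # paged reads keep context small

def cached(vfs: VirtualFS, tool_fn):
    """Wrap a tool so repeated identical calls return the stored result's path."""
    def wrapper(**kwargs):
        key = tool_fn.__name__ + "/" + hashlib.sha256(
            json.dumps(kwargs, sort_keys=True).encode()).hexdigest()[:16]
        if not vfs.exists(key):
            vfs.write(key, str(tool_fn(**kwargs)))      # compute once
        return f"vfs://{key}"                           # later tools/agents read on demand
    return wrapper
```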
Tool Schemas, Tree-Sitter, and APIs
- Persistent pain around function I/O types (ints vs. strings, JSON numeric precision, nested dicts) and framework inconsistencies (e.g., OpenAI docs vs. SDK behavior, ADK numeric issues); an argument-validation sketch follows this list.
- Question about why coding agents don’t use tree-sitter more; responses:
- LLMs are heavily RL’d on shells/grep and do well with “agentic search.”
- AST-based tools can bloat context and sometimes degrade performance; keeping them as optional tools may be best.
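On the I/O-type pain in the first bullet of this list: a common mitigation is to validate and coerce arguments at the tool boundary instead of trusting the raw tool-call payload. A minimal sketch using pydantic; the model, its fields, and the `run_search` tool are made up for illustration.

```python
# Coerce/validate tool-call arguments at the boundary: pydantic turns "3" into 3,
# rejects malformed nested structures, and yields an error the model can retry on.
from pydantic import BaseModel, Field, ValidationError

class SearchArgs(BaseModel):
    query: str
    max_results: int = Field(default=10, ge=1, le=100)   # "10" (a string) coerces to 10
    filters: dict[str, str] = Field(default_factory=dict)

def dispatch_search(raw_args: dict) -> str:
    try:
        args = SearchArgs(**raw_args)
    except ValidationError as exc:
        # Feed the error back to the model instead of crashing the loop.
        return f"invalid arguments: {exc.errors()}"
    return run_search(args.query, args.max_results, args.filters)  # hypothetical tool
```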
Testing, Evals, and Observability
- Broad agreement that evals for agents are one of the hardest unsolved problems.
- Simple prompt benchmarks don’t capture multi-step, tool-using behavior; evals often need to be run inside the actual runtime using observability traces (OTEL, custom logging).
- Many suspect production agents are shipped after only ad-hoc manual testing and “vibes”; some teams build LLM-as-judge e2e frameworks (sketched below), but acknowledge they’re imperfect and still require human-written scenarios.
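A stripped-down sketch of the LLM-as-judge end-to-end pattern described above: human-written scenarios drive the real agent runtime, the trace is captured, and a second model scores the transcript against each scenario’s success criteria. The scenario data, judge prompt, and wiring are illustrative, and as commenters note the approach is imperfect and still needs humans to write and review the scenarios.

```python
# LLM-as-judge e2e eval sketch: run the real agent, then have a second model grade
# the full trace against human-written success criteria. All wiring is illustrative.
import json

SCENARIOS = [
    {"goal": "Find the latest stable release of the project and report its version.",
     "criteria": "States a concrete version string and explains how it was found."},
]

JUDGE_PROMPT = """You are grading an AI agent's transcript.
Success criteria: {criteria}
Transcript: {transcript}
Reply with JSON: {{"score": <0 to 1>, "reason": "<one sentence>"}}"""

def run_evals(run_agent_with_trace, call_judge_model):
    results = []
    for scenario in SCENARIOS:
        answer, trace = run_agent_with_trace(scenario["goal"])   # real runtime + OTEL/log trace
        verdict = call_judge_model(JUDGE_PROMPT.format(
            criteria=scenario["criteria"],
            transcript=json.dumps(trace)[:20000],                # truncate; judges have limits too
        ))
        results.append({"goal": scenario["goal"], "answer": answer, **json.loads(verdict)})
    return results
```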
Pace of Change and “Wait vs Build”
- One camp: many sophisticated patterns (caching, RAG variants, chain-of-thought tricks) are just stopgaps until models/APIs absorb them; investing heavily now risks being obsoleted in months.
- Other camp: deeply understanding and implementing your own agents today yields durable intuition and product differentiation; “doing nothing” can be more dangerous if your problem is core to your product.
Hype, Capabilities, and Usefulness
- Split sentiment: some report AI has radically changed their workflow (coding, tooling, even full features built by agents); others find LLMs too error-prone beyond small, scoped tasks and see no “amazeballs” applications yet.
- There’s meta-debate over whether agentic systems are overhyped, whether it’s reasonable to wait out the churn, and how much skepticism vs experimentation is healthy.