LLM function calls don't scale; code orchestration is simpler, more effective

Hybrid orchestration vs direct function calls

  • Many argue for a hybrid model: use deterministic code for as much as possible, and LLMs only where specs are fuzzy or high‑level (e.g., planning, mapping natural language to APIs).
  • One pattern: have the LLM generate deterministic code (or reusable “tools”), validate it, then reuse that code as the stable path going forward (see the sketch after this list).
  • Others note that over‑agentic systems (e.g., smolagents-style everything-through-the-model) add a lot of complexity and are often overkill; simple function composition and structured outputs are usually easier to reason about.
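
As a sketch of the generate-validate-reuse pattern above: the LLM is asked to write a small pure function once, the output is checked against known test cases, and the cached code becomes the deterministic path from then on. The `llm_complete` helper, cache location, and validation strategy are assumptions, not any particular library's API.

```python
# Sketch: LLM writes a tool once, deterministic code reuses it from then on.
import ast
from pathlib import Path

TOOL_CACHE = Path("generated_tools")
TOOL_CACHE.mkdir(exist_ok=True)

def llm_complete(prompt: str) -> str:
    """Stub: call your chat-completion API here and return the generated code."""
    raise NotImplementedError

def get_or_generate_tool(name: str, spec: str, test_cases: list[tuple]):
    """Return a cached deterministic function, asking the LLM to write it only once."""
    cached = TOOL_CACHE / f"{name}.py"
    if not cached.exists():
        source = llm_complete(
            f"Write a pure Python function `{name}` that {spec}. Return only code."
        )
        ast.parse(source)                  # reject syntactically invalid output early
        namespace: dict = {}
        exec(source, namespace)            # in practice: execute inside a sandbox
        fn = namespace[name]
        for args, expected in test_cases:  # validate before trusting the code
            assert fn(*args) == expected
        cached.write_text(source)
    namespace = {}
    exec(cached.read_text(), namespace)
    return namespace[name]
```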

State, execution environments, and durability

  • Long‑running, multi‑step workflows need durable, stateless-but-persistent execution (event sourcing, durable execution) rather than ad‑hoc Jupyter‑like stateful kernels; a minimal event-sourcing sketch follows this list.
  • Handling mid‑execution failures is hard: people want the LLM to “resume” with the right variable state; building the runtime and state machine around this is nontrivial.
  • Latency becomes a concern when chaining many steps or using graph-based orchestration.
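
A minimal event-sourcing sketch of the durability point above, with an invented journal format: each completed step is appended to a log, so a crashed workflow can simply be re-run and will replay finished steps (including nondeterministic LLM calls) from the journal instead of executing them again.

```python
# Sketch: durable, replayable workflow steps via an append-only journal.
# Step results must be JSON-serializable; the on-disk format is illustrative.
import json
from pathlib import Path

class Journal:
    def __init__(self, path: str):
        self.path = Path(path)
        self.results = {}
        if self.path.exists():
            for line in self.path.read_text().splitlines():
                event = json.loads(line)
                self.results[event["step"]] = event["result"]

    def step(self, name: str, fn, *args):
        """Run `fn` at most once; on replay, return the recorded result instead."""
        if name in self.results:
            return self.results[name]
        result = fn(*args)
        with self.path.open("a") as f:
            f.write(json.dumps({"step": name, "result": result}) + "\n")
        self.results[name] = result
        return result

# journal = Journal("order_123.jsonl")
# quote = journal.step("get_quote", fetch_quote, "order_123")
# plan  = journal.step("plan", call_llm, quote)   # the LLM step is journaled too
```

Because the nondeterministic step is journaled like any other, a resumed run sees exactly the same intermediate values it had before the failure.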

MCP/tool design and data formats

  • A recurring complaint is that MCP tools often just wrap APIs and return big JSON blobs, wasting context and bandwidth and mixing irrelevant fields.
  • Some suggest flattening/filtering responses, GraphQL-style selective fields, or even alternative formats (XML, markdown tables, narrative text), which models often handle better than large JSON (see the filtering sketch after this list).
  • Others note MCP’s return types are very limited (text/image), and the protocol/tooling feel under-designed and already somewhat fragmented.
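
One way to act on the “flatten and filter” suggestion above: project the raw response onto a whitelist of fields and render it as a markdown table before it reaches the context window. The field names and the `call_mcp_tool` reference below are illustrative.

```python
# Sketch: shrink a large JSON response to the fields the model actually needs,
# rendered as a markdown table rather than raw nested JSON.
def project(records: list[dict], fields: list[str]) -> list[dict]:
    return [{f: r.get(f) for f in fields} for r in records]

def to_markdown_table(records: list[dict]) -> str:
    if not records:
        return "(no rows)"
    headers = list(records[0].keys())
    lines = ["| " + " | ".join(headers) + " |",
             "| " + " | ".join("---" for _ in headers) + " |"]
    lines += ["| " + " | ".join(str(r[h]) for h in headers) + " |" for r in records]
    return "\n".join(lines)

# raw = call_mcp_tool("list_invoices")   # hypothetical tool returning a big JSON blob
# context = to_markdown_table(project(raw, ["id", "amount", "due_date"]))
```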

Reliability, probabilistic behavior, and correctness

  • Concerns that layering probabilistic components causes error rates to compound; “good enough most of the time” is unacceptable for domains like tax or financial dashboards.
  • Others counter that if a model can actually solve a deterministic task, it assigns near‑certain probability to the correct answer; instability on complex tasks is a capability problem, not a consequence of being “probabilistic” per se.
  • Output-aware inference (dynamic grammars, constraining outputs to valid IDs or tools) is proposed as a way to prevent certain classes of hallucination, though wrong-but-valid answers would still occur (sketched below).
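
A rough illustration of that idea, with the caveat that true output-aware inference constrains the sampler itself (grammars or enum-restricted schemas); the portable approximation below just validates the reply against the live set of valid tools and IDs before anything executes. `llm_complete` is a stub, and the prompt/JSON shape is an assumption.

```python
# Sketch: reject calls that name nonexistent tools or IDs before execution.
import json

def llm_complete(prompt: str) -> str:
    """Stub: call your chat-completion API here and return the text reply."""
    raise NotImplementedError

def choose_tool_call(task: str, valid_tools: set[str], valid_ids: set[str]) -> dict:
    prompt = (
        f"Task: {task}\n"
        'Reply with JSON {"tool": ..., "record_id": ...} where tool is one of '
        f"{sorted(valid_tools)} and record_id is one of {sorted(valid_ids)}."
    )
    call = json.loads(llm_complete(prompt))
    if call.get("tool") not in valid_tools or call.get("record_id") not in valid_ids:
        raise ValueError(f"model proposed an invalid call: {call}")
    return call
```

As the thread notes, this blocks “nonexistent tool or ID” hallucinations but not answers that are valid yet wrong.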

Agents, codegen, and the DSL view

  • Several see “advanced agents” as effectively building a DSL and orchestrator: LLM designs algorithms in terms of an API, but deterministic code executes them.
  • Ideas: the LLM writes code that calls MCPs as ordinary functions, or dynamically composes new tools from smaller ones (see the sketch after this list). In practice, function calling and codegen are still brittle and require heavy testing and deployment infrastructure.
  • Some argue the only scalable path is to push as much granularity and decision logic as possible into deterministic “decisional systems,” with LLMs acting as language interfaces.
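
A sketch of that DSL-and-orchestrator framing, with invented tool stubs: MCP tools are exposed to the model as ordinary Python functions, the model writes a short program against that API, and deterministic code executes it with nothing else in scope. The stripped namespace here is a convenience, not a security boundary.

```python
# Sketch: the LLM designs the algorithm, deterministic code executes it.
def llm_complete(prompt: str) -> str:
    """Stub: call your chat-completion API here and return generated code."""
    raise NotImplementedError

def search_invoices(query: str) -> list[dict]:
    """Stub: would call the corresponding MCP tool."""
    return []

def send_email(to: str, body: str) -> None:
    """Stub: would call the corresponding MCP tool."""

ALLOWED_API = {"search_invoices": search_invoices, "send_email": send_email}

def run_generated_plan(task: str) -> None:
    source = llm_complete(
        "Write Python that performs the task using ONLY these functions: "
        + ", ".join(ALLOWED_API) + f".\nTask: {task}\nReturn only code."
    )
    # Real deployments run this inside a sandbox (see the next section).
    exec(compile(source, "<generated_plan>", "exec"),
         {"__builtins__": {}, **ALLOWED_API})
```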

Tooling, sandboxing, and security

  • Examples of orchestration frameworks (e.g., internal tools, Roast, smolagents) show how to embed nondeterministic LLM steps into larger deterministic workflows.
  • Questions remain around sandboxed execution (Docker, gVisor, SaaS sandboxes) and around securely exposing tools with OAuth or API keys while ensuring LLM/agent code never sees secrets directly; a minimal secret-isolation sketch follows.
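
On the secrets point, one common shape is a thin wrapper or proxy that holds the credential and exposes only whitelisted calls, so LLM-generated code in the sandbox never sees the key. The endpoint paths, host, and environment variable below are illustrative.

```python
# Sketch: credentials stay in the host process; the sandboxed agent can only call
# this wrapper, which attaches the key and enforces an endpoint whitelist.
import os
import urllib.request

_API_KEY = os.environ["BILLING_API_KEY"]          # never passed into the sandbox
_ALLOWED_PATHS = {"/v1/invoices", "/v1/customers"}

def proxied_get(path: str) -> bytes:
    if path not in _ALLOWED_PATHS:
        raise PermissionError(f"path not allowed: {path}")
    req = urllib.request.Request(
        f"https://api.example.com{path}",
        headers={"Authorization": f"Bearer {_API_KEY}"},
    )
    with urllib.request.urlopen(req) as resp:
        return resp.read()
```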

Skepticism, hype, and rediscovering old ideas

  • Some commenters are baffled by the complexity and see much of this as over‑engineering or “madness,” driven by hype and investor pressure.
  • Others note we are largely rediscovering traditional CS concepts—schemas, determinism, state machines, memory management—and reapplying them to LLM systems.
  • There’s acknowledgement of genuinely useful applications, but also a sense that the field is still early, brittle, and often reinventing decades-old patterns.