LLM function calls don't scale; code orchestration is simpler, more effective

Hybrid orchestration vs direct function calls

  • Many argue for a hybrid model: use deterministic code for as much as possible, and LLMs only where specs are fuzzy or high‑level (e.g., planning, mapping natural language to APIs).
  • One pattern: have the LLM generate deterministic code (or reusable “tools”), validate it, then reuse that code as the stable path going forward (see the sketch after this list).
  • Others note that over‑agentic systems (e.g., smolagents-style everything-through-the-model) add a lot of complexity and are often overkill; simple function composition and structured outputs are usually easier to reason about.
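
As a sketch of the generate-validate-reuse pattern above: the LLM is asked to write a small pure function once, the output is checked against known test cases, and the cached code becomes the deterministic path from then on. The `llm_complete` helper, cache location, and validation strategy are assumptions, not any particular library's API.

```python
# Sketch: LLM writes a tool once, deterministic code reuses it from then on.
import ast
from pathlib import Path

TOOL_CACHE = Path("generated_tools")
TOOL_CACHE.mkdir(exist_ok=True)

def llm_complete(prompt: str) -> str:
    """Stub: call your chat-completion API here and return the generated code."""
    raise NotImplementedError

def get_or_generate_tool(name: str, spec: str, test_cases: list[tuple]):
    """Return a cached deterministic function, asking the LLM to write it only once."""
    cached = TOOL_CACHE / f"{name}.py"
    if not cached.exists():
        source = llm_complete(
            f"Write a pure Python function `{name}` that {spec}. Return only code."
        )
        ast.parse(source)                  # reject syntactically invalid output early
        namespace: dict = {}
        exec(source, namespace)            # in practice: execute inside a sandbox
        fn = namespace[name]
        for args, expected in test_cases:  # validate before trusting the code
            assert fn(*args) == expected
        cached.write_text(source)
    namespace = {}
    exec(cached.read_text(), namespace)
    return namespace[name]
```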

State, execution environments, and durability

  • Long‑running, multi‑step workflows need durable, stateless-but-persistent execution (event sourcing, durable execution) rather than ad‑hoc Jupyter‑like stateful kernels; a minimal event-sourcing sketch follows this list.
  • Handling mid‑execution failures is hard: people want the LLM to “resume” with the right variable state; building the runtime and state machine around this is nontrivial.
  • Latency becomes a concern when chaining many steps or using graph-based orchestration.
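
A minimal event-sourcing sketch of the durability point above, with an invented journal format: each completed step is appended to a log, so a crashed workflow can simply be re-run and will replay finished steps (including nondeterministic LLM calls) from the journal instead of executing them again.

```python
# Sketch: durable, replayable workflow steps via an append-only journal.
# Step results must be JSON-serializable; the on-disk format is illustrative.
import json
from pathlib import Path

class Journal:
    def __init__(self, path: str):
        self.path = Path(path)
        self.results = {}
        if self.path.exists():
            for line in self.path.read_text().splitlines():
                event = json.loads(line)
                self.results[event["step"]] = event["result"]

    def step(self, name: str, fn, *args):
        """Run `fn` at most once; on replay, return the recorded result instead."""
        if name in self.results:
            return self.results[name]
        result = fn(*args)
        with self.path.open("a") as f:
            f.write(json.dumps({"step": name, "result": result}) + "\n")
        self.results[name] = result
        return result

# journal = Journal("order_123.jsonl")
# quote = journal.step("get_quote", fetch_quote, "order_123")
# plan  = journal.step("plan", call_llm, quote)   # the LLM step is journaled too
```

Because the nondeterministic step is journaled like any other, a resumed run sees exactly the same intermediate values it had before the failure.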

MCP/tool design and data formats

  • A recurring complaint is that MCP tools often just wrap APIs and return big JSON blobs, wasting context and bandwidth and mixing irrelevant fields.
  • Some suggest flattening/filtering responses, GraphQL-style selective fields, or even alternative formats (XML, markdown tables, narrative text), which models often handle better than large JSON (see the filtering sketch after this list).
  • Others note MCP’s return types are very limited (text/image), and the protocol/tooling feel under-designed and already somewhat fragmented.
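
One way to act on the “flatten and filter” suggestion above: project the raw response onto a whitelist of fields and render it as a markdown table before it reaches the context window. The field names and the `call_mcp_tool` reference below are illustrative.

```python
# Sketch: shrink a large JSON response to the fields the model actually needs,
# rendered as a markdown table rather than raw nested JSON.
def project(records: list[dict], fields: list[str]) -> list[dict]:
    return [{f: r.get(f) for f in fields} for r in records]

def to_markdown_table(records: list[dict]) -> str:
    if not records:
        return "(no rows)"
    headers = list(records[0].keys())
    lines = ["| " + " | ".join(headers) + " |",
             "| " + " | ".join("---" for _ in headers) + " |"]
    lines += ["| " + " | ".join(str(r[h]) for h in headers) + " |" for r in records]
    return "\n".join(lines)

# raw = call_mcp_tool("list_invoices")   # hypothetical tool returning a big JSON blob
# context = to_markdown_table(project(raw, ["id", "amount", "due_date"]))
```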

Reliability, probabilistic behavior, and correctness

  • Concerns that layering probabilistic components causes error rates to compound; “good enough most of the time” is unacceptable for domains like tax or financial dashboards.
  • Others counter that if a model can actually solve a deterministic task, it assigns near‑certain probability to the correct answer; instability on complex tasks is a capability problem, not a consequence of being “probabilistic” per se.
  • Output-aware inference (dynamic grammars, constraining outputs to valid IDs or tools) is proposed as a way to prevent certain classes of hallucination, though wrong-but-valid answers would still occur (sketched below).
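
A rough illustration of that idea, with the caveat that true output-aware inference constrains the sampler itself (grammars or enum-restricted schemas); the portable approximation below just validates the reply against the live set of valid tools and IDs before anything executes. `llm_complete` is a stub, and the prompt/JSON shape is an assumption.

```python
# Sketch: reject calls that name nonexistent tools or IDs before execution.
import json

def llm_complete(prompt: str) -> str:
    """Stub: call your chat-completion API here and return the text reply."""
    raise NotImplementedError

def choose_tool_call(task: str, valid_tools: set[str], valid_ids: set[str]) -> dict:
    prompt = (
        f"Task: {task}\n"
        'Reply with JSON {"tool": ..., "record_id": ...} where tool is one of '
        f"{sorted(valid_tools)} and record_id is one of {sorted(valid_ids)}."
    )
    call = json.loads(llm_complete(prompt))
    if call.get("tool") not in valid_tools or call.get("record_id") not in valid_ids:
        raise ValueError(f"model proposed an invalid call: {call}")
    return call
```

As the thread notes, this blocks “nonexistent tool or ID” hallucinations but not answers that are valid yet wrong.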

Agents, codegen, and the DSL view

  • Several see “advanced agents” as effectively building a DSL and orchestrator: LLM designs algorithms in terms of an API, but deterministic code executes them.
  • Ideas: the LLM writes code that calls MCPs as ordinary functions, or dynamically composes new tools from smaller ones (see the sketch after this list). In practice, function calling and codegen are still brittle and require heavy testing and deployment infrastructure.
  • Some argue the only scalable path is to push as much granularity and decision logic as possible into deterministic “decisional systems,” with LLMs acting as language interfaces.
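
A sketch of that DSL-and-orchestrator framing, with invented tool stubs: MCP tools are exposed to the model as ordinary Python functions, the model writes a short program against that API, and deterministic code executes it with nothing else in scope. The stripped namespace here is a convenience, not a security boundary.

```python
# Sketch: the LLM designs the algorithm, deterministic code executes it.
def llm_complete(prompt: str) -> str:
    """Stub: call your chat-completion API here and return generated code."""
    raise NotImplementedError

def search_invoices(query: str) -> list[dict]:
    """Stub: would call the corresponding MCP tool."""
    return []

def send_email(to: str, body: str) -> None:
    """Stub: would call the corresponding MCP tool."""

ALLOWED_API = {"search_invoices": search_invoices, "send_email": send_email}

def run_generated_plan(task: str) -> None:
    source = llm_complete(
        "Write Python that performs the task using ONLY these functions: "
        + ", ".join(ALLOWED_API) + f".\nTask: {task}\nReturn only code."
    )
    # Real deployments run this inside a sandbox (see the next section).
    exec(compile(source, "<generated_plan>", "exec"),
         {"__builtins__": {}, **ALLOWED_API})
```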

Tooling, sandboxing, and security

  • Examples of orchestration frameworks (e.g., internal tools, Roast, smolagents) show how to embed nondeterministic LLM steps into larger deterministic workflows.
  • Questions remain around sandboxed execution (Docker, gVisor, SaaS sandboxes) and around securely exposing tools with OAuth or API keys while ensuring LLM/agent code never sees secrets directly; a minimal secret-isolation sketch follows.
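
On the secrets point, one common shape is a thin wrapper or proxy that holds the credential and exposes only whitelisted calls, so LLM-generated code in the sandbox never sees the key. The endpoint paths, host, and environment variable below are illustrative.

```python
# Sketch: credentials stay in the host process; the sandboxed agent can only call
# this wrapper, which attaches the key and enforces an endpoint whitelist.
import os
import urllib.request

_API_KEY = os.environ["BILLING_API_KEY"]          # never passed into the sandbox
_ALLOWED_PATHS = {"/v1/invoices", "/v1/customers"}

def proxied_get(path: str) -> bytes:
    if path not in _ALLOWED_PATHS:
        raise PermissionError(f"path not allowed: {path}")
    req = urllib.request.Request(
        f"https://api.example.com{path}",
        headers={"Authorization": f"Bearer {_API_KEY}"},
    )
    with urllib.request.urlopen(req) as resp:
        return resp.read()
```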

Skepticism, hype, and rediscovering old ideas

  • Some commenters are baffled by the complexity and see much of this as over‑engineering or “madness,” driven by hype and investor pressure.
  • Others note we are largely rediscovering traditional CS concepts—schemas, determinism, state machines, memory management—and reapplying them to LLM systems.
  • There’s acknowledgement of genuinely useful applications, but also a sense that the field is still early, brittle, and often reinventing decades-old patterns.