Wasting Inferences with Aider
Agent fleets vs single agents
- Some argue that running multiple agents/models in parallel won’t fix problem classes that are fundamentally hard for LLMs (e.g., LeetCode-hard–type reasoning); if one attempt fails, the others will likely fail the same way.
- Others counter that diversity helps: different models, prompts, and contexts can yield genuinely different solutions; “fleet” success doesn’t scale linearly with the number of agents, but independent attempts lower the odds that every one of them fails.
- Concern: you may just replace “implement feature once” with “sort through many mediocre PRs,” creating a harder review task.
Verification and code review as the real bottleneck
- Multiple PRs per ticket raises the question: who reviews all this?
- Suggestions:
  - Use LLMs as judges/supervisors to rank or filter candidate PRs (a minimal pipeline sketch follows this list).
- Combine tests + LLM-review + human spot checks.
- Critics note: tests and PRs generated by agents themselves still need human validation (“who tests the tests?”), and code review quickly becomes the constraint.
- Strong view: the hard part isn’t generating patches but reproducing bugs, validating fixes, and exploring regressions in realistic environments.
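One way to make the test + judge + human combination concrete is a cheap-to-expensive filter: objective tests first, an LLM judge second, a human only for the survivors. A minimal sketch, assuming a pytest-based repo, one branch per agent attempt, and a judge call you wire up yourself; all names here are illustrative, not from the discussion:
```python
# Sketch of a "tests first, LLM judge second, human last" filter for candidate PRs.
import subprocess
from dataclasses import dataclass

@dataclass
class Candidate:
    branch: str  # branch holding one agent's attempt
    diff: str    # unified diff of the attempt

def tests_pass(branch: str) -> bool:
    """Run the project's test suite against a checked-out branch (assumed: pytest)."""
    subprocess.run(["git", "checkout", branch], check=True)
    result = subprocess.run(["pytest", "-q"], capture_output=True)
    return result.returncode == 0

def judge_score(diff: str) -> float:
    """Placeholder LLM-as-judge call: ask a model to rate the diff on some scale.
    Swap in your provider's chat API; the prompt and scale are illustrative."""
    raise NotImplementedError("wire up your LLM client here")

def shortlist(candidates: list[Candidate], top_n: int = 2) -> list[Candidate]:
    # Cheap, objective filter first; expensive, fuzzy ranking second.
    passing = [c for c in candidates if tests_pass(c.branch)]
    ranked = sorted(passing, key=lambda c: judge_score(c.diff), reverse=True)
    return ranked[:top_n]  # only these reach a human reviewer
```
The ordering is the point: the deterministic check runs before the fuzzy one, so the judge and the human only ever see candidates that at least pass the existing suite.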
Reliability, randomness, and “wasteful” inference
- Parallel attempts can exploit probabilistic variation; even a small k (like 3 samples) might meaningfully raise the odds of getting one “good” sample (see the arithmetic after this list).
- Skeptics respond that any probabilistic scheme still needs an external agent to decide which output is correct, which is the truly expensive part.
- Some liken “wasting inferences” to abductive extensions on top of inductive LLMs, converging toward expert-system–like architectures.
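To put numbers on the “small k” intuition: if a single attempt is acceptable with probability p and attempts are roughly independent, the chance that at least one of k attempts works is 1 - (1 - p)^k. A quick check with illustrative (not measured) values of p:
```python
# Back-of-envelope check on the "small k raises the odds" claim: if one
# independent attempt produces an acceptable patch with probability p, then
# k attempts all fail with probability (1 - p)**k. The p values are illustrative.
def p_at_least_one_good(p: float, k: int) -> float:
    return 1 - (1 - p) ** k

for p in (0.2, 0.4):
    for k in (1, 3, 5):
        print(f"p={p:.1f} k={k}: {p_at_least_one_good(p, k):.2f}")
# p=0.2, k=3 -> 0.49; p=0.4, k=3 -> 0.78. The catch raised in the thread:
# someone still has to recognize which attempt is the good one.
```
Which is exactly where the skeptics’ objection lands: the curve says nothing about how you identify the good sample.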
Autonomous modes and tooling (Aider, Cursor, Claude Code, etc.)
- Several reports of agents going off the rails: creating branches, running commands, or “fixing” non-problems without being asked—“automatic lawnmower through the flowerbed.”
- Aider’s new autonomous/navigator modes are highlighted as promising, but currently expensive and still in need of human intervention.
- Local models can work with the same tool-calling prompts, but per-model prompt tuning remains fragile (a minimal sketch follows this list).
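For context, reusing an OpenAI-style tool-calling request against a local, OpenAI-compatible server looks roughly like the following; the base URL, model name, and tool schema are assumptions for illustration, and whether the model actually emits a tool call is exactly the fragile part:
```python
# Sketch of reusing the same tool-calling prompt against a local,
# OpenAI-compatible endpoint (e.g., one exposed by Ollama or llama.cpp).
from openai import OpenAI

client = OpenAI(base_url="http://localhost:11434/v1", api_key="unused")  # assumed local server

tools = [{
    "type": "function",
    "function": {
        "name": "run_tests",
        "description": "Run the project's test suite and return the output.",
        "parameters": {"type": "object", "properties": {}, "required": []},
    },
}]

resp = client.chat.completions.create(
    model="qwen2.5-coder",  # whatever local model happens to be loaded; illustrative
    messages=[{"role": "user", "content": "Fix the failing test in utils.py"}],
    tools=tools,
)
print(resp.choices[0].message.tool_calls)  # may be None if the model ignores the tools
```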
Context, learning, and limits
- Repeated theme: tools aren’t the issue; deep project knowledge and context are. Current context windows and attention mechanisms limit what agents can meaningfully ingest.
- Comparisons to junior devs: humans can (in theory) learn from feedback; LLMs don’t update weights online, so users must encode “lessons” via prompts/configs (sketch after this list).
- Some see continual/team-level learning models as the “next big breakthrough.”
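A common workaround is to keep those lessons in a plain-text file and inject them into every request. A minimal sketch; the file name, wrapper function, and message format are hypothetical rather than a feature of any particular tool:
```python
# One way to encode "lessons" the model can't learn online: keep them in a
# plain-text file and prepend them to each request as system context.
from pathlib import Path

def build_messages(task: str, lessons_path: str = "LESSONS.md") -> list[dict]:
    lessons = Path(lessons_path).read_text() if Path(lessons_path).exists() else ""
    system = "You are working on this repo. Follow these project lessons:\n" + lessons
    return [
        {"role": "system", "content": system},
        {"role": "user", "content": task},
    ]

# Each correction a human makes gets appended to LESSONS.md by hand,
# standing in for the weight updates the model never gets.
```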
Economics and future workflows
- Token costs for serious autonomous use can be substantial; “cheap” IDE subscriptions may be underpriced or heavily subsidized (rough arithmetic after this list).
- Some foresee pipelines from customer feature requests straight to PRs + ephemeral environments; others call this unsafe until verification and context issues are solved.
- Minority view: elaborate fleet/agent setups are over-engineering; waiting for better base models may be more efficient.
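For a sense of scale, here is back-of-envelope arithmetic under assumed (not reported) numbers for attempts per ticket, tokens per attempt, and blended token price:
```python
# Rough cost arithmetic for "fleet" usage. All numbers are illustrative
# assumptions, not figures from the discussion.
ATTEMPTS_PER_TICKET = 3
TOKENS_PER_ATTEMPT = 200_000          # long context + multi-step tool calls, assumed
PRICE_PER_MTOK_USD = 5.0              # blended input/output price, assumed
TICKETS_PER_DEV_PER_DAY = 4

cost_per_ticket = ATTEMPTS_PER_TICKET * TOKENS_PER_ATTEMPT / 1_000_000 * PRICE_PER_MTOK_USD
cost_per_dev_month = cost_per_ticket * TICKETS_PER_DEV_PER_DAY * 22

print(f"per ticket:    ${cost_per_ticket:.2f}")     # $3.00 with these assumptions
print(f"per dev-month: ${cost_per_dev_month:.2f}")  # $264.00, well above a flat IDE subscription
```
Even with modest assumptions the per-seat spend lands well above typical flat-rate subscriptions, which is the subsidy argument in a nutshell.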