Show HN: OSS agent I built topped TerminalBench on Gemini-3-flash-preview

Overview of Dirac Agent / Harness

  • Dirac is a heavily modified fork of the Cline harness, with both a CLI (dirac-cli) and a VS Code extension.
  • It topped TerminalBench 2.0 using gemini-3-flash-preview and supports many providers/models (OpenAI, Qwen, open weights via OpenRouter or custom OpenAI-compatible endpoints).
  • Plan-and-act style workflows and subagents from Cline are preserved and extended.

Key Techniques and Design Choices

  • Uses a “hash-anchored edits” mechanism for file modifications: lines are tagged with short hash tokens the model can reference (initially single tokens, later two-token combinations), and edits are applied by matching those anchors via a diff-based mechanism.
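
The post doesn't spell out Dirac's exact scheme, but one way line-hash anchoring could work is sketched below; the names `anchor`, `annotate`, and `apply_edit` are illustrative, not Dirac's API:

```python
import hashlib

def anchor(line: str, n: int = 4) -> str:
    # Short content hash used as a stable token for this line.
    return hashlib.sha1(line.encode()).hexdigest()[:n]

def annotate(source: str) -> list[tuple[str, str]]:
    # Pair every line with its anchor so the model can cite "a1b2"
    # instead of a line number that shifts as the file changes.
    return [(anchor(line), line) for line in source.splitlines()]

def apply_edit(source: str, target_anchor: str, new_text: str) -> str:
    # Replace the line whose anchor matches; unlike line numbers,
    # anchors survive edits elsewhere in the file.
    out = []
    for a, line in annotate(source):
        out.append(new_text if a == target_anchor else line)
    return "\n".join(out)
```

A real implementation would also have to handle duplicate lines (hence the move to two-token combinations) and stale anchors after concurrent edits.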
  • Employs Tree-sitter-based AST parsing for ~14 languages to:
    • Select relevant code regions instead of loading whole large files.
    • Drive symbol-aware search/refactor operations.
  • Batches many file reads/edits into single tool calls to overcome models’ reluctance to issue parallel tool calls.
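
The batching idea amounts to exposing one tool that accepts many targets, rather than hoping the model emits parallel calls; a minimal sketch (the `read_files` tool name is an assumption):

```python
from pathlib import Path

def read_files(paths: list[str]) -> dict[str, str]:
    # One tool call returning many files at once, sidestepping models'
    # reluctance to issue several tool calls in a single turn.
    results = {}
    for p in paths:
        try:
            results[p] = Path(p).read_text()
        except OSError as e:
            # Report per-file errors in-band so one bad path
            # doesn't fail the whole batch.
            results[p] = f"<error: {e}>"
    return results
```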
  • Lets models execute code (bash/python/etc.) as tools to analyze or transform code.
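
A code-execution tool of this kind is typically a thin subprocess wrapper that returns captured output for the model to reason over; a hedged sketch (the `run_code` shape is illustrative, and a production harness would add sandboxing):

```python
import subprocess

def run_code(command: list[str], timeout: float = 30.0) -> dict:
    # Execute a short program (bash, python, ...) as a tool call and hand
    # stdout/stderr/exit code back to the model.
    proc = subprocess.run(command, capture_output=True, text=True,
                          timeout=timeout)
    return {"stdout": proc.stdout,
            "stderr": proc.stderr,
            "exit_code": proc.returncode}
```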
  • Maintains a local SQLite “symbols DB” updated incrementally for faster semantic queries.
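
The incremental part is the interesting bit: on each save, only the changed file's rows are replaced. A minimal sketch of such a symbols DB, assuming a flat `(name, kind, file, line)` schema (the actual schema is not documented):

```python
import sqlite3

def open_symbols_db(path: str = ":memory:") -> sqlite3.Connection:
    conn = sqlite3.connect(path)
    conn.execute(
        "CREATE TABLE IF NOT EXISTS symbols ("
        "  name TEXT, kind TEXT, file TEXT, line INTEGER,"
        "  PRIMARY KEY (file, name))"
    )
    return conn

def reindex_file(conn, file: str,
                 symbols: list[tuple[str, str, int]]) -> None:
    # Incremental update: drop this file's rows, insert fresh ones,
    # leaving the rest of the index untouched.
    with conn:
        conn.execute("DELETE FROM symbols WHERE file = ?", (file,))
        conn.executemany(
            "INSERT INTO symbols VALUES (?, ?, ?, ?)",
            [(name, kind, file, line) for name, kind, line in symbols],
        )

def lookup(conn, name: str) -> list[tuple[str, int]]:
    return conn.execute(
        "SELECT file, line FROM symbols WHERE name = ?", (name,)
    ).fetchall()
```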

Performance, Benchmarks, and Harness vs Model

  • Multiple comments highlight that harness design can matter more than which frontier model is used; swapping harnesses often changes benchmark scores more than swapping models.
  • Dirac’s own small eval suite compares it to other agents (including pi and OpenCode); tasks needing symbol-aware edits show clearer gains from AST usage.
  • There is interest in benchmarking with non-Gemini models and measuring time-to-completion and token usage, but OSS models often hit TerminalBench timeouts due to slow inference.

Limitations, Concerns, and Open Questions

  • AST features only work for languages with available parsers; without them, Dirac falls back to simpler behavior.
  • Some users question whether hash anchors are actually more token-efficient than smart search/replace, suggesting file skeleton display may be the bigger win.
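
The "file skeleton" idea those users point to is easy to illustrate: show the model only top-level signatures, deferring bodies until a symbol is actually selected. A sketch using Python's `ast` module (the `skeleton` helper is hypothetical):

```python
import ast

def skeleton(source: str) -> str:
    # Emit only top-level signatures so the model sees the file's shape
    # without paying tokens for every body.
    tree = ast.parse(source)
    lines = []
    for node in tree.body:
        if isinstance(node, (ast.FunctionDef, ast.AsyncFunctionDef)):
            args = ", ".join(a.arg for a in node.args.args)
            lines.append(f"def {node.name}({args}): ...")
        elif isinstance(node, ast.ClassDef):
            lines.append(f"class {node.name}: ...")
    return "\n".join(lines)
```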
  • Telemetry and feature-flag calls are on by default, and web tools previously proxied requests through the project’s servers; this raised privacy concerns and led to the web tools being removed and the defaults being clarified by the author.
  • Context management strategies (pruning vs relying on provider caching) and subagent delegation remain active areas of experimentation, with mixed experiences across models.
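
To make the pruning-vs-caching trade-off concrete, here is a hedged sketch of one pruning policy (keep the system prompt plus the most recent turns that fit a budget); the alternative is to keep the transcript intact so a stable prefix stays eligible for provider-side prompt caching:

```python
def prune_context(messages: list[dict], max_chars: int) -> list[dict]:
    # Keep the system prompt, then admit turns newest-first until the
    # character budget is exhausted. Note that pruning mid-conversation
    # invalidates any cached prompt prefix on the provider side.
    system, rest = messages[:1], messages[1:]
    kept, total = [], 0
    for msg in reversed(rest):
        total += len(msg["content"])
        if total > max_chars:
            break
        kept.append(msg)
    return system + list(reversed(kept))
```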