2024-11-07

Launch HN: Codebuff (YC F24) – CLI tool that writes code for you

Product & UX Overview

Codebuff is a CLI-based coding agent that can read a repo, choose relevant files, edit them, and run terminal commands/tests without manual file selection or per-command approvals.
It aims to act like a “junior engineer” or “skilled surgeon”: minimal diffs, multi-file edits, and iterative test/fix loops.
Designed to live beside any editor (VS Code, JetBrains, Neovim, Zed, etc.) in a terminal split rather than being an IDE plugin.
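The iterative test/fix loop described above can be sketched roughly as follows. This is a minimal illustration, not Codebuff's implementation: `run_fix_loop`, `command_check`, and the `apply_fix` callback (a stand-in for whatever asks a model to edit files) are hypothetical names introduced here.

```python
import subprocess
from typing import Callable, List, Tuple

# A check returns (passed, combined_output).
Check = Callable[[], Tuple[bool, str]]


def command_check(cmd: List[str]) -> Check:
    """Wrap a shell command (e.g. a test runner) as a check callable."""
    def check() -> Tuple[bool, str]:
        result = subprocess.run(cmd, capture_output=True, text=True)
        return result.returncode == 0, result.stdout + result.stderr
    return check


def run_fix_loop(check: Check,
                 apply_fix: Callable[[str], None],
                 max_fixes: int = 3) -> bool:
    """Run the check; on failure pass its output to apply_fix and rerun.

    apply_fix is a hypothetical hook where an agent would edit files.
    Returns True once the check passes, False if it still fails after
    max_fixes attempts.
    """
    for attempt in range(max_fixes + 1):
        passed, output = check()
        if passed:
            return True
        if attempt < max_fixes:
            apply_fix(output)
    return False
```

In practice `check` would be something like `command_check(["pytest", "-q"])`; capping the number of fix attempts keeps a stuck agent from looping (and spending credits) indefinitely.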

Comparisons to Other Tools

Commenters repeatedly compare Codebuff to Aider, Cursor, Cline, Cody, Amazon Q, and similar tools.
Supporters highlight:
- Auto file selection, deeper context, and single-shot multi-file edits as major UX wins.
- True agent behavior (write tests → run tests → fix errors → rerun).
Critics argue:
- Aider and Cline already offer similar capabilities (repo maps, Tree-sitter parsing, command execution, auto-approve modes).
- Some prefer explicit file selection for safety/cost control.
- For many, IDE-based tools are more convenient than a separate CLI.

Context & Technical Approach

Uses large-context models (mainly Claude 3.5 Sonnet) plus a preprocessing pass that scans the repo (file tree, function/class names) to ask a smaller model which files to read.
The team initially described this as “not RAG,” but the discussion converges on the view that any search-then-augment flow is a form of RAG.
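A search-then-augment flow of the kind described above might look like the sketch below. It rests on several assumptions: Python's standard `ast` module stands in for Tree-sitter and so only handles `.py` files, the prompt wording is invented, and `ask_model` is a hypothetical callable representing the smaller model. None of this is taken from Codebuff's code.

```python
import ast
from pathlib import Path
from typing import Callable, Dict, List


def outline_repo(root: str) -> Dict[str, List[str]]:
    """Map each .py file (path relative to root) to its top-level symbols."""
    root_path = Path(root)
    outline: Dict[str, List[str]] = {}
    for path in sorted(root_path.rglob("*.py")):
        try:
            tree = ast.parse(path.read_text(encoding="utf-8"))
        except (SyntaxError, UnicodeDecodeError):
            continue  # skip files that cannot be parsed
        outline[path.relative_to(root_path).as_posix()] = [
            node.name
            for node in tree.body
            if isinstance(node, (ast.FunctionDef, ast.AsyncFunctionDef, ast.ClassDef))
        ]
    return outline


def build_selection_prompt(task: str, outline: Dict[str, List[str]]) -> str:
    """Render the outline as a compact prompt (wording is illustrative)."""
    lines = ["Task: " + task, "Files and symbols:"]
    for file, names in outline.items():
        lines.append("- %s: %s" % (file, ", ".join(names) or "(no top-level symbols)"))
    lines.append("Reply with the paths worth reading, one per line.")
    return "\n".join(lines)


def select_files(task: str, root: str, ask_model: Callable[[str], str]) -> List[str]:
    """Ask the (hypothetical) small model which files to read."""
    outline = outline_repo(root)
    reply = ask_model(build_selection_prompt(task, outline))
    # Keep only paths that exist in the outline; a model may invent names.
    return [line.strip() for line in reply.splitlines() if line.strip() in outline]
```

The selected files would then be read in full and passed to the large-context model, which is the "augment" half of the flow.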
Supports language-aware parsing via Tree-sitter; some languages (e.g., Svelte) are only partially supported.
An earlier approach used patch generation with custom apply logic; the team later changed it because of reliability issues.
Encourages knowledge.md files to encode project-specific conventions and style guides.

Real-World Use, Strengths, and Weaknesses

Several users report strong productivity gains on real projects (Go/TS/Terraform monorepos, Elixir, Rust, Node/TS, Flutter, Python web apps).
Especially praised for refactors, test-writing, and multi-file changes; less compelling for tiny, precise edits where IDE tools are faster.
Some reports of incorrect or incomplete edits (e.g., overwritten modules, missed subclasses), but usually caught via diffs/CI.

Pricing & Credits

Pricing is roughly $99/month under a credit system; usage beyond the included credits is billed per credit.
Many commenters view this as expensive relative to Cursor, Cody, and roll-your-own API usage.
One user’s $500 usage stemmed from a bug that granted excessive credits; this raised concerns about runaway costs and prompted requests for hard limits and per-request cost visibility.
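The hard limits and per-request cost visibility that commenters ask for could take a shape like the following sketch. The `CreditBudget` class, its method names, and the credit amounts are all hypothetical; the thread does not describe how Codebuff accounts for credits internally.

```python
from typing import List, Tuple


class BudgetExceeded(Exception):
    """Raised when a request would push spending past the hard limit."""


class CreditBudget:
    """Track credits per request and refuse anything over a hard cap."""

    def __init__(self, hard_limit: int) -> None:
        self.hard_limit = hard_limit
        self.spent = 0
        self.log: List[Tuple[str, int]] = []  # (request label, credits)

    def charge(self, label: str, credits: int) -> int:
        """Record one request's cost and return the credits remaining."""
        if credits < 0:
            raise ValueError("credits must be non-negative")
        if self.spent + credits > self.hard_limit:
            # Refuse before spending, so the cap is never exceeded.
            raise BudgetExceeded(
                "%s needs %d credits, only %d left"
                % (label, credits, self.hard_limit - self.spent)
            )
        self.spent += credits
        self.log.append((label, credits))
        return self.hard_limit - self.spent
```

Checking the limit before each request, rather than after, is what turns the cap into a hard stop instead of an alert; the per-request log gives the cost visibility.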

Security, Privacy, and Closed-Source Concerns

Codebuff can run arbitrary shell commands without explicit confirmation, which alarms some users.
Team argues:
- In practice this has not caused serious issues.
- Git plus an internal undo can recover from destructive actions.
- Models are prompted to be cautious; directory resets try to keep commands scoped to the project.
Critics worry about:
- Potential exfiltration of SSH keys, secrets, or personal data.
- Accidental system-wide changes (e.g., Python installs, global packages).
- Lack of sandboxing/VMs and reliance on “trust the model.”
Suggestions include sandboxing (VMs, pledge-like mechanisms, Docker), optional approval prompts, and better guardrails for untrusted repos.
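One of the lighter-weight suggestions, optional approval prompts combined with keeping commands scoped to the project, can be sketched as a pre-execution check. Everything here is an assumption for illustration: the `SAFE_COMMANDS` allowlist, the `needs_approval` function, and the path heuristic are hypothetical, and a check like this is not a substitute for real sandboxing.

```python
import shlex
from pathlib import Path

# Hypothetical allowlist; a real tool would make this configurable.
SAFE_COMMANDS = {"ls", "cat", "git", "pytest", "npm"}
# Characters that let one command line chain, redirect, or expand others.
SHELL_METACHARS = set(";&|<>$`")


def needs_approval(command: str, project_root: str) -> bool:
    """Return True if a human should confirm the command before it runs."""
    if any(ch in SHELL_METACHARS for ch in command):
        return True
    try:
        tokens = shlex.split(command)
    except ValueError:
        return True  # e.g. unbalanced quotes
    if not tokens or tokens[0] not in SAFE_COMMANDS:
        return True
    root = Path(project_root).resolve()
    for token in tokens[1:]:
        if token.startswith("-"):
            continue  # treat as a flag, not a path
        if token.startswith("~"):
            return True  # home-directory paths are outside the project
        # Absolute tokens replace root in the join; relative ones may use "..".
        target = (root / token).resolve()
        if target != root and root not in target.parents:
            return True
    return False
```

A gate like this only decides when to ask; the exfiltration and system-wide-change concerns raised by critics still call for isolation such as a container or VM.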
Hosting is via the vendor’s servers, forwarding to LLM APIs; no bring-your-own-key option today. Some dislike the closed-source, cloud-only model and prefer local or self-hosted solutions.

Positioning, Differentiation, and Skepticism

Supporters say its simplicity, no-click workflow, and aggressive context gathering make it feel qualitatively better than other agents, especially in messy or mid-size codebases.
Skeptics see “just another wrapper” around third-party models, with features already present in mature open-source tools at lower cost.
Some question long-term viability without clearer differentiation, stronger privacy guarantees, or open-sourcing.
There is debate over CLI vs IDE as the primary interface: some love the terminal-first design; others see it as friction compared to embedded IDE assistants.

Feature Requests & Future Directions

Requests include: multi-repo support beyond a single directory, better handling of giant files, sandboxed execution, local/self-hosted models, improved docs and demos of large-codebase work, and benchmarks like SWE-bench.
Team mentions plans for privacy modes, possible sandboxing, and more complex demos (including dogfooding on their own production code).
