Launch HN: Codebuff (YC F24) – CLI tool that writes code for you
Product & UX Overview
- Codebuff is a CLI-based coding agent that can read a repo, choose relevant files, edit them, and run terminal commands/tests without manual file selection or per-command approvals.
- It aims to act like a “junior engineer” or “skilled surgeon”: minimal diffs, multi-file edits, and iterative test/fix loops.
- Designed to live beside any editor (VS Code, JetBrains, Neovim, Zed, etc.) in a terminal split rather than being an IDE plugin.
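The iterative test/fix loop mentioned above can be pictured roughly as follows. This is a minimal sketch, not Codebuff's actual implementation: `fix_until_green`, `propose_fix`, and the `runner` parameter are hypothetical names, and `propose_fix` stands in for whatever model call edits the code in response to failing test output.

```python
import subprocess
from typing import Callable, Sequence


def run_tests(cmd: Sequence[str]) -> tuple[bool, str]:
    """Run the test command and return (passed, combined stdout + stderr)."""
    proc = subprocess.run(cmd, capture_output=True, text=True)
    return proc.returncode == 0, proc.stdout + proc.stderr


def fix_until_green(
    test_cmd: Sequence[str],
    propose_fix: Callable[[str], None],  # hypothetical: edits files given test output
    max_rounds: int = 3,
    runner: Callable[[Sequence[str]], tuple[bool, str]] = run_tests,
) -> bool:
    """Run tests, hand failures to propose_fix, and rerun, up to max_rounds fixes."""
    for _ in range(max_rounds):
        passed, output = runner(test_cmd)
        if passed:
            return True
        # Feed the failure output back so the next edit can target it.
        propose_fix(output)
    # One final run to check whether the last fix worked.
    passed, _ = runner(test_cmd)
    return passed
```

Capping the number of rounds matters for a tool billed per usage, since an unbounded loop on a test that can never pass would keep consuming model calls.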
Comparisons to Other Tools
- Commenters repeatedly compare it to Aider, Cursor, Cline, Cody, Amazon Q, and similar tools.
- Supporters highlight:
  - Auto file selection, deeper context, and single-shot multi-file edits as major UX wins.
  - True agent behavior (write tests → run tests → fix errors → rerun).
- Critics argue:
  - Aider and Cline already offer similar capabilities (repo maps, tree-sitter, command execution, auto-approve modes).
  - Explicit file selection is preferable for safety and cost control.
  - For many users, IDE-based tools are more convenient than a separate CLI.
Context & Technical Approach
- Uses large-context models (mainly Claude 3.5 Sonnet) plus a preprocessing pass that scans the repo (file tree, function/class names) to ask a smaller model which files to read.
- The team initially described this as “not RAG,” but the discussion converged on the view that any search-then-augment flow is a form of RAG.
- Supports language-aware parsing via tree-sitter; some languages (e.g., Svelte) are only partially supported.
- An earlier approach used patch generation with custom apply logic; the team later changed it due to reliability issues.
- Encourages knowledge.md files to encode project-specific conventions and style guides.
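To make the preprocessing pass concrete, here is a rough sketch of building a repo outline and turning it into a file-selection prompt for a smaller model. It is an illustration under stated assumptions, not the tool's real code: it uses Python's standard `ast` module on `.py` files as a stand-in for tree-sitter, and `build_repo_map` and `file_selection_prompt` are hypothetical names. The actual model call is left out.

```python
import ast
from pathlib import Path


def outline_python_file(path: Path) -> list[str]:
    """Return the names of top-level functions and classes in a Python file."""
    try:
        tree = ast.parse(path.read_text(encoding="utf-8"))
    except (SyntaxError, UnicodeDecodeError):
        return []  # Unparseable files contribute no symbols.
    return [
        node.name
        for node in tree.body
        if isinstance(node, (ast.FunctionDef, ast.AsyncFunctionDef, ast.ClassDef))
    ]


def build_repo_map(root: Path) -> dict[str, list[str]]:
    """Map each non-hidden .py file (path relative to root) to its symbols."""
    repo_map: dict[str, list[str]] = {}
    for path in sorted(root.rglob("*.py")):
        rel = path.relative_to(root)
        if any(part.startswith(".") for part in rel.parts):
            continue  # Skip hidden directories such as .git or .venv.
        repo_map[rel.as_posix()] = outline_python_file(path)
    return repo_map


def file_selection_prompt(repo_map: dict[str, list[str]], task: str) -> str:
    """Build the text a smaller model would see when asked which files to read."""
    lines = [f"Task: {task}", "Files (with top-level symbols):"]
    for rel_path, symbols in repo_map.items():
        lines.append(f"- {rel_path}: {', '.join(symbols) or '(no symbols)'}")
    lines.append("List the files that should be read to complete the task.")
    return "\n".join(lines)
```

Because the selected files are then read and added to the main model's context, this is a search-then-augment flow, which is why commenters classed it as a form of RAG.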
Real-World Use, Strengths, and Weaknesses
- Several users report strong productivity gains on real projects (Go/TS/Terraform monorepos, Elixir, Rust, Node/TS, Flutter, Python web apps).
- Especially praised for refactors, test-writing, and multi-file changes; less compelling for tiny, precise edits where IDE tools are faster.
- Some users report incorrect or incomplete edits (e.g., overwritten modules, missed subclasses), though these were usually caught via diffs or CI.
Pricing & Credits
- Pricing is ~$99/month with a credit system; usage beyond the included credits is billed per credit.
- Many commenters view this as expensive relative to Cursor, Cody, and roll-your-own API usage.
- One user’s $500 usage stemmed from a bug that granted excessive credits; this raised concerns about runaway costs and prompted requests for hard limits and per-request cost visibility.
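The hard limits and per-request cost visibility that commenters asked for could look something like the sketch below. Everything here is hypothetical: `CreditBudget`, `BudgetExceeded`, and the credit amounts are illustrative and do not describe Codebuff's billing system.

```python
from dataclasses import dataclass, field


class BudgetExceeded(Exception):
    """Raised when a request would push usage past the hard limit."""


@dataclass
class CreditBudget:
    hard_limit: int  # maximum credits allowed in the billing period
    used: int = 0
    log: list[tuple[str, int]] = field(default_factory=list)  # (request id, credits)

    def charge(self, request_id: str, credits: int) -> int:
        """Record a request's cost and return the credits remaining.

        Refuses the charge (and records nothing) if it would exceed the limit.
        """
        if credits < 0:
            raise ValueError("credits must be non-negative")
        if self.used + credits > self.hard_limit:
            raise BudgetExceeded(
                f"{request_id}: {credits} credits would exceed the limit "
                f"({self.used}/{self.hard_limit} used)"
            )
        self.used += credits
        self.log.append((request_id, credits))
        return self.hard_limit - self.used
```

Checking the limit before recording the charge means a refused request leaves the totals untouched, and the per-request log gives the cost visibility users wanted.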
Security, Privacy, and Closed-Source Concerns
- Codebuff can run arbitrary shell commands without explicit confirmation, which alarms some users.
- Team argues:
  - In practice this has not caused serious issues.
  - Git plus an internal undo can recover from destructive actions.
  - Models are prompted to be cautious; directory resets try to keep commands scoped to the project.
- Critics worry about:
  - Potential exfiltration of SSH keys, secrets, or personal data.
  - Accidental system-wide changes (e.g., Python installs, global packages).
  - Lack of sandboxing/VMs and reliance on “trust the model.”
- Suggestions include sandboxing (VMs, pledge-like mechanisms, Docker), optional approval prompts, and better guardrails for untrusted repos.
- Requests go through the vendor’s servers, which forward them to LLM APIs; there is no bring-your-own-key option today. Some dislike the closed-source, cloud-only model and prefer local or self-hosted solutions.
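One of the lighter-weight suggestions, an optional approval prompt for risky commands, might be sketched as follows. This is a hypothetical illustration, not a feature of the tool: `SAFE_COMMANDS`, `is_inside`, and `needs_approval` are invented names, the allowlist is arbitrary, and a string-level check like this is far weaker than a real sandbox (VM, container, or pledge-like mechanism).

```python
import shlex
from pathlib import Path

SAFE_COMMANDS = {"ls", "cat", "git", "pytest", "npm"}  # hypothetical allowlist
SHELL_METACHARS = set(";&|`$<>")  # chaining, substitution, redirection


def is_inside(root: Path, candidate: Path) -> bool:
    """True if candidate resolves to root or to a path beneath it."""
    root = root.resolve()
    candidate = candidate.resolve()
    return candidate == root or root in candidate.parents


def needs_approval(command: str, cwd: Path, project_root: Path) -> bool:
    """Decide whether a command should be shown to the user before running."""
    if not is_inside(project_root, cwd):
        return True  # Working directory has drifted outside the project.
    if any(ch in SHELL_METACHARS for ch in command):
        return True  # Could chain or redirect to something unchecked.
    try:
        argv = shlex.split(command)
    except ValueError:
        return True  # Unbalanced quotes and similar: do not guess.
    if not argv or argv[0] not in SAFE_COMMANDS:
        return True
    for arg in argv[1:]:
        if arg.startswith("-"):
            continue  # Treat as an option flag, not a path.
        if arg.startswith("~") or not is_inside(project_root, cwd / arg):
            return True  # Touches the home directory or escapes the project.
    return False
```

A guard like this would flag `cat ~/.ssh/id_rsa` or `pip install requests` for confirmation while letting `git status` through, but it cannot see what an allowed program does internally, which is why critics in the thread argued for real isolation.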
Positioning, Differentiation, and Skepticism
- Supporters say its simplicity, no-click workflow, and aggressive context gathering make it feel qualitatively better than other agents, especially in messy or mid-size codebases.
- Skeptics see “just another wrapper” around third-party models, with features already present in mature open-source tools at lower cost.
- Some question long-term viability without clearer differentiation, stronger privacy guarantees, or open-sourcing.
- There is debate over CLI vs IDE as the primary interface: some love the terminal-first design; others see it as friction compared to embedded IDE assistants.
Feature Requests & Future Directions
- Requests include: multi-repo support beyond a single directory, better handling of giant files, sandboxed execution, local/self-hosted models, improved docs and demos of large-codebase work, and benchmarks like SWE-bench.
- Team mentions plans for privacy modes, possible sandboxing, and more complex demos (including dogfooding on their own production code).