Launch HN: Codebuff (YC F24) – CLI tool that writes code for you

Product & UX Overview

  • Codebuff is a CLI-based coding agent that can read a repo, choose relevant files, edit them, and run terminal commands/tests without manual file selection or per-command approvals.
  • It aims to act like a “junior engineer” or “skilled surgeon”: minimal diffs, multi-file edits, and iterative test/fix loops.
  • Designed to live beside any editor (VS Code, JetBrains, Neovim, Zed, etc.) in a terminal split rather than being an IDE plugin.

Comparisons to Other Tools

  • Repeated comparisons to Aider, Cursor, Cline, Cody, Amazon Q, etc.
  • Supporters highlight:
    • Auto file selection, deeper context, and single-shot multi-file edits as major UX wins.
    • True agent behavior (write tests → run tests → fix errors → rerun).
  • Critics argue:
    • Aider and Cline already offer similar capabilities (repo maps, tree-sitter parsing, command execution, auto-approve modes).
    • Some prefer explicit file selection for safety/cost control.
    • For many, IDE-based tools are more convenient than a separate CLI.
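
The write tests → run tests → fix errors → rerun loop that supporters describe can be sketched in a few lines. This is a minimal illustration, not Codebuff's implementation: `CheckResult`, `run_fix_loop`, and the two callables are hypothetical names, and in a real agent `run_tests` would shell out to a test runner while `propose_fix` would send the failure output to a model and apply its edits.

```python
from typing import Callable, NamedTuple, Tuple


class CheckResult(NamedTuple):
    passed: bool
    output: str  # captured test output, fed back to the fixer on failure


def run_fix_loop(
    run_tests: Callable[[], CheckResult],
    propose_fix: Callable[[str], None],
    max_runs: int = 3,
) -> Tuple[bool, int]:
    """Run tests; on failure, hand the output to propose_fix and rerun.

    Returns (passed, number_of_test_runs). The loop is bounded so a
    model that never converges cannot spin (or spend) forever.
    """
    for run in range(1, max_runs + 1):
        result = run_tests()
        if result.passed:
            return True, run
        # Only attempt a fix if there is a run left to verify it.
        if run < max_runs:
            propose_fix(result.output)
    return False, max_runs
```

The `max_runs` bound is the design choice that matters here: without it, an agent that runs commands without approval has no natural stopping point.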

Context & Technical Approach

  • Uses large-context models (mainly Claude 3.5 Sonnet) plus a preprocessing pass that scans the repo (file tree, function/class names) to ask a smaller model which files to read.
  • The team initially described this as “not RAG,” but the discussion converged on the view that any search-then-augment flow is a form of RAG.
  • Supports language-aware parsing via tree-sitter; some languages (e.g., Svelte) are only partially supported.
  • An earlier approach generated patches and applied them with custom logic; the team later replaced it because of reliability issues.
  • Encourages knowledge.md files to encode project-specific conventions and style guides.
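
The preprocessing pass described above (scan the repo for file paths and function/class names, then ask a smaller model which files to read) might look roughly like the sketch below. It rests on several assumptions: it uses Python's standard `ast` module as a stand-in for tree-sitter and therefore only handles `.py` files, and `ask_small_model` is a hypothetical callable standing in for whatever model API is used. The function names and prompt wording are illustrative and not taken from Codebuff.

```python
import ast
from pathlib import Path
from typing import Callable, Dict, List


def summarize_repo(root: Path) -> Dict[str, List[str]]:
    """Map each .py file (path relative to root) to its top-level definitions."""
    summary: Dict[str, List[str]] = {}
    for path in sorted(root.rglob("*.py")):
        try:
            tree = ast.parse(path.read_text(encoding="utf-8"))
        except (SyntaxError, UnicodeDecodeError):
            continue  # skip files that cannot be parsed
        names = [
            node.name
            for node in tree.body
            if isinstance(node, (ast.FunctionDef, ast.AsyncFunctionDef, ast.ClassDef))
        ]
        summary[path.relative_to(root).as_posix()] = names
    return summary


def build_selection_prompt(summary: Dict[str, List[str]], request: str) -> str:
    """Render the compact repo outline that the smaller model sees."""
    lines = [
        f"{path}: {', '.join(names) or '(no definitions)'}"
        for path, names in summary.items()
    ]
    return (
        f"Request: {request}\n"
        "Files:\n" + "\n".join(lines) + "\n"
        "Reply with one relevant file path per line."
    )


def select_files(
    root: Path, request: str, ask_small_model: Callable[[str], str]
) -> List[str]:
    """Search step: ask the model for paths, keeping only ones that exist."""
    summary = summarize_repo(root)
    reply = ask_small_model(build_selection_prompt(summary, request))
    # Dropping unknown paths guards against hallucinated file names.
    return [line.strip() for line in reply.splitlines() if line.strip() in summary]
```

The selected files would then be read in full and placed in the larger model's context, which is the search-then-augment shape the thread identifies as RAG.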

Real-World Use, Strengths, and Weaknesses

  • Several users report strong productivity gains on real projects (Go/TS/Terraform monorepos, Elixir, Rust, Node/TS, Flutter, Python web apps).
  • Especially praised for refactors, test-writing, and multi-file changes; less compelling for tiny, precise edits where IDE tools are faster.
  • Some reports of incorrect or incomplete edits (e.g., overwritten modules, missed subclasses), but usually caught via diffs/CI.

Pricing & Credits

  • Pricing is ~$99/month with a credit system; usage beyond the included credits is billed per credit.
  • Many commenters view this as expensive relative to Cursor, Cody, and roll-your-own API usage.
  • One user’s $500 in usage stemmed from a bug that granted excessive credits; this raised concerns about runaway costs and prompted requests for hard limits and per-request cost visibility.
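
The hard limit and per-request cost visibility that commenters ask for can be illustrated with a small client-side guard. This is a sketch under stated assumptions: `CreditBudget`, `BudgetExceeded`, and the credit amounts are hypothetical, and nothing in the thread says Codebuff exposes per-request credit counts in this form.

```python
from dataclasses import dataclass, field
from typing import List, Tuple


class BudgetExceeded(Exception):
    """Raised when a request would push usage past the hard limit."""


@dataclass
class CreditBudget:
    hard_limit: int  # hypothetical cap, in credits
    used: int = 0
    log: List[Tuple[str, int]] = field(default_factory=list)

    def charge(self, label: str, credits: int) -> int:
        """Record one request's cost; refuse it if the cap would be exceeded.

        Returns the credits remaining after the charge.
        """
        if self.used + credits > self.hard_limit:
            raise BudgetExceeded(
                f"{label}: needs {credits} credits, "
                f"only {self.hard_limit - self.used} left"
            )
        self.used += credits
        self.log.append((label, credits))  # per-request visibility
        return self.hard_limit - self.used
```

Checking the limit before recording the charge means a rejected request leaves the running total unchanged, so the log always matches what was actually spent.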

Security, Privacy, and Closed-Source Concerns

  • Codebuff can run arbitrary shell commands without explicit confirmation, which alarms some users.
  • Team argues:
    • In practice this has not caused serious issues.
    • Git plus an internal undo can recover from destructive actions.
    • Models are prompted to be cautious; directory resets try to keep commands scoped to the project.
  • Critics worry about:
    • Potential exfiltration of SSH keys, secrets, or personal data.
    • Accidental system-wide changes (e.g., Python installs, global packages).
    • Lack of sandboxing/VMs and reliance on “trust the model.”
  • Suggestions include sandboxing (VMs, pledge-like mechanisms, Docker), optional approval prompts, and better guardrails for untrusted repos.
  • Hosting is via the vendor’s servers, forwarding to LLM APIs; no bring-your-own-key option today. Some dislike the closed-source, cloud-only model and prefer local or self-hosted solutions.
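
Two of the suggested guardrails, optional approval prompts and keeping commands scoped to the project directory, can be combined in a simple pre-execution check. The sketch below is illustrative only: the allowlist, the function names, and the `approve` callback are assumptions, and a check like this is a guardrail rather than a sandbox (an allowlisted program can still do damage, which is why commenters also ask for VMs or containers).

```python
import shlex
from pathlib import Path
from typing import Callable

# Hypothetical allowlist of programs that may run without a prompt.
SAFE_COMMANDS = {"ls", "cat", "git", "pytest", "npm"}
# Shell metacharacters that could chain or redirect to something else.
SHELL_META = set(";&|><$`\n")


def is_inside(project_root: Path, cwd: Path) -> bool:
    """True if cwd is the project root or one of its subdirectories."""
    root = project_root.resolve()
    target = cwd.resolve()
    return target == root or root in target.parents


def needs_approval(command: str, project_root: Path, cwd: Path) -> bool:
    """Decide whether a command should be shown to the user first."""
    if any(ch in SHELL_META for ch in command):
        return True  # chaining, pipes, redirects, substitutions
    try:
        argv = shlex.split(command)
    except ValueError:
        return True  # unbalanced quotes and similar parse errors
    if not argv or not is_inside(project_root, cwd):
        return True
    return argv[0] not in SAFE_COMMANDS


def gate(
    command: str, project_root: Path, cwd: Path, approve: Callable[[str], bool]
) -> bool:
    """Return True if the command may run, asking the user when required."""
    if needs_approval(command, project_root, cwd):
        return approve(command)
    return True
```

In interactive use, `approve` could be as simple as `lambda cmd: input(f"Run {cmd!r}? [y/N] ").lower() == "y"`; for untrusted repos the allowlist could be emptied so that every command prompts.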

Positioning, Differentiation, and Skepticism

  • Supporters say its simplicity, no-click workflow, and aggressive context gathering make it feel qualitatively better than other agents, especially in messy or mid-size codebases.
  • Skeptics see “just another wrapper” around third-party models, with features already present in mature open-source tools at lower cost.
  • Some question long-term viability without clearer differentiation, stronger privacy guarantees, or open-sourcing.
  • There is debate over CLI vs IDE as the primary interface: some love the terminal-first design; others see it as friction compared to embedded IDE assistants.

Feature Requests & Future Directions

  • Requests include: multi-repo support beyond a single directory, better handling of giant files, sandboxed execution, local/self-hosted models, improved docs and demos of work on large codebases, and benchmarks like SWE-bench.
  • Team mentions plans for privacy modes, possible sandboxing, and more complex demos (including dogfooding on their own production code).