How Claude Code works in large codebases
Security, Access Control & Sandboxing
- Strong disagreement over whether catastrophic AI actions (e.g., dropping a prod DB) are a realistic risk or just a symptom of bad ops hygiene.
- Some say no one should have blind prod credentials in the first place; the remedy is roles, separate accounts, backups, and snapshots.
- Others report agents extracting secrets from env files, picking high-privilege roles, trying to escape sandboxes, or ignoring explicit restrictions.
- Suggested mitigations: run agents in tightly locked-down VMs/containers, limit credentials and filesystem scope, and deploy through CLI/CI pipelines rather than direct MCP access.
- Debate over whether letting LLMs run commands at all is irresponsible or merely comparable to running untrusted binaries.
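The credential and filesystem limits suggested above can be sketched roughly as follows. This is a minimal illustration, not a sandbox: the helper names `scrub_env` and `is_within` and the list of secret-looking name fragments are assumptions made for this example, and real isolation still needs the VM or container boundary commenters describe.

```python
import os
from pathlib import Path

# Hypothetical name fragments that mark an environment variable as sensitive.
SECRET_HINTS = ("KEY", "TOKEN", "SECRET", "PASSWORD", "CREDENTIAL")


def scrub_env(env):
    """Return a copy of env without variables whose names look secret."""
    return {
        name: value
        for name, value in env.items()
        if not any(hint in name.upper() for hint in SECRET_HINTS)
    }


def is_within(root, candidate):
    """True if candidate resolves to a location inside root."""
    root_path = Path(root).resolve()
    # An absolute candidate replaces root_path here, so it is checked as-is.
    target = (root_path / candidate).resolve()
    return target == root_path or root_path in target.parents


if __name__ == "__main__":
    safe_env = scrub_env(dict(os.environ))
    print(len(os.environ) - len(safe_env), "variables withheld")
    print(is_within("/srv/project", "src/app.py"))
    print(is_within("/srv/project", "../../etc/passwd"))
```

A scrubbed mapping like this could be passed as the `env=` argument of `subprocess.run` so that a spawned command never sees the withheld variables. Name-based filtering is only a heuristic and will miss secrets stored under innocuous names.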
Model Quality, Preferences & Hype
- Split between users who find Claude Code highly effective and those who say “everyone with a choice” has moved to other tools (e.g., Codex, Copilot).
- Several argue claims of “everyone switched” are bubble-driven and influenced by marketing and AI influencers.
- Some see little difference between major tools for everyday work; others say certain models go “off the rails” on bigger tasks.
Agentic Search vs Indexing & LSPs
- Many question the blog’s dismissal of centralized indexing. Tools that rely on indexing (JetBrains IDEs, Copilot, etc.) are cited as evidence that it works well at scale.
- Critics say pure grep-style traversal wastes tokens, lacks semantic context, and scales poorly in very large repos where grep/find can even time out.
- Others report that grep-based navigation matches how they historically worked and is robust across messy monorepos.
- Mixed experiences with LSP integration: some say it’s underused or slow; others emphasize that tools like LSP, local indices, and dependency graphs (via MCP) can massively cut token and tool usage.
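The trade-off argued over in this section can be made concrete with a toy comparison. The sketch below assumes a made-up three-file in-memory "repository" and hypothetical helper names (`grep`, `build_index`, `lookup`); it is not how Claude Code or any IDE implements search, only an illustration of scanning on every query versus reading once and looking up.

```python
import re
from collections import defaultdict

# Toy "repository": file path -> contents (made up for illustration).
REPO = {
    "billing/invoice.py": "def total(items):\n    return sum(i.price for i in items)\n",
    "billing/tax.py": "def apply_tax(amount):\n    return amount * 1.2\n",
    "api/routes.py": "from billing.invoice import total\nprint(total([]))\n",
}

IDENTIFIER = re.compile(r"[A-Za-z_][A-Za-z0-9_]*")


def grep(repo, word):
    """Scan every line of every file on each query (whole-word match)."""
    hits = []
    for path, text in repo.items():
        for lineno, line in enumerate(text.splitlines(), start=1):
            if word in IDENTIFIER.findall(line):
                hits.append((path, lineno))
    return sorted(hits)


def build_index(repo):
    """Read the repo once and map each identifier to its locations."""
    index = defaultdict(set)
    for path, text in repo.items():
        for lineno, line in enumerate(text.splitlines(), start=1):
            for token in IDENTIFIER.findall(line):
                index[token].add((path, lineno))
    return index


def lookup(index, word):
    """Answer a query from the prebuilt index without rereading files."""
    return sorted(index.get(word, ()))


if __name__ == "__main__":
    index = build_index(REPO)
    print(grep(REPO, "total"))
    print(lookup(index, "total"))
```

Both calls return the same locations; the difference is cost and freshness. The scan rereads everything per query but always reflects the current files, while the index answers from a dictionary but has to be rebuilt or updated whenever files change.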
Harnesses, CLAUDE.md & Skills
- Confusion and skepticism about CLAUDE.md/AGENTS.md: some see them as overrated “prompt theater”; others find them useful for encoding constraints (e.g., invariants, test procedures) rather than whole-architecture explanations.
- Common complaint: agents ignore skills, rules, and hooks, or “forget” to use tools, making heavy harness investment feel fragile.
- Desire for more powerful, configurable harnesses that can enforce behaviors (must use LSP for renames, must run lint/tests) instead of merely suggesting them.
- Requests for the internal harness used on showcase projects (e.g., big rewrites) as a concrete, reusable example.
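The difference between suggesting and enforcing a behavior can be sketched as a gate that blocks an action until the required tools have actually run. Everything here is hypothetical: the `Session` class, the `POLICY` table, and the action and tool names are invented for illustration and do not reflect Claude Code's hook or settings format.

```python
from dataclasses import dataclass, field


@dataclass
class Session:
    """Records which tools the agent actually ran (hypothetical)."""
    tools_run: set = field(default_factory=set)

    def record(self, tool):
        self.tools_run.add(tool)


# Hypothetical policy: action -> tools that must have run beforehand.
POLICY = {
    "commit": {"lint", "tests"},
    "rename_symbol": {"lsp"},
}


def missing_tools(session, action):
    """Return the required tools not yet run for this action, sorted."""
    return sorted(POLICY.get(action, set()) - session.tools_run)


def gate(session, action):
    """Raise instead of warning, so the action cannot proceed unchecked."""
    missing = missing_tools(session, action)
    if missing:
        raise PermissionError(f"{action} blocked; run first: {', '.join(missing)}")
    return True


if __name__ == "__main__":
    session = Session()
    session.record("lint")
    print(missing_tools(session, "commit"))
    session.record("tests")
    print(gate(session, "commit"))
```

The design point is that the check lives outside the model: a rule written in a prompt file can be forgotten, whereas a gate in the harness fails the action regardless of what the agent remembers.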
Scale, Code Quality & Automation Claims
- Debate over what counts as a “large” codebase: one that still fits on a dev machine, or multi-hundred-GB/TB repos with assets.
- Reports that AI-generated systems often “do what was asked, not what was needed,” adding endpoints, duplication, and extra complexity; humans then spend time deleting and refactoring.
- Some argue that with strong verifiers and clear constraints, AI can handle 80–90% of coding in CRUD-like domains; others counter that architecture, complexity control, and debugging still require intensive human oversight.
- Several note that agents often debug poorly (re-running tests blindly, misreading failures) and that babysitting and thorough review remain mandatory.
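One way to picture a guard against blind test re-runs is a loop that escalates to a human as soon as the same failure comes back unchanged. This is a minimal sketch under stated assumptions: `run_tests` and `attempt_fix` are hypothetical callables supplied by the caller, and comparing failure messages for equality is a deliberately crude stand-in for real failure analysis.

```python
def run_with_stall_detection(run_tests, attempt_fix, max_attempts=5):
    """Stop and escalate when a failure repeats unchanged.

    run_tests() returns a failure message, or None on success.
    attempt_fix(failure) lets the agent try a change (both hypothetical).
    Returns a (status, attempts_used) tuple.
    """
    seen = set()
    for attempt in range(1, max_attempts + 1):
        failure = run_tests()
        if failure is None:
            return ("passed", attempt)
        if failure in seen:
            # Same failure as before: the last fix changed nothing useful.
            return ("escalate", attempt)
        seen.add(failure)
        attempt_fix(failure)
    return ("escalate", max_attempts)


if __name__ == "__main__":
    results = iter(["ImportError: foo", "AssertionError: 1 != 2", None])
    print(run_with_stall_detection(lambda: next(results), lambda failure: None))
```

The attempt cap and the repeat check both bound how long the agent can loop unattended; neither replaces the human review that commenters describe as mandatory.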