Why Cline doesn't index your codebase
Terminology: What Counts as RAG?
- Several commenters argue that “search and feed the context window” is still RAG in the original sense: retrieval + augmentation + generation.
- Others note that in industry practice “RAG” has become shorthand for vector DB + embeddings + similarity search, making the term overloaded or even “borderline useless.”
- There’s some backlash against the pedantry: the terms are new and still evolving, and people care more about behavior than labels.
Structured Retrieval vs Vector Embeddings for Code
- Cline’s approach is described as structured retrieval: filesystem traversal, AST parsing, following imports/dependencies, and reading files in logical order.
- Proponents say vector similarity often grabs keyword-adjacent but logically irrelevant fragments, whereas code-structure–guided retrieval better matches how developers actually navigate.
- Some engineers working on similar tools report shelving vector-based code RAG because chunking + similarity search proved too lossy/fuzzy and biased toward misleading but “similar” snippets.
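A minimal sketch of the structured-retrieval idea, using Python's standard `ast` module: start from an entry module, parse it, and follow its imports breadth-first so files are read in dependency order. This is not Cline's implementation. The in-memory `REPO` dict and the helpers `local_imports` and `retrieval_order` are hypothetical, and a real tool would also have to resolve packages, relative imports, and other languages.

```python
import ast
from collections import deque

# Hypothetical in-memory repository: module name -> source text.
REPO = {
    "app": "import services\nimport utils\n\ndef main():\n    services.run()\n",
    "services": "import db\nimport utils\n\ndef run():\n    db.connect()\n",
    "utils": "import os\n\ndef slugify(text):\n    return text.lower()\n",
    "db": "import json\n\ndef connect():\n    return json.dumps({})\n",
    "legacy": "def old_connect():\n    return None\n",  # never imported
}


def local_imports(source: str, repo: dict[str, str]) -> list[str]:
    """Return absolute imports that name modules in `repo`, ordered by line number."""
    found = []
    for node in ast.walk(ast.parse(source)):
        if isinstance(node, ast.Import):
            found.extend((node.lineno, alias.name) for alias in node.names)
        elif isinstance(node, ast.ImportFrom) and node.module and node.level == 0:
            found.append((node.lineno, node.module))
    # Drop stdlib/third-party imports such as `os` or `json`.
    return [name for _, name in sorted(found) if name in repo]


def retrieval_order(entry: str, repo: dict[str, str]) -> list[str]:
    """Breadth-first walk of the import graph, starting at the entry module."""
    seen = {entry}
    order = []
    queue = deque([entry])
    while queue:
        module = queue.popleft()
        order.append(module)  # "read" this file next
        for dep in local_imports(repo[module], repo):
            if dep not in seen:
                seen.add(dep)
                queue.append(dep)
    return order
```

Here `retrieval_order("app", REPO)` returns `["app", "services", "utils", "db"]`. The `legacy` module is never read because nothing references it: relevance comes from explicit imports rather than textual similarity.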
Arguments Against Codebase Indexing / Vector RAG
- Critiques include: extra complexity, stale indexes, privacy/security issues, token bloat from imprecise chunks, and the belief that large context windows plus good tools make indexing less necessary.
- For code specifically, people point out that syntax/grammar and explicit references (definitions, calls, scopes) remove much of the need for generic text chunking.
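To illustrate how explicit references can replace generic chunking, here is a small sketch that uses Python's `ast` module to find where a name is defined and where it is called. Unlike a plain text search, it ignores the mention inside a comment. The `SOURCE` snippet and the `definitions_and_calls` helper are hypothetical; the sketch only handles bare-name calls within a single file and does no scope or import resolution.

```python
import ast

# Hypothetical single-file example; line 5 mentions the name only in a comment.
SOURCE = '''\
def parse_config(path):
    return {"path": path}

def load():
    # parse_config is mentioned in this comment, but that is not a reference
    cfg = parse_config("app.toml")
    return cfg

class Loader:
    def reload(self):
        return parse_config("app.toml")
'''


def definitions_and_calls(source: str, name: str) -> dict[str, list[int]]:
    """Return line numbers where `name` is defined and where it is called by bare name."""
    result = {"definitions": [], "calls": []}
    for node in ast.walk(ast.parse(source)):
        if isinstance(node, (ast.FunctionDef, ast.AsyncFunctionDef, ast.ClassDef)) and node.name == name:
            result["definitions"].append(node.lineno)
        elif isinstance(node, ast.Call) and isinstance(node.func, ast.Name) and node.func.id == name:
            result["calls"].append(node.lineno)
    for lines in result.values():
        lines.sort()  # ast.walk is breadth-first, so normalize to source order
    return result
```

For `parse_config` this reports the definition on line 1 and calls on lines 6 and 11, skipping the comment on line 5 that a grep would also match.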
Counterarguments: Why Indexing Still Matters
- Power users with huge, mixed repos (code + large amounts of documentation, DB schemas, Swagger specs, API docs) say indexing is a “killer feature” that Cline is missing.
- They argue:
  - Indexing gives the model a “foot in the door”; from the first hit, the agent can then read more context.
  - Tools like Cursor, Augment, and others already offer dynamic indexing and privacy modes, so “it’s hard” isn’t a convincing excuse.
  - RAG is a technique not tied to embeddings alone; it can incorporate ASTs, graphs, repo maps, or summaries.
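The point that an index need not mean embeddings can be made concrete with a sketch: a symbol index built with Python's `ast` module maps each defined identifier to the files that define it, giving the agent a first hit from which it can read further. The `FILES` dict and the helpers `build_symbol_index` and `entry_points` are hypothetical, and this is not how Cursor, Augment, or any other named tool builds its index.

```python
import ast
import re
from collections import defaultdict

# Hypothetical mixed repo: path -> contents. Only .py files are parsed here.
FILES = {
    "billing/invoice.py": "def render_invoice(order):\n    return format_total(order)\n",
    "billing/totals.py": "def format_total(order):\n    return round(order['total'], 2)\n",
    "README.md": "# Billing\nSee format_total for rounding rules.\n",
}


def build_symbol_index(files: dict[str, str]) -> dict[str, set[str]]:
    """Map each function/class name to the set of .py files that define it."""
    index = defaultdict(set)
    for path, source in files.items():
        if not path.endswith(".py"):
            continue  # docs, schemas, or API specs would need their own extractor
        for node in ast.walk(ast.parse(source)):
            if isinstance(node, (ast.FunctionDef, ast.AsyncFunctionDef, ast.ClassDef)):
                index[node.name].add(path)
    return dict(index)


def entry_points(query: str, index: dict[str, set[str]]) -> list[str]:
    """Return files defining any identifier-like token in the query (the first hits)."""
    hits: set[str] = set()
    for token in re.findall(r"[A-Za-z_]\w*", query):
        hits |= index.get(token, set())
    return sorted(hits)
```

For the query `"Why does format_total round badly?"` this returns `["billing/totals.py"]`; the agent would open that file and follow references outward from there. Covering documentation, DB schemas, or Swagger specs, as the mixed-repo users want, would require extra extractors beyond this Python-only sketch.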
Tools, UX, and Quality Comparisons
- Cline is strongly praised as an agentic coding tool, particularly for its open-source transparency and for letting users call providers directly with their own API keys.
- Others prefer Claude Code, Cursor, or Augment, saying they need fewer prompts and get better results, and singling out Cursor’s inline autocomplete as a big differentiator.
- Aider is highlighted for repo maps and explicit, user-controlled context selection.
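Aider's own repo map is built with tree-sitter and covers many languages; the sketch below is a much simpler, Python-only stand-in using the standard `ast` module. It only shows the shape of the idea: a compact outline of each file's classes and function signatures that can go into the context window instead of whole files. `FILES`, `signature`, and `repo_map` are hypothetical names.

```python
import ast

# Hypothetical two-file repo: path -> source.
FILES = {
    "store.py": (
        "class Cart:\n"
        "    def add(self, item):\n"
        "        self.items.append(item)\n"
        "\n"
        "    def total(self):\n"
        "        return sum(i.price for i in self.items)\n"
    ),
    "api.py": "from store import Cart\n\ndef checkout(cart, user):\n    return cart.total()\n",
}


def signature(fn) -> str:
    """Render a def line with positional parameter names only (no defaults or annotations)."""
    keyword = "async def" if isinstance(fn, ast.AsyncFunctionDef) else "def"
    params = ", ".join(arg.arg for arg in fn.args.args)
    return f"{keyword} {fn.name}({params})"


def repo_map(files: dict[str, str]) -> str:
    """Outline each file's top-level functions, classes, and class methods."""
    lines = []
    for path in sorted(files):
        lines.append(f"{path}:")
        for node in ast.parse(files[path]).body:
            if isinstance(node, (ast.FunctionDef, ast.AsyncFunctionDef)):
                lines.append("  " + signature(node))
            elif isinstance(node, ast.ClassDef):
                lines.append(f"  class {node.name}")
                for item in node.body:
                    if isinstance(item, (ast.FunctionDef, ast.AsyncFunctionDef)):
                        lines.append("    " + signature(item))
    return "\n".join(lines)


print(repo_map(FILES))
```

This prints `api.py:` with `def checkout(cart, user)`, then `store.py:` with `class Cart` and its methods `def add(self, item)` and `def total(self)`: a few lines of context in place of the full sources, with the user still deciding which files get read in full.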
Large Context Windows and Performance
- Some say 1M-token contexts (e.g., Gemini 2.5) make traditional RAG less necessary and unlock qualitatively new workflows.
- Others cite empirical experience and papers: model quality degrades well before the maximum context length is reached, so careful retrieval and chunking still matter.
Security, Performance, and Marketing Skepticism
- Some question the security benefit of not indexing when prompts still pass through the vendor’s servers anyway (e.g., when paying through its credit system).
- Some readers see the blog post as a marketing and positioning piece, possibly overconfident and light on rigorous metrics, and speculate that it may be a reaction to competing tools adding indexers.