2025-05-27

Why Cline doesn't index your codebase

Terminology: What Counts as RAG?

Several commenters argue that “search and feed the context window” is still RAG in the original sense: retrieval + augmentation + generation.
Others note that in industry practice “RAG” has become shorthand for vector DB + embeddings + similarity search, making the term overloaded or even “borderline useless.”
Some commenters push back on the pedantry: the terms are new and still evolving, and most people care more about how a tool behaves than what it is called.
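To make the "search and feed the context window" reading concrete, here is a minimal sketch of retrieval and augmentation with no embeddings or vector store. The function names `search_files` and `build_prompt` are hypothetical, invented for illustration, and the generation step (sending the prompt to a model) is deliberately left out.

```python
import re
from pathlib import Path


def search_files(root: str, pattern: str, suffix: str = ".py") -> list[tuple[str, int, str]]:
    """Retrieval step: return (path, line_number, line) for each line matching the regex."""
    regex = re.compile(pattern)
    hits = []
    for path in sorted(Path(root).rglob(f"*{suffix}")):
        try:
            text = path.read_text(encoding="utf-8")
        except (UnicodeDecodeError, OSError):
            continue  # skip files that cannot be read as UTF-8 text
        for number, line in enumerate(text.splitlines(), start=1):
            if regex.search(line):
                hits.append((str(path), number, line.strip()))
    return hits


def build_prompt(question: str, hits: list[tuple[str, int, str]]) -> str:
    """Augmentation step: place the retrieved lines ahead of the question."""
    context = "\n".join(f"{path}:{number}: {line}" for path, number, line in hits)
    return f"Context:\n{context}\n\nQuestion: {question}"
```

Under this reading, swapping the regex search for a vector lookup changes the retrieval step but not the overall retrieve, augment, generate shape, which is the point the first group of commenters is making.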

Structured Retrieval vs Vector Embeddings for Code

Cline’s approach is described as structured retrieval: filesystem traversal, AST parsing, following imports/dependencies, and reading files in logical order.
Proponents say vector similarity often grabs keyword-adjacent but logically irrelevant fragments, whereas code-structure–guided retrieval better matches how developers actually navigate.
Some engineers working on similar tools report shelving vector-based code RAG because chunking plus similarity search proved too lossy and fuzzy, and tended to surface misleading but “similar” snippets.
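As an illustration of what "following imports" can look like, the sketch below uses Python's standard `ast` module to walk from an entry module to the local files it depends on. This is an assumption-laden toy, not Cline's actual implementation: the helper names `imported_modules` and `reading_order` are hypothetical, and it only handles absolute imports that map directly to `.py` files under one root directory.

```python
import ast
from pathlib import Path


def imported_modules(source: str) -> list[str]:
    """Return the module names referenced by absolute import statements in the source."""
    names = []
    for node in ast.walk(ast.parse(source)):
        if isinstance(node, ast.Import):
            names.extend(alias.name for alias in node.names)
        elif isinstance(node, ast.ImportFrom) and node.module and node.level == 0:
            names.append(node.module)
    return names


def reading_order(root: str, entry: str) -> list[str]:
    """Depth-first walk from the entry module; dependencies are listed before dependents."""
    root_path = Path(root)
    seen: set[str] = set()
    order: list[str] = []

    def visit(module: str) -> None:
        if module in seen:
            return
        seen.add(module)
        path = root_path / (module.replace(".", "/") + ".py")
        if not path.is_file():
            return  # not a local file (standard library or third-party), so skip it
        for dependency in imported_modules(path.read_text(encoding="utf-8")):
            visit(dependency)
        order.append(str(path.relative_to(root_path)))

    visit(entry)
    return order
```

The result is a list of files chosen by explicit references rather than by textual similarity, which is the contrast proponents draw with chunk-and-embed retrieval.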

Arguments Against Codebase Indexing / Vector RAG

Critiques of indexing include extra complexity, stale indexes, privacy and security issues, token bloat from imprecise chunks, and the belief that large context windows plus good tools make indexing less necessary.
For code specifically, people point out that syntax/grammar and explicit references (definitions, calls, scopes) remove much of the need for generic text chunking.

Counterarguments: Why Indexing Still Matters

Power users with huge, mixed repos (code + large amounts of documentation, DB schemas, Swagger specs, API docs) say indexing is a “killer feature” that Cline is missing.
They argue:
- Indexing gives the model a “foot in the door”; from the first hit, the agent can then read more context.
- Tools like Cursor, Augment, and others already offer dynamic indexing and privacy modes, so “it’s hard” isn’t a convincing excuse.
- RAG is a technique, not tied to embeddings only; it can incorporate ASTs, graphs, repo maps, or summaries.
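The last point can be illustrated with a simplified repo map: a compact outline of each file's classes and functions that could be handed to a model as a form of index without any embeddings. The sketch below is a rough approximation built on Python's `ast` module. The names `outline` and `repo_map` are hypothetical, and this is not the algorithm any specific tool uses.

```python
import ast


def _signature(node: ast.AST) -> str:
    """Format a function node as 'def name(arg1, arg2)' using positional argument names only."""
    args = ", ".join(arg.arg for arg in node.args.args)
    return f"def {node.name}({args})"


def outline(source: str) -> list[str]:
    """Return one line per top-level function, class, and method in the source."""
    function_types = (ast.FunctionDef, ast.AsyncFunctionDef)
    lines = []
    for node in ast.parse(source).body:
        if isinstance(node, function_types):
            lines.append(_signature(node))
        elif isinstance(node, ast.ClassDef):
            lines.append(f"class {node.name}")
            for item in node.body:
                if isinstance(item, function_types):
                    lines.append("    " + _signature(item))
    return lines


def repo_map(files: dict[str, str]) -> str:
    """Combine per-file outlines into one text block, keyed by path in sorted order."""
    sections = []
    for path in sorted(files):
        body = "\n".join("  " + line for line in outline(files[path]))
        sections.append(f"{path}:\n{body}")
    return "\n".join(sections)
```

A map like this is far smaller than the source it summarizes, so it can give an agent the "foot in the door" described above before it reads any file in full.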

Tools, UX, and Quality Comparisons

Cline receives strong praise as an agentic coder, especially with open-source transparency and direct use of provider API keys.
Others prefer Claude Code, Cursor, or Augment, claiming fewer prompts and better results, and noting Cursor’s inline autocomplete as a big differentiator.
Aider is highlighted for repo maps and explicit, user-controlled context selection.

Large Context Windows and Performance

Some say 1M-token contexts (e.g., Gemini 2.5) make traditional RAG less necessary and unlock qualitatively new workflows.
Others cite empirical experience and papers: model quality degrades long before max context, so careful retrieval/chunking still matters.
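If quality does degrade well before the maximum context length, one practical response is to cap how much retrieved material goes into the prompt. The sketch below greedily packs the highest-scoring snippets under a token budget. The relevance scores are assumed to come from whatever retrieval step is in use, the names `estimate_tokens` and `pack_context` are hypothetical, and the four-characters-per-token estimate is a crude assumption rather than a real tokenizer.

```python
def estimate_tokens(text: str) -> int:
    """Rough token estimate: about four characters per token (an assumption, not a tokenizer)."""
    return max(1, len(text) // 4)


def pack_context(snippets: list[tuple[float, str]], budget_tokens: int) -> list[str]:
    """Keep the highest-scoring snippets whose combined estimated size fits the budget."""
    chosen = []
    used = 0
    for _score, text in sorted(snippets, key=lambda item: item[0], reverse=True):
        cost = estimate_tokens(text)
        if used + cost <= budget_tokens:
            chosen.append(text)
            used += cost
    return chosen
```

Setting `budget_tokens` well below the model's advertised limit is one way to act on the observation that more context is not automatically better.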

Security, Performance, and Marketing Skepticism

Commenters question the security benefit of not indexing if prompts still pass through the vendor’s servers (e.g., via credit systems).
Some readers see the blog post as a marketing/positioning piece, possibly overconfident and light on rigorous metrics, and speculate it may be a reaction to competing tools adding indexers.
