Why Cline doesn't index your codebase

Terminology: What Counts as RAG?

  • Several commenters argue that “search and feed the context window” is still RAG in the original sense: retrieval + augmentation + generation.
  • Others note that in industry practice “RAG” has become shorthand for vector DB + embeddings + similarity search (sketched after this list), making the term overloaded or even “borderline useless.”
  • There is also pushback against the pedantry: these terms are new and evolving, and people care more about actual behavior than labels.
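
  For readers who know the broad definition but not the narrower one, the embeddings flavor of RAG boils down to roughly the sketch below: pre-compute a vector per text chunk, rank chunks by similarity to the query’s vector, and paste the top hits into the prompt before generation. This is only an illustration; cosine and retrieve are made-up helper names, and the embedding vectors are assumed to come from some external model.

      import math

      def cosine(a: list[float], b: list[float]) -> float:
          # Cosine similarity between two embedding vectors.
          dot = sum(x * y for x, y in zip(a, b))
          norm_a = math.sqrt(sum(x * x for x in a))
          norm_b = math.sqrt(sum(y * y for y in b))
          return dot / (norm_a * norm_b)

      def retrieve(query_vec: list[float],
                   index: list[tuple[str, list[float]]],
                   top_k: int = 3) -> list[str]:
          # index holds (chunk_text, chunk_vector) pairs from a prior embedding pass.
          ranked = sorted(index, key=lambda item: cosine(query_vec, item[1]), reverse=True)
          return [text for text, _ in ranked[:top_k]]

      # The retrieved chunks are concatenated into the prompt ("augmentation")
      # before the model generates its answer.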

Structured Retrieval vs Vector Embeddings for Code

  • Cline’s approach is described as structured retrieval: filesystem traversal, AST parsing, following imports/dependencies, and reading files in logical order (a rough sketch follows this list).
  • Proponents say vector similarity often grabs keyword-adjacent but logically irrelevant fragments, whereas code-structure–guided retrieval better matches how developers actually navigate.
  • Some engineers working on similar tools report shelving vector-based code RAG because chunking + similarity search proved too lossy/fuzzy and biased toward misleading but “similar” snippets.
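
  To make the contrast concrete, here is a minimal Python-only sketch of structure-guided retrieval in the spirit described above: start at an entry file, parse it with the standard ast module, follow local imports breadth-first, and hand files to the model in that order. This illustrates the general idea only; it is not Cline’s actual code, and gather_context is a made-up helper name.

      import ast
      from pathlib import Path

      def gather_context(entry: str, root: str, limit: int = 10) -> list[tuple[str, str]]:
          """Collect (path, source) pairs by following local imports from an entry file."""
          root_path = Path(root)
          queue, seen, ordered = [Path(entry)], set(), []
          while queue and len(ordered) < limit:
              path = queue.pop(0)
              if path in seen or not path.exists():
                  continue
              seen.add(path)
              source = path.read_text()
              ordered.append((str(path), source))
              for node in ast.walk(ast.parse(source)):
                  # Map "import pkg.mod" / "from pkg.mod import name" to files in the repo.
                  if isinstance(node, ast.Import):
                      modules = [alias.name for alias in node.names]
                  elif isinstance(node, ast.ImportFrom) and node.module:
                      modules = [node.module]
                  else:
                      continue
                  for module in modules:
                      candidate = root_path / (module.replace(".", "/") + ".py")
                      if candidate.exists():
                          queue.append(candidate)
          return ordered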

Arguments Against Codebase Indexing / Vector RAG

  • Critiques include: extra complexity, stale indexes, privacy/security issues, token bloat from imprecise chunks, and the belief that large context windows plus good tools make indexing less necessary.
  • For code specifically, people point out that syntax/grammar and explicit references (definitions, calls, scopes) remove much of the need for generic text chunking, as the sketch after this list illustrates.
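
  As a deliberately simplified illustration of that last point: because source code already carries explicit structure, an index can store whole definitions with exact locations rather than arbitrary fixed-size text chunks. The helper names below (extract_symbols, naive_chunks) are invented for the example.

      import ast

      def extract_symbols(source: str, filename: str) -> list[dict]:
          """Index each top-level function/class as one retrievable unit."""
          symbols = []
          for node in ast.parse(source).body:
              if isinstance(node, (ast.FunctionDef, ast.AsyncFunctionDef, ast.ClassDef)):
                  symbols.append({
                      "file": filename,
                      "name": node.name,
                      "kind": type(node).__name__,
                      "lines": (node.lineno, node.end_lineno),
                      # Exact source of the definition: never split mid-function.
                      "source": ast.get_source_segment(source, node),
                  })
          return symbols

      # Contrast: a generic text chunker has no idea where a definition starts or ends.
      def naive_chunks(source: str, size: int = 800) -> list[str]:
          return [source[i:i + size] for i in range(0, len(source), size)]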

Counterarguments: Why Indexing Still Matters

  • Power users with huge, mixed repos (code + large amounts of documentation, DB schemas, Swagger specs, API docs) say indexing is a “killer feature” that Cline is missing.
  • They argue:
    • Indexing gives the model a “foot in the door”; from the first hit, the agent can then read more context.
    • Tools like Cursor, Augment, and others already offer dynamic indexing and privacy modes today; “it’s hard” isn’t a convincing excuse.
    • RAG is a technique, not tied only to embeddings; it can incorporate ASTs, graphs, repo maps, or summaries (see the sketch after this list).
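
  For instance, an “index” need not involve embeddings at all. The sketch below builds a tiny repo map in the spirit commenters describe: one line per file listing its top-level definitions, small enough to paste into the prompt so the agent knows which files to open next. It is a rough illustration, not any specific tool’s implementation, and build_repo_map is a made-up name.

      import ast
      from pathlib import Path

      def build_repo_map(root: str) -> str:
          """One line per Python file: its path plus its top-level definitions."""
          lines = []
          for path in sorted(Path(root).rglob("*.py")):
              try:
                  tree = ast.parse(path.read_text())
              except SyntaxError:
                  continue
              names = [node.name for node in tree.body
                       if isinstance(node, (ast.FunctionDef, ast.AsyncFunctionDef, ast.ClassDef))]
              if names:
                  lines.append(f"{path}: " + ", ".join(names))
          return "\n".join(lines)

      # Example output the agent might see:
      #   src/api.py: create_app, handle_request
      #   src/db.py: connect, Session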

Tools, UX, and Quality Comparisons

  • Cline receives strong praise as an agentic coder, especially for its open-source transparency and direct use of provider API keys.
  • Others prefer Claude Code, Cursor, or Augment, claiming fewer prompts and better results, and noting Cursor’s inline autocomplete as a big differentiator.
  • Aider is highlighted for repo maps and explicit, user-controlled context selection.

Large Context Windows and Performance

  • Some say 1M-token contexts (e.g., Gemini 2.5) make traditional RAG less necessary and unlock qualitatively new workflows.
  • Others cite empirical experience and papers: model quality degrades well before the maximum context length is reached, so careful retrieval/chunking still matters.

Security, Performance, and Marketing Skepticism

  • The security benefit of not indexing is questioned when prompts still pass through the vendor’s servers anyway (e.g., via credit systems).
  • Some readers see the blog post as a marketing/positioning piece, possibly overconfident and light on rigorous metrics, and speculate that it may be a reaction to competing tools adding indexers.