Scaling LLMs to Larger Codebases

Prompt libraries, context files, and “LLM literacy”

  • Many comments reinforce the article’s point that iteratively improving prompts and context files (e.g., CLAUDE.md) is high-ROI.
  • Others report that agents often ignore or randomly drop these documents from context, even at session start.
  • Some experiment with having the model rewrite instructions into highly structured, repetitive Markdown, which seems easier for models to follow (a sketch follows this list).
  • There’s interest in tools that can “force inject” dynamic rules or manage growing sets of hooks/instructions more deterministically.
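
To make the “highly structured, repetitive Markdown” idea concrete, here is a minimal CLAUDE.md-style sketch; the section names, rules, and make targets are illustrative assumptions, not examples quoted from the thread:

    # Project rules

    ## Always
    - ALWAYS run `make test` before reporting a task as done.
    - ALWAYS keep edits limited to the files named in the task.

    ## Never
    - NEVER delete files, branches, or migrations without asking first.
    - NEVER touch generated code under build/.

    ## Commands
    - Build: make build
    - Test: make test

The repetition (one rule per line, the same ALWAYS/NEVER framing throughout) is the point: several commenters found models follow this style more reliably than prose paragraphs.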

Instruction-following, nondeterminism, and safety

  • A recurring frustration: models sometimes ignore clear instructions or even do the opposite, seemingly at random.
  • This unpredictability is seen as a core unsolved problem for robust workflows, especially on large, multi-step tasks.
  • Several people share horror stories of agents deleting projects or wiping unstaged changes, leading to advice about strict permissions, backups, sandboxing, and blocking destructive commands (a guard sketch follows this list).
  • Some suspect providers are training models to rely more on “intuition,” making explicit instructions less effective.
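
One common shape for “blocking destructive commands” is a denylist wrapper in front of the shell. A minimal Python sketch, where run_guarded and the pattern list are assumptions rather than any particular agent’s API:

    import re
    import subprocess

    # Hypothetical denylist of obviously destructive shell patterns.
    DESTRUCTIVE_PATTERNS = [
        r"\brm\s+-rf\b",
        r"\bgit\s+reset\s+--hard\b",
        r"\bgit\s+clean\b",
        r"\bmkfs\b",
    ]

    def run_guarded(cmd: str) -> int:
        """Refuse any command matching the denylist; run everything else."""
        for pattern in DESTRUCTIVE_PATTERNS:
            if re.search(pattern, cmd, re.IGNORECASE):
                raise PermissionError(f"blocked destructive command: {cmd!r}")
        return subprocess.run(cmd, shell=True, check=False).returncode

    # Usage: route every agent-proposed shell command through the guard.
    # run_guarded("git clean -fdx")   # raises PermissionError
    # run_guarded("pytest -q")        # runs normally

A denylist is only a backstop, which is why the same comments pair it with backups and sandboxed working copies.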

Preferred workflows and agent usage

  • Many avoid free-roaming agents and instead use tightly scoped, one-shot prompts (“write this function,” “change this file”) with manual review.
  • Others report success with explicit multi-phase loops: research → plan (written to a Markdown file) → clear context → implement → clear → review/test (a sketch follows this list).
  • There’s debate over whether elaborate planning loops are necessary with newer models; some say recent models can handle larger tasks with simpler “divide and conquer” prompting.
  • A common theme: separate “planner/architect” behavior from “implementor/typist” behavior, and don’t let the implementor improvise.
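
A compressed Python sketch of that loop; run_agent, PLAN.md, and the task string are hypothetical stand-ins for whatever agent interface and conventions a team actually uses:

    from pathlib import Path

    def run_agent(prompt: str) -> str:
        """Hypothetical stand-in for one agent call started with a fresh context."""
        # A real implementation would invoke a coding agent here.
        return f"(agent output for: {prompt[:50]})"

    task = "add pagination to the /users endpoint"

    # Phase 1: research + plan; persist the plan to Markdown so it survives a context reset.
    plan = run_agent(f"Research the codebase and write a step-by-step plan for: {task}")
    Path("PLAN.md").write_text(plan)

    # Phase 2: fresh context; the implementor follows the written plan and nothing else.
    run_agent("Implement exactly the steps in PLAN.md. Do not improvise beyond them.")

    # Phase 3: fresh context again; review the diff and run the tests.
    print(run_agent("Review the changes against PLAN.md and report test results."))

Writing the plan to disk is what separates the planner/architect role from the implementor/typist role: the implementation phase sees only the plan, not the planner’s exploratory context.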

Codebase structure, frameworks, and context partitioning

  • Several comments argue that the real bottleneck is codebase design: organized, domain-driven, well-documented systems are far easier for agents than messy ones.
  • Highly opinionated frameworks (Rails, etc.) are seen as easier for LLMs than “glue everything yourself” stacks.
  • Others experiment with decomposing large systems into smaller, strongly bounded units (e.g., Nix flakes, libraries) to keep context small and explicit; a layout sketch follows.
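
A hypothetical layout in that spirit, keeping each bounded unit self-describing so an agent can work from one small directory and one small context file; the directory names are illustrative:

    repo/
      billing/           # one bounded domain with its own tests and docs
        CLAUDE.md        # rules and commands for this unit only
        src/
        tests/
      inventory/
        CLAUDE.md
        src/
        tests/
      platform/          # thin shared glue, rarely touched
        CLAUDE.md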

Capabilities, limits, and economics

  • Experiences diverge: some say agents “crush it on large codebases” with the right guidance; others find large-scale agentic editing uneconomical and unreliable versus small, focused tasks.
  • Concerns include silent, subtle mistakes in complex changes, token burn, and the risk of developers learning less if they stop reading and understanding generated code.
  • There’s interest in extended-context models and AST-based “large code models,” but their maturity is unclear in the thread.