Scaling LLMs to Larger Codebases
Prompt libraries, context files, and “LLM literacy”
- Many comments reinforce the article’s point that iteratively improving prompts and context files (e.g., CLAUDE.md) is high-ROI.
- Others report that agents often ignore or randomly drop these documents from context, even at session start.
- Some experiment with having the model rewrite instructions into highly structured, repetitive Markdown, which seems easier for models to follow.
- There’s interest in tools that can “force inject” dynamic rules or manage growing sets of hooks/instructions more deterministically.
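One way to make rule injection deterministic, as some commenters want, is a thin wrapper that re-prepends the rules file to every prompt rather than trusting the agent to keep it in context. A minimal sketch, assuming a `CLAUDE.md` in the working directory; `inject_rules` and the `<project-rules>` wrapper tag are illustrative names, not a real agent API:

```python
from pathlib import Path

RULES_FILE = Path("CLAUDE.md")  # assumed project-rules file

def inject_rules(user_prompt: str) -> str:
    """Return the prompt with the rules file prepended verbatim.

    Re-injecting on every call means the rules can't be silently
    dropped from context mid-session.
    """
    rules = RULES_FILE.read_text() if RULES_FILE.exists() else ""
    if not rules:
        return user_prompt
    return f"<project-rules>\n{rules}\n</project-rules>\n\n{user_prompt}"
```

The same pattern extends to dynamic rules: regenerate the rules text per call (e.g., based on which files the task touches) before prepending it.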
Instruction-following, nondeterminism, and safety
- A recurring frustration: models sometimes ignore clear instructions or even do the opposite, seemingly at random.
- This unpredictability is seen as a core unsolved problem for robust workflows, especially on large, multi-step tasks.
- Several people share horror stories of agents deleting projects or wiping unstaged changes, leading to advice about strict permissions, backups, sandboxing, and blocking destructive commands.
- Some suspect providers are training models to rely more on “intuition,” making explicit instructions less effective.
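The "blocking destructive commands" advice can be sketched as a simple pattern gate that runs before the agent's shell commands execute. The pattern list below is illustrative and deliberately incomplete; a real setup would pair it with sandboxing, strict permissions, and backups rather than relying on regexes alone:

```python
import re
import shlex

# Patterns for commands commonly cited in the thread's horror stories:
# recursive deletes, hard resets, wiping unstaged changes, force pushes.
BLOCKED_PATTERNS = [
    r"^rm\b.*(-rf|-fr|--no-preserve-root)",
    r"^git\s+(reset\s+--hard|clean\s+-[a-z]*f|checkout\s+--\s)",
    r"^git\s+push\b.*--force",
    r"\bmkfs\b",
]

def is_allowed(command: str) -> bool:
    """Reject a shell command if it matches any blocked pattern."""
    normalized = " ".join(shlex.split(command))
    return not any(re.search(p, normalized) for p in BLOCKED_PATTERNS)
```

A blocklist like this is a last line of defense, not a substitute for running the agent in a throwaway checkout or container.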
Preferred workflows and agent usage
- Many avoid free-roaming agents and instead use tightly scoped, one-shot prompts (“write this function,” “change this file”) with manual review.
- Others report success with explicit multi-phase loops: research → plan (write to MD) → clear context → implement → clear → review/test.
- There’s debate over whether elaborate planning loops are necessary with newer models; some say recent models can handle larger tasks with simpler “divide and conquer” prompting.
- A common theme: separate “planner/architect” behavior from “implementor/typist” behavior, and don’t let the implementor improvise.
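The multi-phase loop described above can be sketched as a small driver. `run_agent` is a placeholder for whatever CLI or API call starts a fresh session; the key design point is that each phase begins with a clean context and state is carried only through the plan file:

```python
# Each tuple is (phase name, prompt for a *fresh* agent session).
# The phases and PLAN.md filename are illustrative.
PHASES = [
    ("research", "Read the codebase and summarize relevant modules into PLAN.md."),
    ("plan", "Turn the notes in PLAN.md into a step-by-step plan. Do not write code."),
    ("implement", "Execute PLAN.md exactly. Do not improvise beyond the plan."),
    ("review", "Diff the changes against PLAN.md and run the tests."),
]

def run_agent(phase: str, prompt: str) -> str:
    """Placeholder: each call would spawn a new session (cleared context)."""
    return f"[{phase}] done"

def run_pipeline() -> list[str]:
    # State flows between phases only via PLAN.md, never via chat history,
    # which is what separates the planner role from the implementor role.
    return [run_agent(phase, prompt) for phase, prompt in PHASES]
```

Clearing context between phases is what enforces the planner/implementor split: the implementor session never sees the exploratory research chatter, only the written plan.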
Codebase structure, frameworks, and context partitioning
- Several comments argue that the real bottleneck is codebase design: organized, domain-driven, well-documented systems are far easier for agents than messy ones.
- Highly opinionated frameworks (Rails, etc.) are seen as easier for LLMs than “glue everything yourself” stacks.
- Others experiment with decomposing large systems into smaller, strongly bounded units (e.g., nix flakes, libraries) to keep context small and explicit.
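The "keep context small and explicit" idea can be made concrete by feeding the agent only one bounded unit at a time. A hedged sketch, assuming Python source files and a rough character budget (both the budget and the `.py` glob are arbitrary choices for illustration):

```python
from pathlib import Path

def gather_unit_context(unit_dir: str, budget_chars: int = 40_000) -> str:
    """Concatenate source files from one bounded unit, stopping at the budget.

    A deterministic walk over a single package directory keeps the
    context both small and explicit about where each file came from.
    """
    parts: list[str] = []
    used = 0
    for path in sorted(Path(unit_dir).rglob("*.py")):
        text = path.read_text()
        if used + len(text) > budget_chars:
            break  # stay under budget rather than truncating mid-file
        parts.append(f"# === {path} ===\n{text}")
        used += len(text)
    return "\n".join(parts)
```

The same approach maps onto stronger boundaries like separate libraries or nix flakes: the smaller and more self-contained the unit, the less the agent has to infer from outside its context window.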
Capabilities, limits, and economics
- Experiences diverge: some say agents “crush it on large codebases” with the right guidance; others find large-scale agentic editing uneconomical and unreliable versus small, focused tasks.
- Concerns include silent, subtle mistakes in complex changes, token burn, and the risk of developers learning less if they stop reading and understanding generated code.
- There’s interest in extended-context models and AST-based “large code models,” though the thread doesn’t establish how mature either approach is.