A messy experiment that changed how I think about AI code analysis
Perceived Contribution of the Technique
- Many find the core idea useful: pre-structure the codebase, add higher-level context, and then have the LLM reason about code like a more experienced reviewer.
- Several note this mirrors how good human reviewers triage: understand architecture and impact first, then inspect details.
- Some see it as an example of “domain-specific chain-of-thought” prompting applied to code analysis.
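The "pre-structure first" idea can be sketched as building a high-level map of modules and their top-level definitions before any detailed review prompt is sent. The snippet below is a minimal illustration, not the article's (omitted) implementation; sources are given as an in-memory `{path: source}` dict, where a real tool would walk the repository.

```python
# Sketch: summarize each module's top-level symbols so the model sees
# architecture before details. Hypothetical example sources below.
import ast

def codebase_map(sources: dict[str, str]) -> str:
    """Return a one-line-per-module summary of top-level defs."""
    lines = []
    for path in sorted(sources):
        tree = ast.parse(sources[path])
        names = [n.name for n in tree.body
                 if isinstance(n, (ast.FunctionDef, ast.AsyncFunctionDef,
                                   ast.ClassDef))]
        lines.append(f"{path}: {', '.join(names) or '(no top-level defs)'}")
    return "\n".join(lines)

# Hypothetical two-file "codebase" for illustration only.
sources = {
    "billing/invoice.py": "class Invoice:\n    pass\n\ndef total(items):\n    return sum(items)\n",
    "billing/tax.py": "def vat(amount, rate=0.2):\n    return amount * rate\n",
}
print(codebase_map(sources))
```

A summary like this, prepended to the review prompt, is one cheap way to give the model the "architecture first" view that good human reviewers start from.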
Prompting, Planning, and Agentic Workflows
- Multiple commenters already ask models to “plan first, code later” and explicitly forbid code generation until an architecture or approach is agreed.
- Existing tools (AI IDEs, coding agents, code search systems) already implement variants of:
  - Architecture discussion / project mapping
  - Context gathering via code search and call graphs
  - Multi-file editing, build/test/deploy loops
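The "plan first, code later" discipline commenters describe can be enforced mechanically by gating the code-generation prompt behind an approval step. This is a minimal sketch with a stubbed model call; `ask_model` is hypothetical and would be replaced by a real LLM client.

```python
# Sketch of gated two-phase prompting: no code is requested until the
# plan is explicitly approved. `ask_model` is a stand-in for a real API.

PLAN_PROMPT = (
    "Describe, step by step, how you would implement the change below. "
    "Do NOT write any code yet.\n\nTask: {task}"
)
CODE_PROMPT = (
    "The plan below has been approved. Now write the code.\n\n"
    "Plan:\n{plan}\n\nTask: {task}"
)

def ask_model(prompt: str) -> str:
    # Stub: a real implementation would call an LLM API here.
    return f"[model response to: {prompt[:40]}...]"

def plan_then_code(task: str, approve) -> str:
    """Generate a plan, require approval, and only then request code."""
    plan = ask_model(PLAN_PROMPT.format(task=task))
    if not approve(plan):
        raise RuntimeError("Plan rejected; no code generated.")
    return ask_model(CODE_PROMPT.format(plan=plan, task=task))
```

The `approve` callback can be a human in the loop or an automated check; either way the model never sees a "write code" instruction until the architecture discussion has happened.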
Context, Structure, and Transformer Behavior
- Strong agreement that more and better-curated context dramatically improves LLM outputs.
- Some push back on “context first” language, arguing transformers attend to the whole window at once; others respond that the ordering and scaffolding of a prompt still measurably shape outputs in practice.
Skepticism: Missing Details, Evaluation, and Hype
- Key functions in the article (file grouping, context extraction) are omitted, leading to accusations of “jazz hands” and hidden “secret sauce.”
- Repeated calls for:
  - Actual source code, not just narratives
  - Benchmarks on diverse, realistic codebases and PRs
  - Metrics for correctness and significance, not just impressive anecdotes
- Several see the tone as marketing-adjacent, typical of AI-hype content.
Junior vs Senior Analogy and Anthropomorphism
- Heated debate over the claim that juniors read code linearly; some say this matches their early experience, others call it unrealistic and condescending.
- The text was edited mid-thread, prompting questions about narrative reliability.
- Many dislike anthropomorphizing AI as a “senior developer,” seeing it as misleading framing.
Real-World Use of Coding Assistants
- Some report substantial productivity gains using tools like AI IDEs and agents for:
  - Boilerplate, stories, tests, localization, and documentation
  - Large-scale but low-conceptual work across many files
- Others emphasize that such output still needs human review and often contains duplication or suboptimal patterns.
Limits: Hallucinations, APIs, and Verification
- Concern that the showcased example may involve hallucinated details (e.g., fabricated PR references).
- Common frustrations:
- Invented APIs and mixed framework versions
- Plausible-sounding but wrong suggestions
- Suggestions include feeding concrete API docs, using RAG, and explicitly validating how often outputs are both correct and important.
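The "feed concrete API docs" suggestion amounts to retrieving the documentation snippets most relevant to a question and prepending them to the prompt, so the model works from real signatures instead of inventing them. Below is a toy illustration: keyword overlap stands in for a real embedding-based retriever, and all doc snippets are hypothetical.

```python
# Toy retrieval-augmented prompt assembly. A production system would use
# embeddings and real documentation; these snippets are made up.

def retrieve(question: str, docs: list[str], k: int = 2) -> list[str]:
    """Rank docs by naive word overlap with the question; return top k."""
    q_words = set(question.lower().split())
    scored = sorted(docs,
                    key=lambda d: len(q_words & set(d.lower().split())),
                    reverse=True)
    return scored[:k]

def grounded_prompt(question: str, docs: list[str]) -> str:
    """Prepend retrieved doc snippets and forbid undocumented APIs."""
    context = "\n".join(retrieve(question, docs))
    return f"Use ONLY these documented APIs:\n{context}\n\nQuestion: {question}"

# Hypothetical API documentation fragments.
docs = [
    "fetch_user(id: int) -> User: returns the user record or raises KeyError",
    "send_email(to: str, body: str) -> None: queues an outbound email",
    "rotate_logs(path: str) -> int: rotates log files, returns count",
]
print(grounded_prompt("How do I send an email to a user?", docs))
```

Constraining the model to an explicit, retrieved API surface is one direct countermeasure to the invented-API and mixed-version frustrations listed above.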
Broader Reflections
- Disagreement over whether tech debt is mainly a coding vs management problem.
- Meta-discussion notes strong emotional reactions: some developers feel threatened or defensive; others accuse critics of Luddism.
- Several see this work as one early step toward “engineering practical thinking patterns” for LLM-based tools.