A messy experiment that changed how I think about AI code analysis
Perceived Contribution of the Technique
- Many find the core idea useful: pre-structure the codebase, add higher-level context, and then have the LLM reason about code like a more experienced reviewer.
- Several note this mirrors how good human reviewers triage: understand architecture and impact first, then inspect details.
- Some see it as an example of “domain-specific chain-of-thought” prompting applied to code analysis.
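The "pre-structure first" idea can be sketched as building a high-level map of modules and their top-level definitions before any detailed review prompt is sent. The snippet below is a minimal illustration, not the article's (omitted) implementation; sources are given as an in-memory `{path: source}` dict, where a real tool would walk the repository.

```python
# Sketch: summarize each module's top-level symbols so the model sees
# architecture before details. Hypothetical example sources below.
import ast

def codebase_map(sources: dict[str, str]) -> str:
    """Return a one-line-per-module summary of top-level defs."""
    lines = []
    for path in sorted(sources):
        tree = ast.parse(sources[path])
        names = [n.name for n in tree.body
                 if isinstance(n, (ast.FunctionDef, ast.AsyncFunctionDef,
                                   ast.ClassDef))]
        lines.append(f"{path}: {', '.join(names) or '(no top-level defs)'}")
    return "\n".join(lines)

# Hypothetical two-file "codebase" for illustration only.
sources = {
    "billing/invoice.py": "class Invoice:\n    pass\n\ndef total(items):\n    return sum(items)\n",
    "billing/tax.py": "def vat(amount, rate=0.2):\n    return amount * rate\n",
}
print(codebase_map(sources))
```

A summary like this, prepended to the review prompt, is one cheap way to give the model the "architecture first" view that good human reviewers start from.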
Prompting, Planning, and Agentic Workflows
- Multiple commenters already ask models to “plan first, code later” and explicitly forbid code generation until an architecture or approach is agreed.
- Existing tools (AI IDEs, coding agents, code search systems) already implement variants of:
  - Architecture discussion / project mapping
  - Context gathering via code search and call graphs
  - Multi-file editing, build/test/deploy loops
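The "plan first, code later" discipline commenters describe can be enforced mechanically by gating the code-generation prompt behind an approval step. This is a minimal sketch with a stubbed model call; `ask_model` is hypothetical and would be replaced by a real LLM client.

```python
# Sketch of gated two-phase prompting: no code is requested until the
# plan is explicitly approved. `ask_model` is a stand-in for a real API.

PLAN_PROMPT = (
    "Describe, step by step, how you would implement the change below. "
    "Do NOT write any code yet.\n\nTask: {task}"
)
CODE_PROMPT = (
    "The plan below has been approved. Now write the code.\n\n"
    "Plan:\n{plan}\n\nTask: {task}"
)

def ask_model(prompt: str) -> str:
    # Stub: a real implementation would call an LLM API here.
    return f"[model response to: {prompt[:40]}...]"

def plan_then_code(task: str, approve) -> str:
    """Generate a plan, require approval, and only then request code."""
    plan = ask_model(PLAN_PROMPT.format(task=task))
    if not approve(plan):
        raise RuntimeError("Plan rejected; no code generated.")
    return ask_model(CODE_PROMPT.format(plan=plan, task=task))
```

The `approve` callback can be a human in the loop or an automated check; either way the model never sees a "write code" instruction until the architecture discussion has happened.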
Context, Structure, and Transformer Behavior
- Strong agreement that more and better-curated context dramatically improves LLM outputs.
- Some push back on “context first” language, arguing transformers attend to the whole window at once; others respond that the ordering and scaffolding of a prompt still measurably shape outputs in practice.
Skepticism: Missing Details, Evaluation, and Hype
- Key functions in the article (file grouping, context extraction) are omitted, leading to accusations of “jazz hands” and hidden “secret sauce.”
- Repeated calls for:
  - Actual source code, not just narratives
  - Benchmarks on diverse, realistic codebases and PRs
  - Metrics for correctness and significance, not just impressive anecdotes
- Several see the tone as marketing-adjacent, typical of AI-hype content.
Junior vs Senior Analogy and Anthropomorphism
- Heated debate over the claim that juniors read code linearly; some say this matches their early experience, others call it unrealistic and condescending.
- The text was edited mid-thread, prompting questions about narrative reliability.
- Many dislike anthropomorphizing AI as a “senior developer,” seeing it as misleading framing.
Real-World Use of Coding Assistants
- Some report substantial productivity gains using tools like AI IDEs and agents for:
  - Boilerplate, stories, tests, localization, and documentation
  - Large-scale but low-conceptual work across many files
- Others emphasize that such output still needs human review and often contains duplication or suboptimal patterns.
Limits: Hallucinations, APIs, and Verification
- Concern that the showcased example may involve hallucinated details (e.g., fabricated PR references).
- Common frustrations:
- Invented APIs and mixed framework versions
- Plausible-sounding but wrong suggestions
- Suggestions include feeding concrete API docs, using RAG, and explicitly validating how often outputs are both correct and important.
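The "feed concrete API docs" suggestion amounts to retrieving the documentation snippets most relevant to a question and prepending them to the prompt, so the model works from real signatures instead of inventing them. Below is a toy illustration: keyword overlap stands in for a real embedding-based retriever, and all doc snippets are hypothetical.

```python
# Toy retrieval-augmented prompt assembly. A production system would use
# embeddings and real documentation; these snippets are made up.

def retrieve(question: str, docs: list[str], k: int = 2) -> list[str]:
    """Rank docs by naive word overlap with the question; return top k."""
    q_words = set(question.lower().split())
    scored = sorted(docs,
                    key=lambda d: len(q_words & set(d.lower().split())),
                    reverse=True)
    return scored[:k]

def grounded_prompt(question: str, docs: list[str]) -> str:
    """Prepend retrieved doc snippets and forbid undocumented APIs."""
    context = "\n".join(retrieve(question, docs))
    return f"Use ONLY these documented APIs:\n{context}\n\nQuestion: {question}"

# Hypothetical API documentation fragments.
docs = [
    "fetch_user(id: int) -> User: returns the user record or raises KeyError",
    "send_email(to: str, body: str) -> None: queues an outbound email",
    "rotate_logs(path: str) -> int: rotates log files, returns count",
]
print(grounded_prompt("How do I send an email to a user?", docs))
```

Constraining the model to an explicit, retrieved API surface is one direct countermeasure to the invented-API and mixed-version frustrations listed above.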
Broader Reflections
- Disagreement over whether tech debt is mainly a coding vs management problem.
- Meta-discussion notes strong emotional reactions: some developers feel threatened or defensive; others accuse critics of Luddism.
- Several see this work as one early step toward “engineering practical thinking patterns” for LLM-based tools.