Can LLMs write better code if you keep asking them to “write better code”?
Variation in Code Quality Across Languages & Domains
- Experiences vary widely by language: good results reported for Arduino, Python, web frontend; poor for Ruby, Rust, Android/Kotlin, and some OpenSCAD tasks.
- Models often produce “beginner”/tutorial-style code, pick outdated or inappropriate libraries, and use deprecated APIs unless guided.
- Some see this as a sensible default for novice users; others say it makes LLM-written code unusable without strong prior expertise.
How People Actually Use LLMs
- Productive uses: autocomplete (e.g., Copilot), boilerplate, small utilities, unit tests, refactors, and rubber-ducking/brainstorming.
- Several treat LLMs as “brilliant but unreliable interns” or as professors holding office hours: great for ideas, not for paste-in code.
- Others rely heavily on them for unfamiliar stacks to build working prototypes much faster, accepting extra review and fixes.
Iterative Improvement & “Write Better Code”
- Many confirm that iterative refinement (“improve this”, “optimize this”, add tests, run, repeat) yields substantially better code.
- However, simply asking “write better code” can:
  - Help converge toward more efficient or structured solutions, or
  - Degrade working code, especially when no tests are enforced.
- Human reviewers often find simpler, more impactful optimizations than the model, highlighting the need for human judgment.
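The refinement loop described above can be sketched in a few lines. This is a minimal illustration, not anyone's actual tooling: `ask_llm` is a hypothetical stand-in for a chat-completion call (stubbed here so the control flow runs without a model), and the key safeguard is that a revision is only accepted if it still passes the test suite.

```python
from typing import Callable


def ask_llm(prompt: str, code: str) -> str:
    # Hypothetical model call; stubbed to echo the code back unchanged
    # so the loop below can run without a real API.
    return code


def refine(code: str, passes_tests: Callable[[str], bool], rounds: int = 4) -> str:
    """Repeatedly ask for 'better code', but keep a revision only if it
    still passes the tests -- otherwise retain the previous version."""
    best = code
    for _ in range(rounds):
        candidate = ask_llm("Improve this code. Keep behaviour identical.", best)
        if passes_tests(candidate):
            best = candidate  # accept only verified improvements
    return best
```

The gating on `passes_tests` is the point: without it, the loop can happily "improve" working code into broken code, which is exactly the failure mode reported above.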
Execution, Testing, and Tooling
- Core limitation noted: base LLMs cannot natively run arbitrary code; they “fly blind” without an external sandbox.
- Multiple tools/agents (IDE integrations, Aider, Cursor, Devin, Gemini/Claude/ChatGPT code interpreters) run code, read compiler/test output, and loop automatically.
- Strong view that serious agents must operate inside the developer’s environment and under version control (e.g., via git).
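The run-read-loop pattern these tools share can be reduced to one primitive: execute a candidate snippet in a sandboxed subprocess and return its output so errors can be fed back into the next prompt. A minimal sketch (names are illustrative, not any particular tool's API):

```python
import subprocess
import sys


def run_snippet(code: str, timeout: int = 10) -> tuple[bool, str]:
    """Execute a candidate snippet in a fresh interpreter process and
    return (success, combined output) for the agent to read."""
    proc = subprocess.run(
        [sys.executable, "-c", code],
        capture_output=True,
        text=True,
        timeout=timeout,
    )
    # Success is a zero exit code; stderr carries tracebacks the
    # agent can paste back into its next request.
    return proc.returncode == 0, proc.stdout + proc.stderr
```

This is what separates an agent from a base model "flying blind": the traceback in the second tuple element becomes the next round's context.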
What “Better Code” Means
- Disagreement over metrics: speed vs readability vs simplicity vs maintainability.
- Some criticize optimizing toy Python tasks as misleading; they’d prefer idiomatic, clear code unless profiling shows a bottleneck.
- Others value LLMs for quickly finding performance tricks once the problem and benchmarks are well specified.
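The "profile before optimizing" position is easy to make concrete. The sketch below (toy functions chosen only for illustration) shows the minimum bar a performance claim should clear: a correctness check against the readable version, then an actual measurement with `timeit`.

```python
import timeit


def readable(n: int) -> int:
    # The idiomatic, obviously-correct version.
    return sum(i * i for i in range(n))


def clever(n: int) -> int:
    # Closed form for the sum of squares 0..n-1.
    return (n - 1) * n * (2 * n - 1) // 6


# Any proposed optimization must first agree with the readable baseline...
assert readable(1000) == clever(1000)

# ...and then justify itself with a measurement, not a hunch.
t_readable = timeit.timeit(lambda: readable(1000), number=1000)
t_clever = timeit.timeit(lambda: clever(1000), number=1000)
```

If the measured gap does not matter for the workload, the readable version wins by default, which is the core of the objection to benchmarking toy tasks.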
Capabilities, Limits, and Prompting
- Debate over whether LLMs “think” or merely pattern-match; some argue they learn real algorithms and world models, others insist they’re stochastic parrots.
- Prompting strategies that often help: ask for an architecture/plan first, specify libraries and versions, ask about likely pitfalls, and require tests and type annotations.
- Emotional or threatening prompts sometimes appear to improve effort, but many see this as unreliable “prompt voodoo” rather than principled control.
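The prompting strategies listed above can be bundled into a small template builder. This is a hypothetical helper, not a standard API; the function name and structure are assumptions, but each line maps to one of the strategies the discussion recommends.

```python
def build_prompt(task: str, libraries: dict[str, str]) -> str:
    """Assemble a structured prompt: plan first, pinned library versions,
    explicit pitfalls, and a requirement for tests and type annotations."""
    pins = ", ".join(f"{name}=={version}" for name, version in libraries.items())
    return "\n".join([
        "Before writing any code, outline the architecture as a short plan.",
        f"Task: {task}",
        f"Use only these libraries, at these versions: {pins}.",
        "List likely pitfalls and explain how the code avoids them.",
        "Include type annotations and a runnable test suite.",
    ])
```

Pinning versions in the prompt directly targets the failure mode noted earlier, where models reach for outdated libraries and deprecated APIs unless guided.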