I read all of Cloudflare's Claude-generated commits

AI Progress: Inevitable Curve or Local Maximum?

  • Some argue continued LLM improvement is effectively inevitable: more compute, better optimization techniques, and a backlog of not-yet-deployed research keep pushing capabilities forward.
  • Others say progress is now mostly incremental: better benchmarks and demos, but fundamental issues (reasoning, hallucinations) aren’t clearly improving.
  • There’s a split between people who want “better tools” and those expecting full SWE replacement; the latter camp sees little foundational progress in recent years.

Coding Agents in Real Use

  • Several commenters report substantial real-world use: large services or toy engines built “almost entirely” by AI, with humans designing APIs, nudging architecture, and fixing edge cases.
  • Others find agents brittle for maintenance tasks (e.g., framework upgrades) and say repeated handholding erodes the promised productivity.
  • Many agree AI is best at mid-level, boilerplate, or “rote” code; humans still make key design decisions and must review output line-by-line.

Prompts as Source Code / LLM-as-Compiler

  • The article’s idea of treating prompts as the canonical source is heavily criticized:
    • Natural language is ambiguous and under-specified compared to programming languages.
    • Hosted models are non-deterministic and change over time, breaking reproducibility.
    • You lose navigability (jump-to-definition, usages) if only prompts are versioned.
  • More moderate proposals:
    • Commit both code and prompts; treat prompts as documentation or “literate” context.
    • Use prompts + comprehensive tests so future, better models can regenerate parts of a system.
    • Store prompts in commit/PR descriptions rather than pretending they are the sole source.

Correctness, Hallucinations, and Verification

  • A long subthread debates what “hallucination” means: fabricated APIs vs. any semantically wrong-but-compiling code.
  • Agent loops with compilers/linters catch some issues (e.g., nonexistent methods) but not incorrect behavior.
  • Tests, linters, formal methods, etc. are seen as necessary but insufficient—same as with human-written code.
  • Some argue LLM bugs are qualitatively different from human bugs; others insist they’re just “more bugs” and should be judged by the same bar.
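The gap between what a compiler/linter stage sees and what tests see can be illustrated with a toy example. Here `ast.parse` stands in for the static stage, and the three `middle` snippets (find the median of a list) are invented:

```python
import ast


def lints(source: str) -> bool:
    """Stand-in for the compiler/linter stage: does the code even parse?"""
    try:
        ast.parse(source)
        return True
    except SyntaxError:
        return False


def behaves(source: str) -> bool:
    """The test stage: only this catches code that compiles but does the wrong thing."""
    ns: dict = {}
    try:
        exec(source, ns)
        return ns["middle"]([3, 1, 2]) == 2
    except Exception:
        return False


# Calls a nonexistent list method: caught once the code actually runs.
hallucinated = "def middle(xs):\n    return xs.median()\n"
# Parses and lints cleanly, but forgets to sort: wrong behavior.
wrong = "def middle(xs):\n    return xs[len(xs) // 2]\n"
correct = "def middle(xs):\n    return sorted(xs)[len(xs) // 2]\n"
```

`lints(wrong)` is true while `behaves(wrong)` is false: the static stage is structurally blind to incorrect behavior, which is why the thread treats agent loops over compilers/linters as necessary but not sufficient.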

Managing Generated Code

  • Experience from non-AI generators: mixing generated and hand-written code in one repo is painful; you need clear separation and mechanisms to inject manual logic.
  • There is strong consensus that not storing generated code at all is risky with today’s nondeterministic models; prompts alone are not a reliable build input.
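The separation-plus-injection pattern from non-AI generators (e.g., ORM or protobuf scaffolding) is usually a generated base that is never hand-edited plus a hand-written layer that overrides it. A minimal sketch, with illustrative class names:

```python
# --- generated/user_base.py: regenerated wholesale; never hand-edited ---
class UserBase:
    def __init__(self, name: str, email: str):
        self.name = name
        self.email = email

    def display_name(self) -> str:
        # Generated default; manual logic overrides this in the hand-written layer.
        return self.name


# --- handwritten/user.py: manual logic lives only here, so it survives regeneration ---
class User(UserBase):
    def display_name(self) -> str:
        return f"{self.name} <{self.email}>"
```

Regenerating `UserBase` can never clobber the manual `display_name`, which is exactly the kind of injection mechanism the thread says mixed repos lack.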

IP, Plagiarism, and Legal Risk

  • Concern that AI-written corporate code could unknowingly plagiarize GPL or other licensed code; people note the Cloudflare review mentions RFCs but not license checks.
  • Some shrug (“no one cares; vendors indemnify us”); others predict a future landmark lawsuit that will force clearer rules.
  • Practitioners say most AI output is generic “mid” code heavily shaped by their prompts, making exact-copy plagiarism unlikely in typical use.

Careers, Learning, and the Nature of Work

  • One camp worries AI will massively boost senior productivity and reduce demand for juniors, undermining the training pipeline.
  • Another expects the opposite: more features → more revenue → more hiring; juniors can also learn from AI, much as they once learned from mediocre human mentors.
  • Several say heavy AI use changes the job into supervising and debugging a stochastic tool; some find this exciting, others describe it as “miserable” and alienating.

Meta: AI Rhetoric and Article Style

  • Some readers see “AI smell” in the blog’s phrasing—grandiose claims about “new creative dynamics” and anthropomorphizing tools as “improving themselves.”
  • The author later confirms using an LLM to polish human notes, which reinforces both the stylistic suspicion and the idea that AI is already shaping technical discourse itself.