I read all of Cloudflare's Claude-generated commits

AI Progress: Inevitable Curve or Local Maximum?

  • Some argue continued LLM improvement is effectively inevitable: more compute, better optimization techniques, and a backlog of not-yet-deployed research keep pushing capabilities forward.
  • Others say progress is now mostly incremental: better benchmarks and demos, but fundamental issues (reasoning, hallucinations) aren’t clearly improving.
  • There’s a split between people who want “better tools” and those expecting full SWE replacement; the latter camp sees little foundational progress in recent years.

Coding Agents in Real Use

  • Several commenters report substantial real-world use: large services or toy engines built “almost entirely” by AI, with humans designing APIs, nudging architecture, and fixing edge cases.
  • Others find agents brittle for maintenance tasks (e.g., framework upgrades) and say repeated handholding erodes the promised productivity.
  • Many agree AI is best at mid-level, boilerplate, or “rote” code; humans still make key design decisions and must review output line-by-line.

Prompts as Source Code / LLM-as-Compiler

  • The article’s idea of treating prompts as the canonical source is heavily criticized:
    • Natural language is ambiguous and under-specified compared to programming languages.
    • Hosted models are non-deterministic and change over time, breaking reproducibility.
    • You lose navigability (jump-to-definition, usages) if only prompts are versioned.
  • More moderate proposals:
    • Commit both code and prompts; treat prompts as documentation or “literate” context.
    • Use prompts + comprehensive tests so future, better models can regenerate parts of a system.
    • Store prompts in commit/PR descriptions rather than pretending they are the sole source.

Correctness, Hallucinations, and Verification

  • A long subthread debates what “hallucination” means: fabricated APIs vs. any semantically wrong-but-compiling code.
  • Agent loops with compilers/linters catch some issues (e.g., nonexistent methods) but not incorrect behavior.
  • Tests, linters, formal methods, etc. are seen as necessary but insufficient—same as with human-written code.
  • Some argue LLM bugs are qualitatively different from human bugs; others insist they’re just “more bugs” and should be judged by the same bar.
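The gap between what a compiler/linter stage sees and what tests see can be illustrated with a toy example. Here `ast.parse` stands in for the static stage, and the three `middle` snippets (find the median of a list) are invented:

```python
import ast


def lints(source: str) -> bool:
    """Stand-in for the compiler/linter stage: does the code even parse?"""
    try:
        ast.parse(source)
        return True
    except SyntaxError:
        return False


def behaves(source: str) -> bool:
    """The test stage: only this catches code that compiles but does the wrong thing."""
    ns: dict = {}
    try:
        exec(source, ns)
        return ns["middle"]([3, 1, 2]) == 2
    except Exception:
        return False


# Calls a nonexistent list method: caught once the code actually runs.
hallucinated = "def middle(xs):\n    return xs.median()\n"
# Parses and lints cleanly, but forgets to sort: wrong behavior.
wrong = "def middle(xs):\n    return xs[len(xs) // 2]\n"
correct = "def middle(xs):\n    return sorted(xs)[len(xs) // 2]\n"
```

`lints(wrong)` is true while `behaves(wrong)` is false: the static stage is structurally blind to incorrect behavior, which is why the thread treats agent loops over compilers/linters as necessary but not sufficient.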

Managing Generated Code

  • Experience from non-AI generators: mixing generated and hand-written code in one repo is painful; you need clear separation and mechanisms to inject manual logic.
  • There is strong consensus that not storing generated code at all is risky with today’s nondeterministic models; prompts alone are not a reliable build input.
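The separation-plus-injection pattern from non-AI generators (e.g., ORM or protobuf scaffolding) is usually a generated base that is never hand-edited plus a hand-written layer that overrides it. A minimal sketch, with illustrative class names:

```python
# --- generated/user_base.py: regenerated wholesale; never hand-edited ---
class UserBase:
    def __init__(self, name: str, email: str):
        self.name = name
        self.email = email

    def display_name(self) -> str:
        # Generated default; manual logic overrides this in the hand-written layer.
        return self.name


# --- handwritten/user.py: manual logic lives only here, so it survives regeneration ---
class User(UserBase):
    def display_name(self) -> str:
        return f"{self.name} <{self.email}>"
```

Regenerating `UserBase` can never clobber the manual `display_name`, which is exactly the kind of injection mechanism the thread says mixed repos lack.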

IP, Plagiarism, and Legal Risk

  • Concern that AI-written corporate code could unknowingly plagiarize GPL or other licensed code; people note the Cloudflare review mentions RFCs but not license checks.
  • Some shrug (“no one cares; vendors indemnify us”); others predict a future landmark lawsuit that will force clearer rules.
  • Practitioners say most AI output is generic “mid” code heavily shaped by their prompts, making exact-copy plagiarism unlikely in typical use.

Careers, Learning, and the Nature of Work

  • One camp worries AI will massively boost senior productivity and reduce demand for juniors, undermining the training pipeline.
  • Another expects the opposite: more features → more revenue → more hiring; juniors can also learn from AI, much as they once learned from mediocre human mentors.
  • Several say heavy AI use changes the job into supervising and debugging a stochastic tool; some find this exciting, others describe it as “miserable” and alienating.

Meta: AI Rhetoric and Article Style

  • Some readers see “AI smell” in the blog’s phrasing—grandiose claims about “new creative dynamics” and anthropomorphizing tools as “improving themselves.”
  • The author later confirms using an LLM to polish human notes, which reinforces both the stylistic suspicion and the idea that AI is already shaping technical discourse itself.