Naur's "Programming as Theory Building" and LLMs replacing human programmers
Framing of the Article and Naur Reference
- Several commenters find the title and the opening “I will commit a fallacy” framing distracting or weak; they expected a more rigorous or differently focused argument.
- Naur’s “Programming as Theory Building” is widely respected, but some feel the essay overstates its implications for LLMs or misuses the “theory” concept.
- One point of confusion: is the argument about what LLMs can do now or what they can ever do in principle?
Do LLMs Build Theories or Only Mimic Them?
- Core claim challenged: “LLMs only ingest outputs; theories come from doing the work.”
- Critics call this anthropocentric: humans build theories via experience, but that doesn’t prove other mechanisms can’t yield equivalent internal structures.
- Supporters respond that human theory-building is empirically validated (science, engineering), whereas LLMs often drift or fail after a few reasoning steps, especially on novel code and libraries.
- Some note that people can learn and wield theories they didn’t personally develop, so “doing the work” may not be strictly necessary.
Memory, Context, and Architectural Limits
- A frequent argument against deep program understanding: most codebases far exceed current context windows, so whatever internal “theory” the model holds must be reconstructed from the limited slice of code and conversation that fits into the prompt.
- Proposed workarounds: multiple contexts, separate reasoning windows, multi-model systems, or long-term external memory (a minimal sketch of the external-memory idea follows this list).
- Skeptics say current LLMs only have short-term context and no continual weight updates, so persistent theory-building is blocked in practice.
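To make the external-memory proposal concrete, here is a minimal, hypothetical sketch, not anything described in the thread: `ProjectMemory`, `call_llm`, and the character budget are all assumptions. Project notes live outside the model, are scored by naive keyword overlap with the current task, and only the best-fitting notes are packed into the prompt, so durable knowledge can persist across sessions even though each context window is small.

```python
"""Minimal sketch of long-term external memory for an LLM coding assistant.

Hypothetical illustration only: `call_llm` is a stub standing in for any
chat-completion API, and retrieval is naive keyword overlap rather than
embeddings. The point is the shape of the idea, not a production design.
"""

from dataclasses import dataclass, field


@dataclass
class ProjectMemory:
    """Stores durable project notes outside the model's context window."""
    notes: list[str] = field(default_factory=list)

    def remember(self, note: str) -> None:
        self.notes.append(note)

    def recall(self, task: str, budget_chars: int = 2000) -> list[str]:
        """Return the notes most relevant to `task`, within a rough size budget."""
        task_words = set(task.lower().split())
        scored = sorted(
            self.notes,
            key=lambda n: len(task_words & set(n.lower().split())),
            reverse=True,
        )
        picked, used = [], 0
        for note in scored:
            if used + len(note) > budget_chars:
                break
            picked.append(note)
            used += len(note)
        return picked


def call_llm(prompt: str) -> str:
    """Stub for a real model call; returns a canned reply for this sketch."""
    return f"[model reply to a {len(prompt)}-char prompt]"


if __name__ == "__main__":
    memory = ProjectMemory()
    memory.remember("Billing retries are capped at 3 because the gateway rate-limits us.")
    memory.remember("The legacy_sync module is frozen; new code goes in sync_v2.")

    task = "Add a retry policy to the billing client"
    context = "\n".join(memory.recall(task))
    print(call_llm(f"Project notes:\n{context}\n\nTask: {task}"))
```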
Chinese Room, Intentionality, and Understanding
- Lengthy debate about the Chinese Room thought experiment and whether it shows that symbolic manipulation (or LLM text generation) lacks genuine understanding.
- One side: LLMs and computers lack intentionality; their symbols “aren’t about anything,” so they can’t have a theory in Naur’s sense.
- Other side: the Chinese Room is outdated/weak; if a system reliably solves math, language, and code tasks, denying it “understanding” is just redefining the word.
- Some note that the argument risks applying equally to human brains (neurons individually don’t “understand” either).
LLMs as Programming Tools vs Programmer Replacements
- Broad agreement: today’s LLMs cannot fully replace human programmers, especially for architecture, adaptation, and debugging subtle edge cases.
- Many find them highly useful for code generation, refactoring, tests, documentation, and “English → code” translation under human supervision.
- Reports of productivity gains range from modest (~5%) with autocomplete-style tools to large (claimed 2–3×) with chat/agent workflows in IDEs; others see far less benefit.
- Recurring failure modes: hallucinated APIs, brittle reasoning across large systems, and an inability to explain or modify code in light of the deeper “why” constraints behind it.
Bootstrapping, Theory-Building, and Future Trajectories
- Some argue an agentic LLM that iteratively edits code, runs it, talks to stakeholders, and stores/recalls project knowledge could bootstrap a “theory” over time (see the sketch after this list).
- Others counter that without continuous learning at the model level, such systems are just elaborate prompt-engineering plus short-term recall.
- There is optimism that better memory, tool use, and training (e.g., on synthetic “reasoning traces”) will move LLMs from zero to “some” theory-building ability, even if far from human.
- A minority insists the article’s strong impossibility claims are unjustified given our incomplete understanding of both human minds and LLM internals.
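The bootstrapping idea above can be outlined as a loop. The code below is a hypothetical sketch, not anyone’s actual agent: `propose_patch`, `run_tests`, and the notes list are stand-ins. It only shows how an edit → run → record cycle could accumulate project knowledge in external memory without any model weight updates, which is exactly what the skeptics in this section argue falls short of theory-building.

```python
"""Hypothetical outline of an agentic edit/run/record loop.

Nothing here is a real agent framework: `propose_patch` and `run_tests` are
stubs. The sketch only shows how an agent could accumulate project knowledge
("theory", in the thread's terms) in external notes rather than in weights.
"""

import random


def propose_patch(task: str, notes: list[str]) -> str:
    """Stub: a real system would prompt an LLM with the task plus recalled notes."""
    return f"patch for '{task}' informed by {len(notes)} notes"


def run_tests(patch: str) -> bool:
    """Stub: a real system would apply the patch and run the test suite."""
    return random.random() > 0.5


def agent_loop(task: str, notes: list[str], max_attempts: int = 5) -> list[str]:
    """Iterate: propose a patch, run it, and record what was learned."""
    for attempt in range(1, max_attempts + 1):
        patch = propose_patch(task, notes)
        passed = run_tests(patch)
        # The recorded outcome becomes retrievable context for later tasks.
        notes.append(f"attempt {attempt}: {patch} -> {'passed' if passed else 'failed'}")
        if passed:
            break
    return notes


if __name__ == "__main__":
    knowledge = ["Tests must run against the in-memory DB, not staging."]
    for line in agent_loop("fix flaky checkout test", knowledge):
        print(line)
```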
Meta-Points and Philosophical Disagreements
- Several commenters criticize the essay for relying on vague, contested notions of “mind,” “theory,” and “reasoning” without concrete proof.
- Others defend the value of philosophical analysis and thought experiments (like Naur, Ryle, Searle), but recognize they don’t settle empirical questions about future AI.
- Overall, the thread splits between:
  - those who see LLMs as sophisticated text/grammar machines: powerful assistants that do not build theories; and
  - those who see early, imperfect forms of reasoning/theory-building that may scale, making sharp “LLMs can never do X” claims premature.