Naur's "Programming as Theory Building" and LLMs replacing human programmers
Framing of the Article and Naur Reference
- Several commenters find the title and the opening “I will commit a fallacy” framing distracting or weak; they expected a more rigorous or differently focused argument.
- Naur’s “Programming as Theory Building” is widely respected, but some feel the essay overstates its implications for LLMs or misuses the “theory” concept.
- One point of confusion: is the argument about what LLMs can do now or what they can ever do in principle?
Do LLMs Build Theories or Only Mimic Them?
- Core claim challenged: “LLMs only ingest outputs; theories come from doing the work.”
- Critics call this anthropocentric: humans build theories via experience, but that doesn’t prove other mechanisms can’t yield equivalent internal structures.
- Supporters respond that human theory-building is empirically validated (science, engineering), whereas LLMs often drift or fail after a few reasoning steps, especially on novel code and libraries.
- Some note that people can learn and wield theories they didn’t personally develop, so “doing the work” may not be strictly necessary.
Memory, Context, and Architectural Limits
- A frequent argument against deep program understanding: most codebases far exceed current context windows, so whatever internal “theory” the model holds must be reconstructed from the limited slice of code and conversation that fits into the prompt.
- Proposed workarounds: multiple contexts, separate reasoning windows, multi-model systems, or long-term external memory (a minimal sketch of the external-memory idea follows this list).
- Skeptics say current LLMs only have short-term context and no continual weight updates, so persistent theory-building is blocked in practice.
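To make the external-memory proposal concrete, here is a minimal, hypothetical sketch, not anything described in the thread: `ProjectMemory`, `call_llm`, and the character budget are all assumptions. Project notes live outside the model, are scored by naive keyword overlap with the current task, and only the best-fitting notes are packed into the prompt, so durable knowledge can persist across sessions even though each context window is small.

```python
"""Minimal sketch of long-term external memory for an LLM coding assistant.

Hypothetical illustration only: `call_llm` is a stub standing in for any
chat-completion API, and retrieval is naive keyword overlap rather than
embeddings. The point is the shape of the idea, not a production design.
"""

from dataclasses import dataclass, field


@dataclass
class ProjectMemory:
    """Stores durable project notes outside the model's context window."""
    notes: list[str] = field(default_factory=list)

    def remember(self, note: str) -> None:
        self.notes.append(note)

    def recall(self, task: str, budget_chars: int = 2000) -> list[str]:
        """Return the notes most relevant to `task`, within a rough size budget."""
        task_words = set(task.lower().split())
        scored = sorted(
            self.notes,
            key=lambda n: len(task_words & set(n.lower().split())),
            reverse=True,
        )
        picked, used = [], 0
        for note in scored:
            if used + len(note) > budget_chars:
                break
            picked.append(note)
            used += len(note)
        return picked


def call_llm(prompt: str) -> str:
    """Stub for a real model call; returns a canned reply for this sketch."""
    return f"[model reply to a {len(prompt)}-char prompt]"


if __name__ == "__main__":
    memory = ProjectMemory()
    memory.remember("Billing retries are capped at 3 because the gateway rate-limits us.")
    memory.remember("The legacy_sync module is frozen; new code goes in sync_v2.")

    task = "Add a retry policy to the billing client"
    context = "\n".join(memory.recall(task))
    print(call_llm(f"Project notes:\n{context}\n\nTask: {task}"))
```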
Chinese Room, Intentionality, and Understanding
- Lengthy debate about the Chinese Room thought experiment and whether it shows that symbolic manipulation (or LLM text generation) lacks genuine understanding.
- One side: LLMs and computers lack intentionality; their symbols “aren’t about anything,” so they can’t have a theory in Naur’s sense.
- Other side: the Chinese Room is outdated/weak; if a system reliably solves math, language, and code tasks, denying it “understanding” is just redefining the word.
- Some note that the argument risks applying equally to human brains (neurons individually don’t “understand” either).
LLMs as Programming Tools vs Programmer Replacements
- Broad agreement: today’s LLMs cannot fully replace human programmers, especially for architecture, adaptation, and debugging subtle edge cases.
- Many find them highly useful for code generation, refactoring, tests, documentation, and “English → code” translation under human supervision.
- Reports of productivity gains range from modest (~5%) with autocomplete-style tools to large (claimed 2–3×) with chat/agent workflows in IDEs; others see far less benefit.
- Recurring failure modes: hallucinated APIs, brittle reasoning across large systems, and an inability to explain or modify code in light of the deeper “why” constraints behind it.
Bootstrapping, Theory-Building, and Future Trajectories
- Some argue an agentic LLM that iteratively edits code, runs it, talks to stakeholders, and stores/recalls project knowledge could bootstrap a “theory” over time (see the sketch after this list).
- Others counter that without continuous learning at the model level, such systems are just elaborate prompt-engineering plus short-term recall.
- There is optimism that better memory, tool use, and training (e.g., on synthetic “reasoning traces”) will move LLMs from zero to “some” theory-building ability, even if far from human.
- A minority insists the article’s strong impossibility claims are unjustified given our incomplete understanding of both human minds and LLM internals.
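The bootstrapping idea above can be outlined as a loop. The code below is a hypothetical sketch, not anyone’s actual agent: `propose_patch`, `run_tests`, and the notes list are stand-ins. It only shows how an edit → run → record cycle could accumulate project knowledge in external memory without any model weight updates, which is exactly what the skeptics in this section argue falls short of theory-building.

```python
"""Hypothetical outline of an agentic edit/run/record loop.

Nothing here is a real agent framework: `propose_patch` and `run_tests` are
stubs. The sketch only shows how an agent could accumulate project knowledge
("theory", in the thread's terms) in external notes rather than in weights.
"""

import random


def propose_patch(task: str, notes: list[str]) -> str:
    """Stub: a real system would prompt an LLM with the task plus recalled notes."""
    return f"patch for '{task}' informed by {len(notes)} notes"


def run_tests(patch: str) -> bool:
    """Stub: a real system would apply the patch and run the test suite."""
    return random.random() > 0.5


def agent_loop(task: str, notes: list[str], max_attempts: int = 5) -> list[str]:
    """Iterate: propose a patch, run it, and record what was learned."""
    for attempt in range(1, max_attempts + 1):
        patch = propose_patch(task, notes)
        passed = run_tests(patch)
        # The recorded outcome becomes retrievable context for later tasks.
        notes.append(f"attempt {attempt}: {patch} -> {'passed' if passed else 'failed'}")
        if passed:
            break
    return notes


if __name__ == "__main__":
    knowledge = ["Tests must run against the in-memory DB, not staging."]
    for line in agent_loop("fix flaky checkout test", knowledge):
        print(line)
```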
Meta-Points and Philosophical Disagreements
- Several commenters criticize the essay for relying on vague, contested notions of “mind,” “theory,” and “reasoning” without concrete proof.
- Others defend the value of philosophical analysis and thought experiments (like Naur, Ryle, Searle), but recognize they don’t settle empirical questions about future AI.
- Overall, the thread splits between:
  - those who see LLMs as sophisticated text/grammar machines: powerful assistants that do not build theories; and
  - those who see early, imperfect forms of reasoning/theory-building that may scale, making sharp “LLMs can never do X” claims premature.