LLMs understand nullability

What “understanding” means for LLMs

  • A large part of the thread disputes whether LLMs can be said to “understand” anything at all.
  • One camp: LLMs are just next-token predictors, like thermostats or photoreceptors; there is no mechanism for understanding or consciousness, so applying that word is misleading or wrong.
  • Opposing camp: if a system consistently gives correct, context-sensitive answers, that’s functionally what we call “understanding” in humans; judging internal state is impossible for both brains and models, so insisting on a metaphysical distinction is empty semantics.
  • Several comments note we lack precise, agreed scientific definitions of “understanding,” “intelligence,” and “consciousness,” making these discussions circular.

Brain vs LLM analogies

  • Some argue the brain may itself be a kind of very large stochastic model; others respond that this analogy is too shallow, ignoring biology, embodiment, and non-linguistic cognition.
  • Disagreement over whether future “true thinking” systems will look like scaled-up LLMs or require a fundamentally different architecture.
  • Concern voiced that anthropomorphizing models (comparing them to humans) is dangerous, especially when used for high-stakes tasks like medical diagnosis.

Nullability, code, and the experiment itself

  • Many find the visualization and probing of a “nullability direction” in embedding space very cool: subtracting averaged hidden states reveals a linear axis that separates nullable from non-nullable values (see the sketch after this list).
  • There’s interest in composing this with other typing tools, especially focusing on interfaces/behaviors (duck typing) rather than concrete types.
  • Some note that static type checkers already handle nullability well, so the value here is more about understanding how models internally represent code concepts, not adding new capabilities.
  • One commenter links this work to similar findings of single “directions” for refusals/jailbreaking in safety research.
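A minimal sketch of the mean-difference probe the commenters are reacting to, assuming per-occurrence hidden states and nullable/non-nullable labels have already been collected; the array names, shapes, and toy data below are illustrative stand-ins, not taken from the original post.

```python
# Difference-of-means probe: estimate a "nullability direction" and project onto it.
import numpy as np

def nullability_direction(hidden_states: np.ndarray, is_nullable: np.ndarray) -> np.ndarray:
    """Unit vector pointing from the mean non-nullable state to the mean nullable state."""
    nullable_mean = hidden_states[is_nullable].mean(axis=0)
    non_nullable_mean = hidden_states[~is_nullable].mean(axis=0)
    direction = nullable_mean - non_nullable_mean
    return direction / np.linalg.norm(direction)

def nullability_score(state: np.ndarray, direction: np.ndarray) -> float:
    """Projection of a single hidden state onto the nullability axis; higher = more 'nullable'."""
    return float(state @ direction)

# Toy usage with random data standing in for real model activations.
rng = np.random.default_rng(0)
states = rng.normal(size=(200, 768))   # 200 variable occurrences, hidden size 768
labels = rng.random(200) < 0.5         # fake nullable / non-nullable labels
axis = nullability_direction(states, labels)
print(nullability_score(states[0], axis))
```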

Reliability, evaluation, and limits

  • Several people push for more rigorous reporting: reporting success probabilities over multiple runs rather than anecdotal “eventually it learns X,” given the variance in LLM output (see the sketch after this list).
  • Others emphasize that LLMs can reproduce correct patterns for concepts like nullability because they have seen vast amounts of code and prose, not because they have executed programs.
  • Critics argue models often fail at “simple but novel” code manipulations where a human programmer would generalize from semantics rather than surface patterns, suggesting a shallow form of competence.
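One hedged way to act on the multi-run reporting suggestion: sample the model repeatedly on the same prompt and report an estimated success rate with a confidence interval rather than a single anecdote. The run_model function below is a hypothetical placeholder for whatever inference call and correctness check are actually used.

```python
# Multi-run evaluation: estimate a success rate with a 95% Wilson score interval.
import math
import random

def wilson_interval(successes: int, n: int, z: float = 1.96) -> tuple[float, float]:
    """95% Wilson score interval for a binomial proportion."""
    if n == 0:
        return (0.0, 1.0)
    p = successes / n
    denom = 1 + z**2 / n
    centre = (p + z**2 / (2 * n)) / denom
    half = z * math.sqrt(p * (1 - p) / n + z**2 / (4 * n**2)) / denom
    return (centre - half, centre + half)

def run_model(prompt: str) -> bool:
    """Hypothetical: run one inference and return True if the answer passes the check."""
    return random.random() < 0.8  # placeholder for a real model call + grader

n_runs = 50
prompt = "Is this variable nullable?"
successes = sum(run_model(prompt) for _ in range(n_runs))
low, high = wilson_interval(successes, n_runs)
print(f"success rate: {successes / n_runs:.2f}  (95% CI {low:.2f}-{high:.2f})")
```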

Broader capability and hype

  • Some see LLMs as a remarkable, surprising capability jump that already warrants the word “AI”; others view them as sophisticated autocomplete with overblown claims of understanding.
  • There is shared fatigue over repeatedly re-litigating the same philosophical issues, with some proposing to avoid the verb “understand” entirely and instead talk in terms of “accuracy on tasks” and “capabilities over distributions of inputs.”