The deep learning boom caught almost everyone by surprise

GPU history and “invention” debate

  • Strong disagreement over the claim that one company “invented the GPU.”
  • Some argue earlier hardware (3dfx, SGI, Evans & Sutherland) and pre‑CUDA accelerators already counted.
  • Others say early chips weren’t truly general-purpose or useful for deep learning, and that the term “GPU” was itself a later marketing coinage.
  • Consensus: the history is gradual and messy; calling any single launch “the invention” is seen as oversimplified.

Three pillars of the deep learning boom

  • Many commenters endorse the triplet:
    • Scalable neural networks and backpropagation.
    • Massively parallel GPU compute, programmable via CUDA.
    • Large labeled datasets like ImageNet, enabled by crowd labeling.
  • Some add that better activations (ReLU and its successors), initialization schemes, normalization, residual connections, and improved optimizers were also critical (see the sketch after this list).
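
A minimal sketch of two of those ingredients, a ReLU activation inside a residual block, using PyTorch purely for illustration (the class name, sizes, and layer choices are arbitrary assumptions, not anything from the thread):

  import torch
  import torch.nn as nn

  class ResidualBlock(nn.Module):
      """Toy residual block: output = x + f(x)."""
      def __init__(self, dim: int):
          super().__init__()
          self.linear1 = nn.Linear(dim, dim)
          self.linear2 = nn.Linear(dim, dim)
          self.relu = nn.ReLU()  # max(0, x); cheap, and avoids saturating gradients

      def forward(self, x: torch.Tensor) -> torch.Tensor:
          # The skip connection (x + ...) gives gradients a direct path
          # around the block, which is what made very deep nets trainable.
          return x + self.linear2(self.relu(self.linear1(x)))

  block = ResidualBlock(16)
  out = block(torch.randn(4, 16))  # out.shape == (4, 16)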

Data, scaling, and changing ML culture

  • Several note that earlier eras had tiny datasets and far less compute; “more data” wasn’t always practically feasible.
  • Pre‑ImageNet culture emphasized clever models and priors over sheer scale, especially in academia.
  • Others push back on a quote that “people did not believe in data,” arguing practitioners always valued more data but were resource‑constrained.

Deep learning vs other ML methods

  • Some wonder what would have happened if similar resources had gone into SVMs, random forests, etc.
  • Replies: classical methods struggle to scale, are less flexible to compose, and were already heavily explored from the ’90s to the mid‑2010s (see the back‑of‑envelope sketch after this list).
  • A recurring theme: LLMs are often used for trivial tasks where simpler models would be cheaper and adequate.
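
A back-of-envelope illustration of the scaling point (an assumption about exact kernel methods, not a claim from the thread): a kernel SVM materializes an n × n Gram matrix, so its memory grows quadratically with dataset size, while minibatch SGD for a neural network only ever touches a fixed-size batch.

  # Hypothetical numbers, for illustration only.
  n = 1_000_000                      # samples, roughly ImageNet scale
  gram_bytes = n * n * 4             # dense float32 Gram matrix
  print(f"Gram matrix: {gram_bytes / 1e12:.1f} TB")     # -> 4.0 TB

  batch, dim = 256, 4096             # minibatch footprint stays fixed
  print(f"Minibatch: {batch * dim * 4 / 1e6:.1f} MB")   # -> 4.2 MB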

AGI, intelligence, and common sense

  • Split between those for whom current progress makes AGI within years feel “inevitable” and those who think present systems fall far short.
  • Skeptics stress missing “common sense,” poor sample efficiency, and weak causal reasoning, and draw contrasts with animals’ rapid generalization.
  • Debates over whether evolution counts as “training data,” how much biology’s architecture matters, and whether scaling deep learning alone can reach general intelligence.

Surprise vs inevitability

  • Some researchers say the boom felt more like “finally crossing a known threshold” than a true surprise.
  • Others emphasize how counterintuitive many specific breakthroughs were and how few predicted the scale or speed of the current wave.