The deep learning boom caught almost everyone by surprise
GPU history and “invention” debate
- Strong disagreement over the claim that one company “invented the GPU.”
- Some argue earlier hardware (3dfx, SGI, Evans & Sutherland) and pre‑CUDA accelerators already counted.
- Others say early chips weren’t truly general-purpose or useful for deep learning, and that the term “GPU” itself was a later marketing definition.
- Consensus: the history is gradual and messy; calling any single launch “the invention” is seen as oversimplified.
Three pillars of the deep learning boom
- Many commenters endorse the triplet:
  - Scalable neural networks trained with backpropagation.
  - Massively parallel GPU compute, programmable via CUDA.
  - Large labeled datasets like ImageNet, enabled by crowd labeling.
- Some add that better activations (ReLU and its successors), improved initialization, normalization, residual connections, and optimizers were also critical.
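Two of the ingredients named above, ReLU activations and residual connections, are simple to state. A minimal sketch (plain numpy, all weights and sizes illustrative, not from the discussion): a residual block computes y = relu(x + F(x)), so the identity "skip" path lets signals and gradients bypass the learned transformation F.

```python
import numpy as np

rng = np.random.default_rng(0)

def relu(x):
    # ReLU: zero out negatives, pass positives through unchanged.
    return np.maximum(x, 0.0)

def residual_block(x, W1, W2):
    # F(x) = W2 @ relu(W1 @ x); the skip connection adds x back in,
    # so even if F is near-zero the block approximates the identity.
    return relu(x + W2 @ relu(W1 @ x))

# Illustrative sizes and random weights (hypothetical, for shape-checking only).
d = 4
x = rng.normal(size=d)
W1 = rng.normal(size=(d, d)) * 0.1
W2 = rng.normal(size=(d, d)) * 0.1
y = residual_block(x, W1, W2)
```

The design point is that stacking many such blocks stays trainable: the identity path gives gradients a direct route through the network, which is part of why residual connections are listed alongside ReLU as enablers of depth.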
Data, scaling, and changing ML culture
- Several note that earlier eras had tiny datasets and far less compute; “more data” wasn’t always practically feasible.
- Pre‑ImageNet culture emphasized clever models and priors over sheer scale, especially in academia.
- Others push back on a quote that “people did not believe in data,” arguing practitioners always valued more data but were resource‑constrained.
Deep learning vs other ML methods
- Some wonder what would have happened if similar resources had gone into SVMs, random forests, etc.
- Replies: classical methods struggle to scale, are less flexible to compose, and were heavily explored from the ’90s to mid‑2010s.
- A recurring theme: LLMs are often used for trivial tasks where simpler models would be cheaper and adequate.
AGI, intelligence, and common sense
- Commenters split between those for whom current progress makes AGI within years feel “inevitable” and those who think present systems fall far short.
- Skeptics stress missing “common sense,” poor sample efficiency, and weak causal reasoning, and draw contrasts with animals’ rapid generalization.
- Debates over whether evolution counts as “training data,” how much biology’s architecture matters, and whether scaling deep learning alone can reach general intelligence.
Surprise vs inevitability
- Some researchers say the boom felt more like “finally crossing a known threshold” than a true surprise.
- Others emphasize how counterintuitive many specific breakthroughs were and how few predicted the scale or speed of the current wave.