The deep learning boom caught almost everyone by surprise

GPU history and “invention” debate

  • Strong disagreement over the claim that one company “invented the GPU.”
  • Some argue earlier hardware (3dfx, SGI, Evans & Sutherland) and pre‑CUDA accelerators already counted.
  • Others say early chips weren’t truly general-purpose or useful for deep learning, and that the term “GPU” was itself a later marketing coinage.
  • Consensus: the history is gradual and messy; calling any single launch “the invention” is seen as oversimplified.

Three pillars of the deep learning boom

  • Many commenters endorse the triplet:
    • Scalable neural networks and backpropagation.
    • Massively parallel GPU compute, programmable via CUDA.
    • Large labeled datasets like ImageNet, enabled by crowd labeling.
  • Some add that better activations (ReLU and its successors), initialization schemes, normalization, residual connections, and improved optimizers were also critical (see the sketch after this list).
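
A minimal sketch of two of those ingredients, a ReLU activation inside a residual block, using PyTorch purely for illustration (the class name, sizes, and layer choices are arbitrary assumptions, not anything from the thread):

  import torch
  import torch.nn as nn

  class ResidualBlock(nn.Module):
      """Toy residual block: output = x + f(x)."""
      def __init__(self, dim: int):
          super().__init__()
          self.linear1 = nn.Linear(dim, dim)
          self.linear2 = nn.Linear(dim, dim)
          self.relu = nn.ReLU()  # max(0, x); cheap, and avoids saturating gradients

      def forward(self, x: torch.Tensor) -> torch.Tensor:
          # The skip connection (x + ...) gives gradients a direct path
          # around the block, which is what made very deep nets trainable.
          return x + self.linear2(self.relu(self.linear1(x)))

  block = ResidualBlock(16)
  out = block(torch.randn(4, 16))  # out.shape == (4, 16)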

Data, scaling, and changing ML culture

  • Several note that earlier eras had tiny datasets and far less compute; “more data” wasn’t always practically feasible.
  • Pre‑ImageNet culture emphasized clever models and priors over sheer scale, especially in academia.
  • Others push back on a quote that “people did not believe in data,” arguing practitioners always valued more data but were resource‑constrained.

Deep learning vs other ML methods

  • Some wonder what would have happened if similar resources had gone into SVMs, random forests, etc.
  • Replies: classical methods struggle to scale, are less flexible to compose, and were already heavily explored from the ’90s to the mid‑2010s (see the back‑of‑envelope sketch after this list).
  • A recurring theme: LLMs are often used for trivial tasks where simpler models would be cheaper and adequate.
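
A back-of-envelope illustration of the scaling point (an assumption about exact kernel methods, not a claim from the thread): a kernel SVM materializes an n × n Gram matrix, so its memory grows quadratically with dataset size, while minibatch SGD for a neural network only ever touches a fixed-size batch.

  # Hypothetical numbers, for illustration only.
  n = 1_000_000                      # samples, roughly ImageNet scale
  gram_bytes = n * n * 4             # dense float32 Gram matrix
  print(f"Gram matrix: {gram_bytes / 1e12:.1f} TB")     # -> 4.0 TB

  batch, dim = 256, 4096             # minibatch footprint stays fixed
  print(f"Minibatch: {batch * dim * 4 / 1e6:.1f} MB")   # -> 4.2 MB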

AGI, intelligence, and common sense

  • Split between those for whom current progress makes AGI within years feel “inevitable” and those who think present systems fall far short.
  • Skeptics stress missing “common sense,” poor sample efficiency, and weak causal reasoning, and draw contrasts with animals’ rapid generalization.
  • Debates over whether evolution counts as “training data,” how much biology’s architecture matters, and whether scaling deep learning alone can reach general intelligence.

Surprise vs inevitability

  • Some researchers say the boom felt more like “finally crossing a known threshold” than a true surprise.
  • Others emphasize how counterintuitive many specific breakthroughs were and how few predicted the scale or speed of the current wave.