'Attention is all you need' coauthor says he's 'sick' of transformers
Dominance of Transformers and Research Monoculture
- Several comments argue that transformers’ success has created an unhealthy monoculture: funding, conferences, and PhD work overwhelmingly chase incremental transformer gains instead of exploring other paradigms.
- One analogy compares this to the entire food industry deciding only to improve hamburgers; another frames it as the classic “exploration vs. exploitation” trade-off tipping too far toward exploitation.
- Others counter that this is just natural selection in research: the approach that works best (right now) wins attention and resources.
How Transformative Have Transformers Been?
- Supporters say transformers have radically changed NLP, genomics, protein structure prediction (e.g., AlphaFold), drug discovery, computer vision, search/answer engines, and developer workflows.
- Some practitioners describe LLM coding assistants as personally “transformative,” turning stressful workloads into mostly AI-assisted implementation.
- Critics claim impacts in their own fields are “mostly negative,” with transformers driving distraction, noise, and shallow work rather than genuine scientific progress.
Slop, Spam, and Societal Harms
- A recurring theme: transformers drastically lower the cost of producing plausible but wrong or low‑quality content (“slop”).
- People highlight spam, scams, propaganda, astroturfing, and robocalls as areas where LLMs already “excel,” and point to degraded student learning as a further cost.
- Others argue models can also be used to filter and analyze such content, but acknowledge that incentives currently favor mass low-quality generation.
Architecture Debates and Alternatives
- Some view transformers as an especially successful instance of a broader class (probabilistic models over sequences/graphs) and expect future gains from combining them with older ideas (PGMs, symbolic reasoning, causal inference).
- Others emphasize architectural limits: softmax pathologies, attention “sinks,” positional-encoding quirks, and scaling/energy costs; various papers and ideas (e.g., alternative attention mechanisms, hyper-graph models, BDH) are mentioned as promising. A minimal sketch of the softmax/sink complaint follows this list.
- A minority is skeptical that a radically new architecture is the key; they see more upside in better training paradigms (e.g., reinforcement learning, data efficiency) than in replacing transformers.
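To make the softmax/“attention sink” complaint concrete, here is a minimal NumPy sketch (illustrative only, not from the thread): standard softmax forces every query’s attention weights to sum to 1, so a query with no relevant key must still put its mass somewhere, and in practice models often dump it on a “sink” token. The `softmax_with_zero_logit` helper below is a hypothetical name for the commonly proposed fix of adding an implicit extra zero logit.

```python
# Minimal NumPy sketch of the softmax issue behind "attention sink" complaints.
# `softmax_with_zero_logit` is an illustrative name, not an established API;
# it implements the often-suggested fix of an implicit extra logit of 0.
import numpy as np

def softmax(x):
    x = x - x.max(axis=-1, keepdims=True)      # shift for numerical stability
    e = np.exp(x)
    return e / e.sum(axis=-1, keepdims=True)

def softmax_with_zero_logit(x):
    # Append an implicit logit of 0 ("attend to nothing"), normalize, then
    # drop that slot: the remaining weights may now sum to less than 1.
    padded = np.concatenate([x, np.zeros(x.shape[:-1] + (1,))], axis=-1)
    return softmax(padded)[..., :-1]

# One query scoring four keys, none of which are actually relevant
# (all logits strongly negative).
scores = np.array([[-9.0, -9.5, -9.2, -9.8]])

print(softmax(scores))
# -> weights still sum to 1: mass is forced onto the keys even though none
#    matter; in real models it often ends up dumped on a "sink" token.

print(softmax_with_zero_logit(scores))
# -> all weights near 0: the query is allowed to attend to (almost) nothing.
```

Real attention-sink behavior in trained models is more involved than this toy, but the sum-to-one constraint is the core of the objection.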
AGI, Deduction, and Cognition
- Some argue transformers are fundamentally inductive and can’t truly perform deduction without external tools; others respond that stochasticity doesn’t preclude deductive reasoning in principle.
- A long subthread debates whether LLM capabilities imply there is “nothing special” about the human brain, versus the view that human cognition is grounded in desire, embodiment, and neurobiology in ways transformers do not capture.
- There’s disagreement over whether LLM-generated work is genuinely “original” or just sophisticated plagiarism, and whether hallucination makes them categorically unlike human reasoning or just a noisier analogue.
Research Culture, Incentives, and Productization
- Commenters note short project horizons (e.g., 3-month cycles) aimed at top conferences and benchmarks, favoring shoddy but fast incremental work.
- Much of what the public sees as “AI” is described as 90% product engineering (RLHF, prompt design, UX) built on a small core of foundational research.
- True non-transformer research is perceived as a small, underfunded fraction, overshadowed by the “tsunami of money” for transformer-based products.
Hardware, Energy, and Lock‑In
- Transformers are praised for mapping extremely well onto parallel GPU hardware, in contrast to sequentially constrained RNNs; this hardware match is seen as a major reason they won (see the sketch after this list).
- Some worry that massive investment in GPU-centric infrastructure could lock the field into suboptimal architectures; others argue that well-parallelizable algorithms are inherently superior and that hardware will co-evolve with any better approach.
- Energy use and data center build‑out are flagged as looming constraints; some hope this will force more fundamental innovation.
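As a rough illustration of the hardware-fit point (a sketch under standard simplifications, not code from the discussion): self-attention over a whole sequence reduces to a few large matrix multiplies that a GPU can run across all positions at once, while a vanilla RNN must walk the sequence step by step because each hidden state depends on the previous one.

```python
# Minimal NumPy sketch (illustrative only) of the hardware argument: attention
# over a length-T sequence is a handful of big matrix multiplies, computed
# across all T positions at once, while a vanilla RNN must iterate because
# hidden state t depends on hidden state t-1.
import numpy as np

T, d = 128, 64
X = np.random.randn(T, d)                    # token representations

# Self-attention: all pairwise scores in one (T, T) matmul; no sequential loop.
Wq, Wk, Wv = (np.random.randn(d, d) * 0.05 for _ in range(3))
Q, K, V = X @ Wq, X @ Wk, X @ Wv
scores = Q @ K.T / np.sqrt(d)                # (T, T), computed in one shot
weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
weights /= weights.sum(axis=-1, keepdims=True)
attn_out = weights @ V                       # (T, d)

# Simple RNN: the loop over t is inherently sequential, so the work cannot be
# spread across the sequence dimension no matter how many cores are available.
Wh, Wx = np.random.randn(d, d) * 0.05, np.random.randn(d, d) * 0.05
h = np.zeros(d)
rnn_out = []
for t in range(T):                           # step t needs h from step t-1
    h = np.tanh(h @ Wh + X[t] @ Wx)
    rnn_out.append(h)
rnn_out = np.stack(rnn_out)                  # (T, d)
```

The contrast is simplified (it mainly describes training-time parallelism across positions), but it is the mechanism commenters credit for transformers winning on current hardware.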
Reactions to the Sakana CTO’s Anti‑Transformer Stance
- Some dismiss the “sick of transformers” line as fundraising theater—positioning around “the next big thing” without specifying it.
- Others see it as a normal researcher reaction: once a technique is “solved” and industrialized, curious people move on to more open problems.
- A few compare this to artists abandoning a popular style, driven by boredom, stress, or ambition rather than purely by money.