Explainer: What's r1 and everything else?
Creative writing, model size, and distillation
- Several commenters focus on creative writing quality rather than math/coding.
- DeepSeek-R1’s samples on a creative-writing benchmark are widely praised as unusually strong, with few “LLM quirks.”
- People ask whether you can distill a huge “thinking” model like R1 into a small (e.g., 7B) model focused on writing and stripped of math/code.
- Responses: you can bias/optimize toward writing via distillation, but you likely can’t “remove” math/code, because reasoning skills are shared across domains (a sketch of what that distillation might look like follows this list).
- One oddity noted: many different models independently name the main character “Rhys” when given the same prompt; the reason is unclear.
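A rough sketch of what “distilling toward writing” could mean in practice: a standard soft-label distillation loss (Hinton-style), applied only to creative-writing prompts. This is a minimal illustration in PyTorch, not the recipe behind the published R1 distills; the temperature value and the idea of filtering the prompt set to writing data are assumptions made for the example.

```python
import torch
import torch.nn.functional as F

def distillation_loss(student_logits: torch.Tensor,
                      teacher_logits: torch.Tensor,
                      temperature: float = 2.0) -> torch.Tensor:
    """Soft-label distillation: push the student's next-token distribution
    toward the teacher's, softened by `temperature`.

    The domain bias comes entirely from the prompts you train on (e.g. a
    writing-only corpus); the loss itself is domain-agnostic, which is why
    the math/code reasoning the teacher relies on cannot simply be stripped out.
    """
    student_log_probs = F.log_softmax(student_logits / temperature, dim=-1)
    teacher_probs = F.softmax(teacher_logits / temperature, dim=-1)
    # batchmean + T^2 scaling is the usual Hinton et al. formulation.
    return F.kl_div(student_log_probs, teacher_probs,
                    reduction="batchmean") * temperature ** 2
```

(For what it’s worth, the published R1 distills are reported to have been produced by plain supervised fine-tuning on R1-generated outputs rather than by logit matching; the sketch above is just the textbook version of the same idea.)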
Reasoning, RL, and what R1 actually did
- R1 is framed as showing that relatively simple reinforcement learning (RL) can drive large “reasoning” gains, versus more complex schemes such as DPO or Monte Carlo tree search (MCTS); a simplified sketch of what that RL reward can look like follows this list.
- Others clarify that R1 combines RL with supervised fine-tuning on curated “correct” answers; later experiments suggest even the SFT part might be optional.
- Multiple perspectives on “reasoning”:
  - Pro: models like R1/o1/Gemini “think step by step” and achieve much better math/logic scores, so they are reasoning in a practical sense.
  - Con: they are still just predicting tokens; chains of thought are learned patterns, not explicit logical inference, and may not match the models’ internal decision process.
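To make “relatively simple RL” concrete, here is a minimal sketch of a rule-based reward of the kind the R1 write-ups describe (a format check plus an answer check). The tag names, weights, and exact-match comparison are illustrative assumptions rather than the paper’s implementation, and the surrounding training loop (e.g. GRPO) is omitted.

```python
import re

def rule_based_reward(completion: str, ground_truth: str) -> float:
    """Toy R1-style reward: deterministic checks, no learned reward model.

    The policy is rewarded for (a) wrapping its work in <think>/<answer> tags
    and (b) producing the correct final answer. Tags and weights are illustrative.
    """
    reward = 0.0
    # Format reward: a chain of thought followed by a final answer.
    if re.search(r"<think>.*?</think>\s*<answer>.*?</answer>", completion, re.DOTALL):
        reward += 0.1
    # Accuracy reward: compare the extracted answer to the known-correct one.
    match = re.search(r"<answer>(.*?)</answer>", completion, re.DOTALL)
    if match and match.group(1).strip() == ground_truth.strip():
        reward += 1.0
    return reward
```

Nothing here needs preference pairs (as in DPO) or search (as in MCTS): the reward is a plain check against a verifiable answer, and the RL optimizer does the rest, which is what the “simple RL” framing refers to.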
Benchmarks and ARC-AGI
- The article’s claim that “crushing ARC-AGI means doing what humans do” is called a misinterpretation.
- The benchmark’s creator is quoted as saying that passing it demonstrates non-zero fluid intelligence and an ability to handle unfamiliar problems, but reveals little about how close the system is to human intelligence.
- Commenters warn that misreading benchmarks is a common route to overclaiming “human-level” AI.
Exponential progress and self-improvement
- The article’s flourish that AI abilities will grow “exponentially” draws substantial pushback.
- Skeptical views:
  - Tech progress typically follows an S-curve, and LLM gains already seem to be slowing compared with the early GPT-era jumps (see the logistic-curve note after this list).
  - Existing data mostly shows exponential growth in cost and compute, not clearly exponential growth in capabilities.
  - Some argue “exponential” is being used loosely rather than in a strict mathematical sense.
- More optimistic views:
  - Multiple new scaling paths (RL, data synthesis, better training) could accelerate progress beyond simple parameter scaling.
  - Once AI substantially contributes to AI research and engineering, a self-improvement loop could yield exponential gains, at least for a while.
  - Even without perfect exponential curves, near-term AGI is seen by some as plausible and societally transformative.
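One reason the two camps can read the same data differently (a textbook identity, not something derived in the thread): the early part of an S-curve is numerically indistinguishable from an exponential. Modeling capability as a logistic curve with ceiling $L$, rate $k$, and inflection point $t_0$,

$$
f(t) = \frac{L}{1 + e^{-k(t - t_0)}} \;\approx\; L\, e^{k(t - t_0)} \quad \text{when } e^{-k(t - t_0)} \gg 1 \text{ (i.e. } t \ll t_0\text{)},
$$

so exponential-looking measurements by themselves say nothing about where the inflection point $t_0$ sits, which is exactly the parameter the optimists and skeptics disagree about.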
Open source, geopolitics, and competition
- R1 is seen as a major open-source milestone, comparable in capability to the top proprietary “reasoning” models and especially valuable outside the US.
- Some frame AI as part of a broader geopolitical “tech war” among the US, China, the EU, and Russia.
- Others argue de-escalation would benefit everyone, and that for Europe in particular, having multiple strong global suppliers (including open-source Chinese models) is an advantage if it remains more a consumer than a producer.
- There is anxiety about AI control concentrating in a few powerful private actors, and corresponding support for strong open(-ish) models to counterbalance them.
Hype, skepticism, and misc. points
- Some commenters dismiss the R1 moment as incremental “hype” akin to a minor software patch; others counter that predictions of rapid progress from a few years ago have largely held up.
- Clarifications:
  - R1’s oft-quoted low training-cost figure is questioned; commenters note the paper doesn’t state that number and the source is unclear.
  - Claims that AI is already “self-improving” are debated: current systems can help design better systems, but humans still appear to be the main driver and many bottlenecks (compute, energy, infrastructure) are external.
- Several participants wish for a stable, evolving “ELI5” guide to LLM concepts and acronyms, reflecting how hard it is to keep up with the pace of change.