Explainer: What's r1 and everything else?
Creative writing, model size, and distillation
- Several commenters focus on creative writing quality rather than math/coding.
- DeepSeek-R1’s samples on a creative-writing benchmark are widely praised as unusually strong, with few “LLM quirks.”
- People ask whether you can distill a huge “thinking” model like R1 into a small (e.g., 7B) model focused on writing and stripped of math/code.
- Responses: you can bias/optimize toward writing via distillation, but you likely can’t “remove” math/code, because reasoning skills are shared across domains (a sketch of what that distillation might look like follows this list).
- One oddity noted: many different models independently name the main character “Rhys” when given the same prompt; the reason is unclear.
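A rough sketch of what “distilling toward writing” could mean in practice: a standard soft-label distillation loss (Hinton-style), applied only to creative-writing prompts. This is a minimal illustration in PyTorch, not the recipe behind the published R1 distills; the temperature value and the idea of filtering the prompt set to writing data are assumptions made for the example.

```python
import torch
import torch.nn.functional as F

def distillation_loss(student_logits: torch.Tensor,
                      teacher_logits: torch.Tensor,
                      temperature: float = 2.0) -> torch.Tensor:
    """Soft-label distillation: push the student's next-token distribution
    toward the teacher's, softened by `temperature`.

    The domain bias comes entirely from the prompts you train on (e.g. a
    writing-only corpus); the loss itself is domain-agnostic, which is why
    the math/code reasoning the teacher relies on cannot simply be stripped out.
    """
    student_log_probs = F.log_softmax(student_logits / temperature, dim=-1)
    teacher_probs = F.softmax(teacher_logits / temperature, dim=-1)
    # batchmean + T^2 scaling is the usual Hinton et al. formulation.
    return F.kl_div(student_log_probs, teacher_probs,
                    reduction="batchmean") * temperature ** 2
```

(For what it’s worth, the published R1 distills are reported to have been produced by plain supervised fine-tuning on R1-generated outputs rather than by logit matching; the sketch above is just the textbook version of the same idea.)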
Reasoning, RL, and what R1 actually did
- R1 is framed as showing that relatively simple reinforcement learning (RL) can drive large “reasoning” gains, versus more complex schemes such as DPO or Monte Carlo tree search (MCTS); a simplified sketch of what that RL reward can look like follows this list.
- Others clarify that R1 combines RL with supervised fine-tuning on curated “correct” answers; later experiments suggest even the SFT part might be optional.
- Multiple perspectives on “reasoning”:
  - Pro: models like R1/o1/Gemini “think step by step” and achieve much better math/logic scores, so they are reasoning in a practical sense.
  - Con: they are still just predicting tokens; chains of thought are learned patterns, not explicit logical inference, and may not match the models’ internal decision process.
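To make “relatively simple RL” concrete, here is a minimal sketch of a rule-based reward of the kind the R1 write-ups describe (a format check plus an answer check). The tag names, weights, and exact-match comparison are illustrative assumptions rather than the paper’s implementation, and the surrounding training loop (e.g. GRPO) is omitted.

```python
import re

def rule_based_reward(completion: str, ground_truth: str) -> float:
    """Toy R1-style reward: deterministic checks, no learned reward model.

    The policy is rewarded for (a) wrapping its work in <think>/<answer> tags
    and (b) producing the correct final answer. Tags and weights are illustrative.
    """
    reward = 0.0
    # Format reward: a chain of thought followed by a final answer.
    if re.search(r"<think>.*?</think>\s*<answer>.*?</answer>", completion, re.DOTALL):
        reward += 0.1
    # Accuracy reward: compare the extracted answer to the known-correct one.
    match = re.search(r"<answer>(.*?)</answer>", completion, re.DOTALL)
    if match and match.group(1).strip() == ground_truth.strip():
        reward += 1.0
    return reward
```

Nothing here needs preference pairs (as in DPO) or search (as in MCTS): the reward is a plain check against a verifiable answer, and the RL optimizer does the rest, which is what the “simple RL” framing refers to.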
Benchmarks and ARC-AGI
- The article’s claim that “crushing ARC-AGI means doing what humans do” is called a misinterpretation.
- The benchmark’s creator is quoted as saying that passing it demonstrates non-zero fluid intelligence and an ability to handle unfamiliar problems, but reveals little about how close the system is to human intelligence.
- Commenters warn that misreading benchmarks is a common route to overclaiming “human-level” AI.
Exponential progress and self-improvement
- The article’s flourish that AI abilities will grow “exponentially” draws substantial pushback.
- Skeptical views:
  - Tech progress typically follows an S-curve, and LLM gains already seem to be slowing compared with the early GPT-era jumps (see the logistic-curve note after this list).
  - Existing data mostly shows exponential growth in cost and compute, not clearly exponential growth in capabilities.
  - Some argue “exponential” is being used loosely rather than in a strict mathematical sense.
- More optimistic views:
  - Multiple new scaling paths (RL, data synthesis, better training) could accelerate progress beyond simple parameter scaling.
  - Once AI substantially contributes to AI research and engineering, a self-improvement loop could yield exponential gains, at least for a while.
  - Even without perfect exponential curves, near-term AGI is seen by some as plausible and societally transformative.
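One reason the two camps can read the same data differently (a textbook identity, not something derived in the thread): the early part of an S-curve is numerically indistinguishable from an exponential. Modeling capability as a logistic curve with ceiling $L$, rate $k$, and inflection point $t_0$,

$$
f(t) = \frac{L}{1 + e^{-k(t - t_0)}} \;\approx\; L\, e^{k(t - t_0)} \quad \text{when } e^{-k(t - t_0)} \gg 1 \text{ (i.e. } t \ll t_0\text{)},
$$

so exponential-looking measurements by themselves say nothing about where the inflection point $t_0$ sits, which is exactly the parameter the optimists and skeptics disagree about.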
Open source, geopolitics, and competition
- R1 is seen as a major open-source milestone, comparable in capability to the top proprietary “reasoning” models and especially valuable outside the US.
- Some frame AI as part of a broader geopolitical “tech war” among the US, China, the EU, and Russia.
- Others argue de-escalation would benefit everyone, and that for Europe in particular, having multiple strong global suppliers (including open-source Chinese models) is an advantage if it remains more a consumer than a producer.
- There is anxiety about AI control concentrating in a few powerful private actors, and corresponding support for strong open(-ish) models to counterbalance them.
Hype, skepticism, and misc. points
- Some commenters dismiss the R1 moment as incremental “hype” akin to a minor software patch; others counter that predictions of rapid progress from a few years ago have largely held up.
- Clarifications:
  - R1’s oft-quoted low training-cost figure is questioned; commenters note the paper doesn’t state that number and the source is unclear.
  - Claims that AI is already “self-improving” are debated: current systems can help design better systems, but humans still appear to be the main driver and many bottlenecks (compute, energy, infrastructure) are external.
- Several participants wish for a stable, evolving “ELI5” guide to LLM concepts and acronyms, reflecting how hard it is to keep up with the pace of change.