Claude's Cycles [pdf]

Overview of the Result

  • The paper describes how a reasoning-focused language model, guided by a human collaborator, explored many programmatic approaches and eventually discovered an algorithm that solved an open combinatorial problem for all odd cases.
  • The human then proved correctness and wrote up the formal math; the even case remains unsolved.

Was This Genuine Novelty?

  • Some commenters assert the model must have simply regurgitated part of its training set; others counter that:
    • The problem was presented as open in the literature.
    • The successful approach emerged only after ~30 failed explorations.
    • The model refined and reused earlier partial ideas, suggesting genuine search rather than memorization.
  • Several note that if this were a known solution, it likely would have appeared immediately, not after a long iterative search.

What This Suggests About LLM Capabilities

  • Many see this as strong evidence of nontrivial problem-solving: pattern search, hypothesis generation, code synthesis, and refinement under feedback.
  • Others emphasize the human–model synergy: the person chose directions, restarted when outputs degraded, and translated the final algorithm into a proof.
  • There is debate over whether this counts as “thinking” or simply “very powerful next-token prediction plus good tooling.”
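The workflow commenters describe (propose a candidate, verify it, keep partial progress, refine) can be sketched as a generic search loop. Everything below is a toy stand-in: the `propose`, `verify`, and `TARGET` names are invented for illustration, and a real run would have the model emit code while a checker tests it against the open problem.

```python
import random

TARGET = 742  # hidden value the toy "verifier" checks against

def propose(best, rng):
    """Stand-in for the model's hypothesis step.

    In the workflow described above the model would emit candidate
    code; here a candidate is just an integer.  When a promising
    earlier attempt exists, refine it locally instead of starting over.
    """
    if best is None:
        return rng.randrange(1000)        # cold start: explore broadly
    return best + rng.choice([-1, 1])     # warm start: refine best-so-far

def verify(candidate):
    """Stand-in for running the candidate against the problem.
    Returns a score; 0 means the verifier accepts."""
    return -abs(candidate - TARGET)

def search(max_iters=5000, seed=0):
    rng = random.Random(seed)
    best, best_score = None, float("-inf")
    for _ in range(max_iters):
        cand = propose(best, rng)
        s = verify(cand)
        if s > best_score:                # keep partial progress
            best, best_score = cand, s
        if best_score == 0:               # verifier accepts: done
            return best
    return best
```

The `keep partial progress` step is what distinguishes the iterative refinement commenters point to from a sequence of independent retries: each attempt builds on the most promising earlier one.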

Intelligence, Memory, and Learning

  • Long back-and-forth on whether models that can’t update their weights at inference time are truly “intelligent,” with analogies to human amnesia and external memory tools.
  • Some argue that adding tool use, external memory, and agents on top of a base model can approximate long-term learning; others insist this remains fundamentally different from self-updating cognition.
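The "external memory on a frozen base" position can be pictured as a thin wrapper that stores facts outside the model and retrieves them at prompt time. This is a hypothetical sketch, not any vendor's API; `FrozenModel` and `MemoryAugmented` are invented names.

```python
class FrozenModel:
    """Stand-in for a base model whose weights never change."""
    def answer(self, prompt):
        # A real model would generate text; this toy just echoes the prompt.
        return f"answer({prompt})"

class MemoryAugmented:
    """Hypothetical wrapper: a frozen model plus an external note store.

    "Learning" happens only in the store, never in the model itself,
    which is exactly the distinction the debate above turns on.
    """
    def __init__(self, model):
        self.model = model
        self.notes = {}                   # external memory, persists across sessions

    def remember(self, key, fact):
        self.notes[key] = fact

    def ask(self, prompt):
        # Retrieve notes whose key appears in the prompt and prepend them.
        relevant = [v for k, v in self.notes.items() if k in prompt]
        context = "; ".join(relevant)
        return self.model.answer(f"[{context}] {prompt}" if context else prompt)
```

Whether updating `self.notes` counts as the system "learning" is precisely what the two camps disagree about.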

Keeping Models Up to Date

  • Concern about models as “time capsules” with fixed knowledge cutoffs.
  • Discussion of:
    • Continual retraining of model weights vs. continual learning purely in-context.
    • Huge context windows, compaction, and the “dumb zone” when too much prior detail is lost.
    • Using user interactions and reasoning traces as future training data, with attendant privacy and consent worries.
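The compaction idea above can be illustrated with a toy budget policy: keep recent turns verbatim, digest older turns, and drop the oldest digests when still over budget. This is a hypothetical sketch (a real system would have the model write the summaries rather than truncate); the detail discarded here is exactly where the "dumb zone" risk lives.

```python
def words(turns):
    """Total whitespace-delimited tokens: a crude stand-in for a token count."""
    return sum(len(t.split()) for t in turns)

def compact(history, budget, keep_recent=2):
    """Toy context-compaction policy (hypothetical, not any vendor's API).

    Keep the newest `keep_recent` turns verbatim; collapse each older
    turn to a short digest line; if the result still exceeds the word
    `budget`, drop the oldest digest lines first.
    """
    recent = list(history[-keep_recent:])
    older = history[:-keep_recent]
    # Crude "summary": first three words of each old turn.
    digest = ["[summary] " + " ".join(t.split()[:3]) for t in older]
    context = digest + recent
    while digest and words(context) > budget:
        digest.pop(0)                     # oldest detail is lost first
        context = digest + recent
    return context
```

Running it on a four-turn history with a 12-word budget keeps the two newest turns intact and retains only the newer of the two digests, which shows both the benefit (it fits) and the hazard (early specifics are gone).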

Broader Implications and Skepticism

  • Enthusiasts see this as an early sign that hard open problems (including in physics or pure math) might fall to similar approaches.
  • Skeptics stress current systems still make silly errors, struggle with many novel problems, and rely heavily on human steering.
  • Ethical concerns arise around surveillance, concentration of power, and the future role of human cognitive labor.