LLM Daydreaming

Daydreaming Loop & User Limitations

  • Several commenters like the idea of an autonomous “daydreaming loop” that searches for non-obvious connections between facts (a minimal sketch follows this list).
  • People note that most real-world prompts (e.g., code assistance) are not structured to surface genuine novelty, and even when they are, most users can’t reliably recognize a “breakthrough” in the output.
  • Some early experiments (e.g., dreamGPT) attempt autonomous idea generation and divergence scoring without user prompts.
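
As an illustration only (not dreamGPT’s actual implementation, nor the essay’s proposal), the loop these comments describe can be sketched in a few lines. The chat() helper, the FACTS list, and the scoring prompt are assumptions standing in for whatever model API and corpus would actually be used:

```python
# Minimal daydreaming-loop sketch. chat() is a stand-in for any LLM API call;
# everything here is illustrative rather than a real implementation.
import random

def chat(prompt: str) -> str:
    """Placeholder for a call to whatever chat/completions API is available."""
    raise NotImplementedError

FACTS = [
    "Slime molds solve shortest-path problems without a nervous system.",
    "Transformer attention cost grows quadratically with sequence length.",
    "Annealing metal slowly yields fewer defects than quenching it.",
]

def daydream(facts, rounds=10, keep=3):
    ideas = []
    for _ in range(rounds):
        a, b = random.sample(facts, 2)   # pair up two unrelated facts
        idea = chat(f"Propose a non-obvious connection between:\n1. {a}\n2. {b}")
        score = chat(f"Rate 0-10 how novel and useful this is:\n{idea}\nNumber only.")
        try:
            ideas.append((float(score), idea))
        except ValueError:
            continue                     # scorer returned something non-numeric
    return [idea for _, idea in sorted(ideas, reverse=True)[:keep]]
```

The weak link is visible immediately: the “novelty” score is just another LLM call, which is the critic problem discussed below.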

Reinforcement of Consensus vs. Novelty

  • LLMs often mirror dominant opinions in their training data, reinforcing existing views and discouraging further search for alternatives.
  • This is seen as “System 1 to the extreme”: models follow the user’s reasoning, rarely push back, and compress away nuance.

Have LLMs Made Breakthroughs?

  • One side insists no clear, attributable LLM-originated breakthrough exists; marketing claims like “PhD-level” are criticized as equivocal.
  • Others argue breakthroughs might be happening but not credited to the model (e.g., code, research hints quietly used by humans). Skeptics call this implausible or conspiratorial.
  • Some point to AI-assisted advances (chip design, protein folding, math/algo results) as counterexamples, though often not purely LLM-based.

Critic, Novelty, and Evaluation Problems

  • The hardest step in the daydreaming loop is a “critic” that reliably filters for genuinely valuable or novel ideas.
  • Attempts where an LLM evaluates its own or another model’s ideas often degrade performance: systems overfit to the critic, which itself reasons poorly.
  • External critics like compilers, test suites, theorem provers, or objective benchmarks (e.g., “beats current SOTA”) work in narrow domains but don’t generalize to open-ended science, theory, or prose (see the sketch after this list).
  • Novelty is inherently murky: most human “breakthroughs” are incremental or recombinatory, and attribution is hard.
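
To make the narrow-domain case concrete, here is a hedged sketch of an external critic: a candidate is kept only if it passes an objective check, not an LLM’s opinion. propose_solution() is a placeholder for any model call, and the task and test are toy examples, not anything from the essay or the comments:

```python
# Sketch of an "external critic" in a narrow domain: a candidate is accepted only
# if it passes an objective check (here, a toy unit test), never an LLM's opinion.
# propose_solution() is a placeholder for any LLM call; the task/test are made up.

def propose_solution(task: str) -> str:
    """Placeholder: ask an LLM for Python source defining sort_unique()."""
    raise NotImplementedError

def external_critic(code: str) -> bool:
    """Objective verifier: compile and run the candidate against a fixed test."""
    namespace: dict = {}
    try:
        exec(code, namespace)
        return namespace["sort_unique"]([3, 1, 3, 2]) == [1, 2, 3]
    except Exception:
        return False

def search(task: str, attempts: int = 5):
    """Generate-and-filter loop: only verifiably correct candidates survive."""
    for _ in range(attempts):
        candidate = propose_solution(task)
        if external_critic(candidate):
            return candidate
    return None
```

The contrast the commenters draw is that nothing this cheap and decisive exists for judging a new theory, research direction, or piece of prose.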

Reasoning, Background Thinking & Agency

  • “Reasoning models” and test-time adaptation are discussed; empirical evidence suggests multi-step reasoning traces can improve accuracy, but they don’t fix hallucinations or guarantee deeper insight.
  • Critics argue LLMs lack agency, curiosity, continual learning, and real-world experimentation—key ingredients for human breakthroughs.
  • Some propose always-on, experience-fed, memory-bearing loops as closer to human daydreaming, but note cost, verification, and safety issues (a toy sketch follows this list).
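
A toy version of such a loop, assuming the same kind of generic chat() helper as in the earlier sketch; the storage (a plain Python list) and the idle-time scheduling are deliberately simplistic and not a claim about how a real system would be built:

```python
# Toy "always-on" daydreaming agent: an append-only experience store that is
# periodically mined for connections. Names and structure are illustrative only.
import random
import time

class DaydreamAgent:
    def __init__(self, chat_fn):
        self.chat = chat_fn            # generic LLM call, e.g. the chat() stub above
        self.memory = []               # experience store, appended to over time

    def observe(self, event: str) -> None:
        self.memory.append(event)      # new experience feeds future daydreams

    def daydream_once(self):
        if len(self.memory) < 2:
            return None
        a, b = random.sample(self.memory, 2)
        idea = self.chat(f"What non-obvious connection links:\n- {a}\n- {b}")
        self.memory.append(f"idea: {idea}")   # ideas become material for later passes
        return idea

    def run(self, idle_seconds: float = 60.0) -> None:
        while True:                    # always-on: keep daydreaming in the background
            self.daydream_once()
            time.sleep(idle_seconds)
```

The commenters’ caveats map directly onto this sketch: every iteration costs a model call, nothing verifies the stored ideas, and the memory grows without bound.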

Philosophical & Long-Run Views

  • Several commenters frame this as a sign we don’t yet understand human creativity or reasoning well enough to formalize it.
  • Others expect eventual hybrid systems (LLM + tools + human experts + RL) to find cross-disciplinary, economically valuable ideas once evaluation and novelty metrics improve.