LLM Daydreaming
Daydreaming Loop & User Limitations
- Several commenters like the idea of an autonomous “daydreaming loop” that searches for non-obvious connections between facts (a minimal sketch follows this list).
- People note that most real-world prompts (e.g., code assistance) are not structured to surface genuine novelty, and even when they are, most users can’t reliably recognize a “breakthrough” in the output.
- Some early experiments (e.g., dreamGPT) attempt autonomous idea generation and divergence scoring without user prompts.
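One way to picture such a loop, as a minimal sketch only: sample two stored facts, ask a model for a non-obvious connection, then self-score the result. The `llm()` helper, prompts, and threshold below are hypothetical stand-ins, not the method of dreamGPT or any specific system.

```python
import random

def llm(prompt: str) -> str:
    """Placeholder for a call to any chat-completion model."""
    raise NotImplementedError("wire up a model client here")

def daydream_step(facts: list[str]) -> tuple[str, float]:
    # Pick two unrelated facts and ask for a connecting hypothesis.
    a, b = random.sample(facts, 2)
    idea = llm(
        f"Fact 1: {a}\nFact 2: {b}\n"
        "Propose one non-obvious connection or hypothesis linking these facts."
    )
    # Crude self-critique pass; as discussed below, LLM critics are the weak link.
    score_text = llm(f"Idea: {idea}\nRate novelty and plausibility from 0 to 1. Reply with a number.")
    try:
        score = float(score_text.strip())
    except ValueError:
        score = 0.0
    return idea, score

def daydream(facts: list[str], steps: int = 100, keep_above: float = 0.8) -> list[str]:
    # Keep only ideas the scoring pass rates highly.
    kept = []
    for _ in range(steps):
        idea, score = daydream_step(facts)
        if score >= keep_above:
            kept.append(idea)
    return kept
```

The fragile part is the scoring pass, which is exactly the “critic” problem raised further down.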
Reinforcement of Consensus vs. Novelty
- LLMs often mirror dominant opinions in their training data, reinforcing existing views and discouraging further search for alternatives.
- This is seen as “System 1 to the extreme”: models follow the user’s reasoning, rarely push back, and compress away nuance.
Have LLMs Made Breakthroughs?
- One side insists no clear, attributable LLM-originated breakthrough exists; marketing claims like “PhD-level” are criticized as equivocal.
- Others argue breakthroughs might be happening but not credited to the model (e.g., code, research hints quietly used by humans). Skeptics call this implausible or conspiratorial.
- Some point to AI-assisted advances (chip design, protein folding, math/algo results) as counterexamples, though often not purely LLM-based.
Critic, Novelty, and Evaluation Problems
- The hardest step in the daydream loop is a “critic” that reliably filters for genuinely valuable or novel ideas.
- Attempts where an LLM evaluates its own or another model’s ideas often degrade performance: systems overfit to the critic, which itself reasons poorly.
- External critics like compilers, test suites, theorem provers, or objective benchmarks (e.g., “beats current SOTA”) work in narrow domains but don’t generalize to open-ended science, theory, or prose (see the sketch after this list).
- Novelty is inherently murky: most human “breakthroughs” are incremental or recombinatory, and attribution is hard.
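For contrast, a narrow-domain external critic is easy to state. This sketch assumes the generated snippet exposes a `solve` entry point; the test cases and accept/reject rule are illustrative only.

```python
def external_critic(candidate_src: str, tests: list[tuple[tuple, object]]) -> bool:
    """Accept a generated snippet only if it executes and passes every test case."""
    namespace: dict = {}
    try:
        exec(candidate_src, namespace)   # rough analogue of a compile check
        fn = namespace["solve"]          # assumed entry-point name
        return all(fn(*args) == expected for args, expected in tests)
    except Exception:
        return False

# Illustrative use: accept a generated sorting routine only if it actually sorts.
tests = [(([3, 1, 2],), [1, 2, 3]), (([],), [])]
candidate = "def solve(xs):\n    return sorted(xs)"
assert external_critic(candidate, tests)
```

Nothing comparable exists for judging an open-ended scientific hypothesis or a piece of prose, which is the thread’s point.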
Reasoning, Background Thinking & Agency
- “Reasoning models” and test-time adaptation are discussed; empirical evidence suggests multi-step reasoning traces can improve accuracy, but they don’t fix hallucinations or guarantee deeper insight.
- Critics argue LLMs lack agency, curiosity, continual learning, and real-world experimentation—key ingredients for human breakthroughs.
- Some propose always-on, experience-fed, memory-bearing loops as closer to human daydreaming (sketched below), but note cost, verification, and safety issues.
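A hedged sketch of such a loop, assuming the same hypothetical `llm()` helper and a JSONL file as the persistent memory; the file name, sampling sizes, and interval are assumptions for illustration, not a description of any deployed system.

```python
import json
import pathlib
import random
import time

MEMORY = pathlib.Path("daydream_memory.jsonl")  # assumed persistent store

def llm(prompt: str) -> str:
    raise NotImplementedError("wire up a model client here")

def load_memory(limit: int = 5) -> list[str]:
    # Pull a few past daydreams back into context.
    if not MEMORY.exists():
        return []
    lines = MEMORY.read_text().splitlines()
    return [json.loads(line)["idea"] for line in random.sample(lines, min(limit, len(lines)))]

def background_loop(observations: list[str], interval_s: float = 60.0) -> None:
    while True:  # always-on: runs until externally stopped
        context = load_memory() + random.sample(observations, k=min(2, len(observations)))
        idea = llm("Reflect on these items and propose a new connection:\n" + "\n".join(context))
        with MEMORY.open("a") as f:  # append to persistent memory of past daydreams
            f.write(json.dumps({"idea": idea, "t": time.time()}) + "\n")
        time.sleep(interval_s)  # throttle: the cost concern raised above
```

Even in this toy form, the open questions from the thread remain visible: nothing verifies the ideas, and the loop’s cost scales with how often it runs.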
Philosophical & Long-Run Views
- Several comments frame this as a sign we don’t yet understand human creativity or reasoning well enough to formalize it.
- Others expect eventual hybrid systems (LLM + tools + human experts + RL) to find cross-disciplinary, economically valuable ideas once evaluation and novelty metrics improve.