The real data wall is billions of years of evolution
Compute, Data, and Model Architecture
- Several argue that current progress is driven primarily by massive compute and memory bandwidth, with data already covering “things people talk about” well.
- Others stress that architectural advances (convolutions, transformers, longer context windows, better filtering) can yield big gains without more data, and that we are far from an “optimal model wall.”
- Some see “intelligence” as high sample efficiency: doing more with less data, partly via better information filtering and compression (a toy illustration follows this list).
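To make the sample-efficiency framing concrete, here is a minimal sketch assuming an invented two-class task and a nearest-centroid learner (both chosen purely for illustration, not taken from the discussion): a “sample-efficient” learner is one whose accuracy curve rises steeply at very small training-set sizes.

```python
# Hypothetical illustration of "sample efficiency": how much accuracy a learner
# reaches from only a handful of examples. The task and numbers are invented.
import numpy as np

rng = np.random.default_rng(0)

def make_task(n_per_class, dim=20, sep=0.3):
    """Two synthetic Gaussian blobs, one per class; returns (X, y)."""
    a = rng.normal(+sep, 1.0, size=(n_per_class, dim))
    b = rng.normal(-sep, 1.0, size=(n_per_class, dim))
    return np.vstack([a, b]), np.array([0] * n_per_class + [1] * n_per_class)

def nearest_centroid_accuracy(n_train_per_class, n_test_per_class=500):
    """Fit a nearest-centroid classifier on few examples, test on many."""
    Xtr, ytr = make_task(n_train_per_class)
    Xte, yte = make_task(n_test_per_class)
    centroids = np.stack([Xtr[ytr == c].mean(axis=0) for c in (0, 1)])
    dists = np.linalg.norm(Xte[:, None, :] - centroids[None, :, :], axis=2)
    return (dists.argmin(axis=1) == yte).mean()

# A "sample-efficient" learner climbs this curve quickly at small n.
for n in (1, 2, 5, 20, 100):
    print(f"{n:4d} examples/class -> accuracy {nearest_centroid_accuracy(n):.2f}")
```

The exact numbers are meaningless; the point is the shape of the curve, i.e. how quickly accuracy improves as examples are added.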
Evolution, DNA, and What’s Really Learned
- One camp agrees that evolution provides powerful “pre-programming,” but says this is mainly architecture/sensors/organism design, not literal stored “training data.”
- Critics say treating billions of years of evolution as something akin to GPT-style pretraining data is misleading or “Lamarckian”; evolution shapes structure and instincts, not direct experiential memories.
- Others counter that, in a broad sense, evolution itself is a learning process over genes and environments, so calling that “data” is reasonable, though details remain unclear.
Embodiment, Sensory Data, and Grounding
- Many emphasize that humans learn through long sensorimotor interaction with the real world (childhood, bodily symmetry, multi-sensory integration), giving a grounded causal intuition that text-only LLMs lack.
- Blind/deaf humans are cited both as evidence that no single modality is essential and as support for rich multimodal pretraining.
- Some suggest the true “data wall” is the massive, continuous, embodied experience accumulated from infancy onward (a rough back-of-envelope estimate follows this list).
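To give that claim a rough sense of scale, here is a back-of-envelope sketch; every constant (waking hours, sampling rate, channel count) is an assumption picked only to show order of magnitude, not a figure from the thread.

```python
# Very rough, hypothetical estimate of raw visual input accumulated by age five.
# All constants below are assumptions for illustration only.
waking_hours_per_day = 12            # assumed average waking time
days = 5 * 365                       # birth to roughly age five
seconds_awake = waking_hours_per_day * 3600 * days

visual_rate_hz = 10                  # assumed "useful" visual sampling rate
optic_nerve_fibers = 1_000_000       # roughly a million fibers per eye

visual_samples = seconds_awake * visual_rate_hz * optic_nerve_fibers
print(f"~{visual_samples:.1e} raw visual samples by age five (very rough)")
```

Even with these deliberately coarse constants, the count lands on the order of 10^14–10^15 samples, which is the intuition behind calling embodied experience the real wall.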
Language, Culture, and Social Learning
- A strong thread holds that the key differentiator is language and culture, not DNA alone: symbolic communication enables cumulative, cross-generational search and refinement.
- Society is framed as a third learning timescale beyond evolution and individual experience.
Robots, Real-World Data, and Future Directions
- When text data runs out, many expect robots and embodied agents to generate new data through real-world experiments, though physical trials are slower and failures are costlier.
- Ideas include fleets of cheap, robust robots sharing experience, evolutionary search over architectures, and multi-agent AI systems that talk to each other and perhaps develop their own “languages” (a toy evolutionary-search sketch follows this list).
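As a minimal sketch of what “evolutionary search over architectures” could look like, the toy loop below mutates and selects a tiny architecture “genome” (depth, width, context length). The fitness function is a made-up surrogate; in practice each evaluation would be a real training-and-evaluation run, which is exactly what makes such search expensive.

```python
# Toy evolutionary search over a hypothetical architecture "genome".
# The genome fields and the fitness surrogate are invented for illustration.
import random

random.seed(0)

def random_genome():
    return {"depth": random.randint(2, 24),
            "width": random.choice([128, 256, 512, 1024]),
            "context": random.choice([512, 2048, 8192])}

def mutate(genome):
    child = dict(genome)
    gene = random.choice(list(child))
    child[gene] = random_genome()[gene]         # resample a single gene
    return child

def fitness(genome):
    # Surrogate score: reward capacity, penalize a crude compute cost.
    capacity = genome["depth"] * genome["width"]
    cost = capacity * genome["context"] / 1e6
    return capacity / 1000 - 0.05 * cost

population = [random_genome() for _ in range(16)]
for _ in range(20):                             # 20 generations
    population.sort(key=fitness, reverse=True)
    survivors = population[:4]                  # keep the top quarter
    population = survivors + [mutate(random.choice(survivors)) for _ in range(12)]

best = max(population, key=fitness)
print("best architecture found:", best, "score:", round(fitness(best), 2))
```

Swapping the surrogate for real training runs (or robot trials) is where the cost and slowness the thread worries about come in.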
AGI, Hype, and Limits of LLMs
- Some insist that LLMs do not work like brains and that over-analogizing is harmful or hype-inducing; others say biological inspiration is still useful despite limited understanding.
- There is disagreement over whether we are “eons away” from passing meaningful Turing tests or already close with focused fine-tuning.
- Several worry about overhype leading to another AI winter, urging focus on realistic, non-AGI applications.