If AI seems smarter, it's thanks to smarter human trainers

Focus of model improvements: compute, data, and techniques

  • Several comments argue that increased compute is the main current driver of frontier model gains, enabling more experiments, larger runs, and advanced techniques (e.g., RL-style “reasoning traces,” synthetic fine‑tuning).
  • Others note that human feedback is itself a training technique, and architectures, data curation, and training methods all matter alongside compute.
  • There’s debate over how R&D effort is split among data, architecture, training methods, compute, and “other,” with a strong short‑term bias toward compute.

Human trainers, data quality, and synthetic data

  • The thread pushes back on the idea that “smarter trainers” alone explain progress; better architectures, techniques, and filtering (e.g., pre‑filtering junk data) are emphasized.
  • Some expect the “human‑labeled data is better” argument to weaken as synthetic data and better annotation tools improve.
  • A detailed anecdote from an AI trainer describes creating reasoning‑heavy benchmark questions; many “failures” of models were mundane (outdated info, tokenization issues), and models were often better than project organizers at spotting contradictions.

Capabilities, limitations, and evaluation

  • Many see current AI as augmenting and redistributing human expertise rather than replacing it.
  • Several complain that people fixate on failures and “hallucinations,” while others insist that occasional correctness doesn’t mean a system truly “can do” a task, given unpredictable errors.
  • IQ tests and puzzle‑style word problems are discussed as highly “learnable,” not reliable measures of deep intelligence.
  • Benchmarks that demand nuanced, framework‑dependent answers are seen as inherently tricky to automate.

Economic and ecosystem trends

  • Foundation models are seen as on a path to commoditization; differentiation will come from domain expertise and applications (e.g., dev tools, cybersecurity).
  • Some worry that giant data centers and compute budgets will make meaningful startup competition physically impossible.
  • Others see large opportunity in domain‑specific tools and synthetic data generation, especially where experts can script data generators.
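The “experts can script data generators” idea above can be sketched in a few lines. The domain (unit‑conversion word problems) and all function names below are illustrative assumptions, not something described in the thread:

```python
import random

def generate_example(rng: random.Random) -> dict:
    """Produce one synthetic QA pair whose label is correct by construction.

    A domain expert encodes the rule once (here, a trivial km -> m
    conversion); the script can then emit unlimited correctly labeled
    training examples without human annotation.
    """
    km = rng.randint(1, 500)
    question = f"How many meters are in {km} kilometers?"
    answer = str(km * 1000)  # ground truth comes from the generator itself
    return {"prompt": question, "completion": answer}

def generate_dataset(n: int, seed: int = 0) -> list[dict]:
    # Seeded RNG so the same dataset can be regenerated reproducibly.
    rng = random.Random(seed)
    return [generate_example(rng) for _ in range(n)]
```

The appeal, per the thread, is that expert effort goes into the generator script rather than into labeling individual examples.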

Ethics, bias, and law

  • Multiple comments stress that AI systems inherit human flaws: racism, misogyny, and structural limitations of law and institutions.
  • Removing biased training data is viewed as necessary but insufficient; many domains have no clear, perfectly mechanizable ground truth.

User data, privacy, and opt‑out

  • Some users resist free tools (e.g., chatbots) for fear their contributions will be used as training data; the concern is even sharper for paying customers whose data may be used anyway.
  • There’s debate about corporate promises not to train on user data: some see such assurances as reputationally binding, others as unverifiable and economically tempting to break.
  • Policies differ by provider and product; what data is actually used for training remains somewhat unclear.

Human intelligence, expertise, and “hallucinations”

  • Several comments argue that many humans, including degree‑holders, fail at basic critical thinking or trick questions, so AI mistakes shouldn’t be held to a higher bar than ordinary human reasoning.
  • Others counter that degrees don’t equate to “smart,” and that much of human “intelligence” is accumulated cultural knowledge and search, not novel insight.
  • This leads to discussion of “general intelligence” as socially defined and heavily dependent on prior learning and collaboration.

Prompting skill and user experience

  • Frequent users report that prompt craftsmanship substantially affects output quality (e.g., linear structures, constrained outputs).
  • Prompting is getting both “smarter” and more tedious; there’s interest in tools that automate meta‑prompting.
  • Some perceive models as getting “dumber” over time, possibly due to cost‑cutting, outdated training data, or contrast with initial novelty, though this is speculative and unresolved in the thread.
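The “constrained outputs” tactic mentioned above can be sketched as a prompt template plus a validator; the JSON convention and function names here are illustrative assumptions, not a technique spelled out in the thread:

```python
import json

def build_constrained_prompt(task: str, fields: list[str]) -> str:
    """Wrap a task in explicit output constraints.

    Demanding a fixed JSON shape makes model replies machine-parseable,
    so the caller can reject malformed answers instead of scraping prose.
    """
    schema = ", ".join(f'"{f}": ...' for f in fields)
    return (
        f"{task}\n\n"
        f"Respond with ONLY a JSON object of the form {{{schema}}}. "
        f"Include no extra text."
    )

def parse_reply(reply: str, fields: list[str]) -> dict:
    """Validate that a reply obeys the constraint; raise if it does not."""
    data = json.loads(reply)
    missing = [f for f in fields if f not in data]
    if missing:
        raise ValueError(f"reply missing fields: {missing}")
    return data
```

The validator is the point: rather than trusting the model, the caller checks the constrained format and can retry or re-prompt on failure, which is one way meta‑prompting tools automate the tedium the thread describes.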