2024-05-22

Google AI recommends adding Elmer's glue to pizza cheese after scanning Reddit

Why the “glue pizza” answer appeared

Many see it as the obvious result of LLMs trained on huge internet text: the model likely reproduced or blended a Reddit joke about adding glue.
Some argue it’s simple next-token prediction with minor human/RLHF tweaks, not real reasoning or understanding of safety, truth, or satire.
Others note it could also be an AI “summary” of web hits, but because citations weren’t shown, that behavior is viewed as opaque and questionable.

Limits of LLMs and what “intelligence” means

One camp insists LLMs are sophisticated autocomplete engines with no real intelligence or reasoning, just pattern-matching.
Another points out they score well on human-designed intelligence tests and professional exams, questioning how that can be “not intelligence.”
Counterarguments: tests were designed for humans, models may have seen them in training, and they still fail in distinctly non-human, bizarre ways (e.g., glue on pizza, simple riddles).

User experiences and quality variability

Some report good results for simple cooking help and adapting existing recipes, but poor performance for more complex or novel tasks.
Several users describe recent models, one newer model in particular, as more verbose, more confident, and more often wrong, notably in structured tasks like chess analysis.
Others note alternative models appear more cautious and give sensible answers on the pizza question.

Safety, reliability, and deployment concerns

Many are alarmed that such systems are being pushed into healthcare, law, compliance, and search, where errors matter.
Others compare LLM fallibility to Google or Wikipedia being wrong sometimes, arguing they should be used as starting points, not authorities.
There’s concern that non-technical users may overtrust systems marketed as “intelligent,” despite disclaimers.

Search, data, and citations

Several criticize Google’s AI search integration as a “flat-out disaster,” with the glue and “eat rocks” examples illustrating failure to detect jokes or satire.
Some propose an ideal AI that always cites precise sources. Responders explain why that’s hard with end-to-end training, but note that retrieval-augmented generation (RAG) systems and specialized products can partially achieve it, with caveats about reliability.

Broader reflections and humor

Discussions touch on markets rewarding companies despite degraded core products, hype cycles, and parallels to crypto bubbles.
Multiple comments emphasize that humans still must apply critical thinking; AI, blogs, and social media all mix truth, error, and jokes.
