2026-04-29

He asked AI to count carbs 27000 times. It couldn't give the same answer twice

Task feasibility: carbs from photos

Many argue the problem is fundamentally under‑specified: a photo doesn’t reveal hidden ingredients, portion density, added oils, or internal contents.
Examples: identical-looking foods can differ massively in calories, and even carb content varies with bread type, fillers, and sauces.
Others counter that for carbs specifically, typical foods (e.g., a plain white-bread cheese sandwich) allow a rough, consistent human estimate using prior knowledge.

LLM behavior: randomness and limitations

Commenters note LLMs are probabilistic next-token predictors; repeated queries yield varied answers, even at low temperature.
Some highlight that even with temperature near 0 and an identical prompt, differences in hardware, backend changes, and model design can still cause nondeterminism.
Models also struggle to quantify their own confidence; numeric “confidence scores” often don’t reflect actual uncertainty.
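
A minimal sketch of the two effects commenters describe, not a model of any real LLM. The candidate answers and their scores are invented for illustration, and `sample_index` is a hypothetical helper. The first part shows that sampling from a temperature-scaled distribution still produces more than one answer at low temperature; the second shows that floating-point addition depends on evaluation order, which is one mechanism often cited for run-to-run differences when hardware or backends reorder arithmetic.

```python
import math
import random

def sample_index(logits, temperature, rng):
    """Draw one index from softmax(logits / temperature)."""
    scaled = [x / temperature for x in logits]
    peak = max(scaled)  # subtract the max for numerical stability
    weights = [math.exp(s - peak) for s in scaled]
    return rng.choices(range(len(logits)), weights=weights, k=1)[0]

# Hypothetical candidate answers and scores (assumptions, not real model output).
candidates = ["30 g", "35 g", "45 g"]
logits = [2.0, 1.8, 1.0]

rng = random.Random(0)
results = {}
for temperature in (1.0, 0.1):
    counts = {c: 0 for c in candidates}
    for _ in range(1000):
        counts[candidates[sample_index(logits, temperature, rng)]] += 1
    results[temperature] = counts
    print(temperature, counts)

# Floating-point addition is not associative: grouping changes the result.
left_to_right = (0.1 + 0.2) + 0.3
right_to_left = 0.1 + (0.2 + 0.3)
print(left_to_right == right_to_left)  # False
```

Lowering the temperature concentrates probability on the top-scoring answer but does not remove the others unless their scores are far apart.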

Medical and ethical concerns

Strong agreement that using generic LLMs as autonomous insulin-dosing calculators is dangerous.
AI carb-counting features in diabetes tools and commercial apps are seen as potentially harmful or fraudulent, especially when marketed as accurate.
Several insist the correct response from an AI here should be “I can’t tell” or a clearly caveated rough range, not a precise-seeming guess.
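
One way to picture the response commenters ask for is a wrapper that looks at the spread of repeated estimates before answering. This is a sketch under stated assumptions: `summarize_estimates`, the 25% spread threshold, and the sample numbers are all hypothetical, and nothing here comes from the study or from any real product.

```python
import statistics

def summarize_estimates(estimates_g, max_relative_spread=0.25):
    """Return a caveated range, or decline when repeated estimates disagree.

    max_relative_spread is an arbitrary illustrative threshold:
    (max - min) / median must not exceed it.
    """
    low, high = min(estimates_g), max(estimates_g)
    mid = statistics.median(estimates_g)
    if mid <= 0 or (high - low) / mid > max_relative_spread:
        return "I can't tell from this photo."
    return f"Roughly {low:.0f} to {high:.0f} g of carbs (rough estimate only)."

# Hypothetical repeated estimates for the same photo.
print(summarize_estimates([30, 32, 31]))  # tight spread: reports a range
print(summarize_estimates([20, 45, 60]))  # wide spread: declines
```

Agreement between repeated runs does not make an estimate correct; the check only prevents a single precise-looking number from hiding visible disagreement.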

Critiques of the study and article

Some see the result as a “water is wet” finding, obvious to anyone technical, and view the article as clickbaity or shallow.
Others defend it as an important quantitative demonstration for non-technical diabetics and policymakers, especially since prompts were taken from real insulin-related software.
A few say the more interesting baseline would be human estimates or existing commercial apps, not just raw frontier models.

Better approaches and practical use cases

Suggested improvements: include text descriptions, approximate weights, labels, barcodes, or Bluetooth scales; use specialized vision models plus nutrition databases.
Some report success using LLMs to log food when they provide exact ingredients and weights, with AI mainly doing aggregation and lookups.
Consensus: image-only carb estimation should be treated as rough guidance at best, not as a medical-grade input.
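
The logging workflow some commenters describe, where the user supplies exact ingredients and weights and the software only looks up and sums, can be sketched as below. The table `CARBS_PER_100G` holds placeholder values chosen for illustration, not figures from a real nutrition database, and `total_carbs` is a hypothetical helper.

```python
# Placeholder carbs per 100 g (illustrative assumptions, not real database values).
CARBS_PER_100G = {
    "white bread": 49.0,
    "cheddar": 1.3,
    "butter": 0.1,
}

def total_carbs(items):
    """Sum carbs in grams for (ingredient, grams) pairs.

    Unknown ingredients raise an error instead of being guessed.
    """
    total = 0.0
    for name, grams in items:
        if name not in CARBS_PER_100G:
            raise KeyError(f"no entry for {name!r}; refusing to guess")
        total += CARBS_PER_100G[name] * grams / 100.0
    return total

# Hypothetical weighed meal: a plain cheese sandwich.
meal = [("white bread", 60), ("cheddar", 30), ("butter", 10)]
print(round(total_carbs(meal), 1))  # 29.8
```

The arithmetic is deterministic, so the same weighed inputs always give the same total; the uncertainty moves to the accuracy of the weights and the lookup table.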

Public understanding and AI marketing

Many blame aggressive “AI can do anything” marketing and sci‑fi imagery for users treating LLMs as oracles.
Calls for better AI literacy in schools and clearer vendor messaging about limitations and appropriate use cases.