He asked AI to count carbs 27,000 times. It couldn't give the same answer twice

Task feasibility: carbs from photos

  • Many argue the problem is fundamentally under‑specified: photons don’t reveal hidden ingredients, portion density, added oils, or internal contents.
  • Examples: identical-looking foods can differ massively in calories, and even carb content varies with bread type, fillers, and sauces.
  • Others counter that for carbs specifically, typical foods (e.g., a plain white-bread cheese sandwich) allow a rough, consistent human estimate using prior knowledge.

LLM behavior: randomness and limitations

  • Commenters note LLMs are probabilistic next-token predictors; repeated queries yield varied answers, even at low temperature.
  • Some highlight that even with temperature near 0 and an identical prompt, hardware differences, backend changes, and model design can still cause nondeterminism.
  • Models also struggle to quantify their own confidence; numeric “confidence scores” often don’t reflect actual uncertainty.
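
The point about low temperature can be illustrated with a toy sampler. This is a minimal sketch, not how any particular model is implemented: the candidate answers, the logits, and the helper names (softmax_with_temperature, sample_answers) are all hypothetical, and it does not model hardware or backend effects.

```python
import math
import random

def softmax_with_temperature(logits, temperature):
    """Turn raw scores into probabilities; lower temperature sharpens them."""
    scaled = [x / temperature for x in logits]
    peak = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(s - peak) for s in scaled]
    total = sum(exps)
    return [e / total for e in exps]

# Hypothetical scores for three candidate carb answers (in grams).
candidates = ["30", "35", "40"]
logits = [2.0, 1.8, 1.5]

def sample_answers(temperature, n, seed):
    """Draw n answers from the temperature-scaled distribution."""
    rng = random.Random(seed)
    probs = softmax_with_temperature(logits, temperature)
    return [rng.choices(candidates, weights=probs, k=1)[0] for _ in range(n)]

probs_default = softmax_with_temperature(logits, 1.0)
probs_low = softmax_with_temperature(logits, 0.2)

if __name__ == "__main__":
    print("T=1.0:", [round(p, 3) for p in probs_default])
    print("T=0.2:", [round(p, 3) for p in probs_low])
    print("distinct answers at T=0.2:", sorted(set(sample_answers(0.2, 200, seed=0))))
```

With these made-up logits, lowering the temperature to 0.2 pushes the top candidate to roughly 69% of the probability mass, so close runners-up are still sampled often. Only when candidates are far apart in score does low temperature make answers look stable.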

Medical and ethical concerns

  • Strong agreement that using generic LLMs as autonomous insulin-dosing calculators is dangerous.
  • AI carb-counting features in diabetes tools and commercial apps are seen as potentially harmful or fraudulent, especially when marketed as accurate.
  • Several insist the correct response from an AI here should be “I can’t tell” or a clearly caveated rough range, not a precise-seeming guess.
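
One way to picture the "range or refusal" behavior commenters ask for is to compare several repeated estimates before reporting anything. The sketch below is purely illustrative: the function name summarize_estimates and the 25% spread cutoff are assumptions made for this example, not values from the study or from any diabetes tool.

```python
import statistics

def summarize_estimates(estimates_g, max_relative_spread=0.25):
    """Return a caveated range, or decline when repeated estimates disagree.

    max_relative_spread is a hypothetical cutoff chosen for illustration only.
    """
    if len(estimates_g) < 2:
        return "I can't tell: not enough estimates to judge consistency."
    low, high = min(estimates_g), max(estimates_g)
    median = statistics.median(estimates_g)
    if median <= 0:
        return "I can't tell: the estimates are not usable."
    # Spread of the repeated answers relative to their median.
    spread = (high - low) / median
    if spread > max_relative_spread:
        return f"I can't tell: repeated estimates ranged from {low:g} to {high:g} g."
    return f"Rough estimate: {low:g} to {high:g} g of carbs (unverified, from a photo)."

if __name__ == "__main__":
    print(summarize_estimates([30, 32, 31]))
    print(summarize_estimates([20, 45, 60]))
```

Agreement between repeated answers only shows consistency, not correctness, so even the "rough estimate" branch keeps its caveat.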

Critiques of the study and article

  • Some see the result as “water is wet”: obvious to anyone technical; they view the article as clickbaity or shallow.
  • Others defend it as an important quantitative demonstration for non-technical diabetics and policymakers, especially since prompts were taken from real insulin-related software.
  • A few say the more interesting baseline would be human estimates or existing commercial apps, not just raw frontier models.

Better approaches and practical use cases

  • Suggested improvements: include text descriptions, approximate weights, labels, barcodes, or Bluetooth scales; use specialized vision models plus nutrition databases.
  • Some report success using LLMs to log food when the user supplies exact ingredients and weights, leaving the AI mainly to do aggregation and lookups.
  • Consensus: image-only carb estimation should be treated as rough guidance at best, not as a medical-grade input.
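
The "exact ingredients and weights" workflow reduces to a table lookup plus arithmetic, which is easy to make deterministic. The following is a minimal sketch under stated assumptions: the CARBS_PER_100G table holds placeholder values for illustration only, and the function name total_carbs is hypothetical. A real tool would read values from a vetted nutrition database or the product label.

```python
# Carbohydrate content in grams per 100 g of food.
# Placeholder values for illustration only, not taken from a real database.
CARBS_PER_100G = {
    "white bread": 49.0,
    "cheddar cheese": 1.3,
    "butter": 0.1,
}

def total_carbs(items):
    """Sum carbs for (ingredient, grams) pairs; fail loudly on unknown foods."""
    total = 0.0
    for name, grams in items:
        if name not in CARBS_PER_100G:
            # Refuse to guess rather than silently inventing a value.
            raise KeyError(f"no carb data for {name!r}")
        if grams < 0:
            raise ValueError(f"negative weight for {name!r}")
        total += CARBS_PER_100G[name] * grams / 100.0
    return total

# Weights as the user would enter them, e.g. from a kitchen scale.
sandwich = [("white bread", 60), ("cheddar cheese", 30), ("butter", 5)]

if __name__ == "__main__":
    print(f"{total_carbs(sandwich):.1f} g carbs")
```

Given the same inputs this always returns the same total, and an unknown ingredient raises an error instead of producing a plausible-looking number.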

Public understanding and AI marketing

  • Many blame aggressive “AI can do anything” marketing and sci‑fi imagery for users treating LLMs as oracles.
  • Calls for better AI literacy in schools and clearer vendor messaging about limitations and appropriate use cases.