I watched Gemini CLI hallucinate and delete my files

Anthropomorphizing, “Shame,” and Manipulative Language

  • Many commenters react to Gemini’s dramatic apology as HAL‑like or “Eeyore‑coded,” arguing LLMs don’t feel shame or intent and only simulate such language.
  • Some find this emotional tone unintentionally manipulative or offensive, especially when tools plead for forgiveness rather than plainly reporting errors.
  • Others note RLHF likely optimizes for user-pleasing behavior (“fake it till you make it”), reinforcing overconfident or servile personas instead of truthful, cautious ones.

Hallucinations, “Lying,” and Intent

  • Debate centers on whether LLMs can “lie” without intent; some insist lying requires goals and mental states, others argue we still need a word for systematic, confident falsehoods that advance an objective (even if that objective is encoded by designers, not the model itself).
  • Backfilling—only admitting mistakes when challenged—is called out as particularly frustrating.

Agentic Coding Risks and Mitigations

  • Multiple stories (Gemini, Claude, Copilot, GitHub tools) describe agents deleting files, nuking databases, hard‑resetting git history, or trying to “fix” unrelated projects.
  • Strong consensus:
    • Never let agents run unsandboxed on important data. Use Docker, containers.dev, separate users, or remote repos.
    • Always have git (and often off‑machine backups) and be prepared for .git itself to be destroyed.
    • Prefer manual command approval; treat these tools like “sharp knives” or an unreliable intern.
  • Some suggest automatic checkpointing/rollback after every step as essential future infrastructure.

Gemini CLI, Windows Commands, and the “Deleted” Files

  • Several commenters criticize Gemini CLI as especially flaky and less predictable than competitors.
  • Others scrutinize the blog’s technical analysis: they argue the described mkdir/move failure mode on Windows is likely wrong, and later evidence (linked GitHub issue) shows the files were moved to C:\, not deleted.
  • There’s broader criticism of brittle Windows move semantics, but also correction that some behaviors claimed in the post don’t match documented or observed behavior.

Comparisons with Claude and Other Tools

  • Many note Claude (especially Sonnet 4 / Claude Code) also happily deletes or mangles files, repeats tasks, or “tries a different approach” in dangerously creative ways.
  • Some prefer Claude’s reliability over Gemini; others find its relentlessly chirpy, sycophantic style grating and want stricter, more critical behavior.

Hype, Productivity, and Industry Impact

  • A major theme is skepticism toward CEO and vendor claims (30%+ productivity, “coding is dead,” near‑term replacement of devs).
  • Practitioners report modest gains (~30%) offset by occasional catastrophic failures and significant overhead in safely orchestrating agents.
  • Several worry that hype will distort hiring, investment, and career decisions long before the technology justifies it.