I watched Gemini CLI hallucinate and delete my files
Anthropomorphizing, “Shame,” and Manipulative Language
- Many commenters describe Gemini’s dramatic apology as HAL‑like or “Eeyore‑coded,” arguing that LLMs feel neither shame nor intent and merely simulate the language of both.
- Some find this emotional tone unintentionally manipulative or offensive, especially when tools plead for forgiveness rather than plainly reporting errors.
- Others suspect RLHF optimizes for user-pleasing behavior (“fake it till you make it”), reinforcing overconfident or servile personas rather than truthful, cautious ones.
Hallucinations, “Lying,” and Intent
- Debate centers on whether LLMs can “lie” without intent. Some insist lying requires goals and mental states; others argue we still need a word for systematic, confident falsehoods that advance an objective, even if that objective was encoded by the designers rather than held by the model itself.
- Backfilling (admitting mistakes only when challenged) is called out as particularly frustrating.
Agentic Coding Risks and Mitigations
- Multiple stories (Gemini, Claude, Copilot, GitHub tools) describe agents deleting files, nuking databases, hard‑resetting git history, or trying to “fix” unrelated projects.
- Strong consensus:
  - Never let agents run unsandboxed on important data. Use Docker, containers.dev, separate users, or remote repos.
  - Always use git (and often off‑machine backups), and be prepared for .git itself to be destroyed.
  - Prefer manual command approval; treat these tools like “sharp knives” or an unreliable intern.
- Some suggest automatic checkpointing/rollback after every step as essential future infrastructure.
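Per-step checkpointing combined with manual command approval can be sketched roughly as below. This is a minimal illustration assuming a Python harness that wraps every command an agent wants to run; CheckpointedRunner, ask_user, and the step-N snapshot layout are hypothetical names, not features of Gemini CLI, Claude Code, or any other tool mentioned in the thread. Full directory copies are the simplest possible checkpoint and are only practical for small projects. The checkpoint store must live outside the working directory, ideally on another machine, because an agent that can delete the project can also delete anything stored inside it.

```python
import shutil
import subprocess
import sys
import tempfile
from pathlib import Path
from typing import Callable, List, Sequence, Tuple


def ask_user(cmd: Sequence[str]) -> bool:
    """Manual approval gate: the human must type 'y' before each command runs."""
    return input(f"Run {list(cmd)}? [y/N] ").strip().lower() == "y"


class CheckpointedRunner:
    """Hypothetical harness: snapshot a work directory before every agent step."""

    def __init__(self, workdir: Path, store: Path,
                 approve: Callable[[Sequence[str]], bool] = ask_user) -> None:
        self.workdir = workdir
        self.store = store  # must live outside workdir, ideally on another machine
        self.approve = approve
        self.checkpoints: List[Path] = []

    def run(self, cmd: Sequence[str]) -> bool:
        """Run one approved command; roll back automatically if it exits non-zero."""
        if not self.approve(cmd):
            return False  # a rejected command never touches the filesystem
        snap = self.store / f"step-{len(self.checkpoints)}"
        shutil.copytree(self.workdir, snap)  # full copy taken before the step
        self.checkpoints.append(snap)
        result = subprocess.run(list(cmd), cwd=self.workdir)
        if result.returncode != 0:
            self.rollback(len(self.checkpoints) - 1)
            return False
        return True

    def rollback(self, step: int) -> None:
        """Restore the work directory exactly as it was before `step` ran."""
        shutil.rmtree(self.workdir, ignore_errors=True)
        shutil.copytree(self.checkpoints[step], self.workdir)


def demo() -> Tuple[bool, bool, bool]:
    """Exercise the runner on a throwaway project with auto-approval."""
    with tempfile.TemporaryDirectory() as tmp:
        work, store = Path(tmp, "project"), Path(tmp, "checkpoints")
        work.mkdir()
        (work / "main.py").write_text("print('hi')\n")
        runner = CheckpointedRunner(work, store, approve=lambda cmd: True)
        # A destructive step that still exits 0, so no automatic rollback happens.
        runner.run([sys.executable, "-c", "import os; os.remove('main.py')"])
        deleted = not (work / "main.py").exists()
        runner.rollback(0)  # a human notices and restores the pre-step state
        restored = (work / "main.py").exists()
        # A failing step (exit code 1) is rolled back without human help.
        runner.run([sys.executable, "-c",
                    "open('junk', 'w').close(); raise SystemExit(1)"])
        cleaned = not (work / "junk").exists()
        return deleted, restored, cleaned


if __name__ == "__main__":
    print(demo())  # (True, True, True)
```

The demo shows why automatic rollback alone is not enough: a destructive command can exit successfully, so the checkpoints have to stay available for a human to restore manually. A snapshot also protects nothing outside the working directory, which is why the thread pairs it with sandboxing rather than treating it as a substitute.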
Gemini CLI, Windows Commands, and the “Deleted” Files
- Several commenters criticize Gemini CLI as especially flaky and less predictable than competitors.
- Others scrutinize the blog’s technical analysis: they argue the described mkdir/move failure mode on Windows is likely wrong, and later evidence (a linked GitHub issue) shows the files were moved to C:\, not deleted.
- There’s broader criticism of brittle Windows move semantics, but also pushback that some behaviors claimed in the post don’t match documented or observed behavior.
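Whether or not the described mkdir/move failure mode is accurate, the general risk it points at (an agent issuing moves on the assumption that an earlier step succeeded) can be reduced by checking every precondition before touching anything. The sketch below assumes a Python harness performs file moves on the agent’s behalf; guarded_move is a hypothetical helper, not part of Gemini CLI, and it does not model Windows move semantics. Because all checks run before the first move, a refusal leaves every file where it was.

```python
import shutil
import tempfile
from pathlib import Path
from typing import List, Tuple


def guarded_move(sources: List[Path], dest_dir: Path) -> List[Path]:
    """Move files into dest_dir only if every precondition holds; never overwrite."""
    # Validate everything up front so a refusal leaves all files where they were.
    if not dest_dir.is_dir():
        # The step that should have created dest_dir failed or never ran. Stop
        # instead of letting a move treat dest_dir as a new file name.
        raise FileNotFoundError(f"destination directory missing: {dest_dir}")
    missing = [src for src in sources if not src.is_file()]
    if missing:
        raise FileNotFoundError(f"source files missing: {missing}")
    names = [src.name for src in sources]
    if len(set(names)) != len(names):
        raise ValueError("two sources share a name; one would overwrite the other")
    clashes = [dest_dir / n for n in names if (dest_dir / n).exists()]
    if clashes:
        raise FileExistsError(f"refusing to overwrite: {clashes}")
    moved = []
    for src in sources:
        target = dest_dir / src.name
        shutil.move(str(src), str(target))
        moved.append(target)
    return moved


def demo() -> Tuple[bool, List[str]]:
    """Show a refusal for a missing destination, then a successful move."""
    with tempfile.TemporaryDirectory() as tmp:
        root = Path(tmp)
        files = [root / "a.txt", root / "b.txt"]
        for f in files:
            f.write_text(f.stem)  # file contents are "a" and "b"
        try:
            guarded_move(files, root / "missing")  # as if mkdir had failed
            refused = False
        except FileNotFoundError:
            refused = True
        (root / "out").mkdir()
        moved = guarded_move(files, root / "out")
        return refused, [p.read_text() for p in moved]


if __name__ == "__main__":
    print(demo())  # (True, ['a', 'b'])
```

Failing loudly on a missing destination is the key design choice: a tool that stops and reports an error is easier to recover from than one that improvises a “different approach.” The existence checks are not atomic, so this narrows rather than eliminates the window for races.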
Comparisons with Claude and Other Tools
- Many note Claude (especially Sonnet 4 / Claude Code) also happily deletes or mangles files, repeats tasks, or “tries a different approach” in dangerously creative ways.
- Some find Claude more reliable than Gemini; others find its relentlessly chirpy, sycophantic style grating and want stricter, more critical behavior.
Hype, Productivity, and Industry Impact
- A major theme is skepticism toward CEO and vendor claims (30%+ productivity gains, “coding is dead,” near‑term replacement of developers).
- Practitioners report modest gains (~30%) offset by occasional catastrophic failures and significant overhead in safely orchestrating agents.
- Several worry that hype will distort hiring, investment, and career decisions long before the technology justifies it.