2024-06-09

A ChatGPT mistake cost us $10k

Role of ChatGPT vs. Engineering Process

Many argue the real problem was not ChatGPT but poor engineering practice: no proper review of generated code, weak testing, and missing monitoring/alerts.
Others see ChatGPT as a “single point of failure”: it produced ORM code no one on the team really understood, creating a codebase the team didn’t “own” mentally.
Some object to the title as misleading or clickbait; they reframe it as “we blindly trusted ChatGPT and lacked safeguards.”

The Actual Bug and Python/SQLAlchemy Footgun

Core issue: default=str(uuid.uuid4()) in a SQLAlchemy Column is evaluated once, when the class is defined, so every insert made by a given process reused the same UUID and each insert after the first triggered a duplicate key violation.
Several note this is analogous to Python’s “mutable default argument” trap and is a common mistake even among humans.
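For readers unfamiliar with the trap being compared here, a minimal sketch follows. The function names are made up for illustration and do not come from the thread.

```python
def append_item(item, bucket=[]):
    # The default list is created once, when the function is defined,
    # and is then shared by every call that omits `bucket`.
    bucket.append(item)
    return bucket


def append_item_safe(item, bucket=None):
    # Using None as a sentinel creates a fresh list on each call.
    if bucket is None:
        bucket = []
    bucket.append(item)
    return bucket


shared_first = append_item("a")
shared_second = append_item("b")
print(shared_second)                    # ['a', 'b']
print(shared_first is shared_second)    # True

safe_first = append_item_safe("a")
safe_second = append_item_safe("b")
print(safe_first, safe_second)          # ['a'] ['b']
```

The parallel with the UUID bug is that in both cases an expression the author expected to run "per use" actually runs once, at definition time.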
Others point out SQLAlchemy’s API (same parameter for static value or callable) makes this error easy; suggestions include separate default vs default_factory or lints that reject static defaults on unique/PK columns.
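The difference between a static default and a callable default can be sketched without a database. The Column class below is a hypothetical stand-in written for this illustration, not SQLAlchemy's real class; it only mimics the behaviour described above, where one parameter accepts either a plain value or a callable. In SQLAlchemy itself, the commonly suggested fix is to pass a callable, for example default=lambda: str(uuid.uuid4()).

```python
import uuid


class Column:
    """Hypothetical stand-in for an ORM column; NOT SQLAlchemy's real class."""

    def __init__(self, default=None):
        # `default` may be a plain value or a zero-argument callable.
        self.default = default

    def next_default(self):
        # A callable is invoked for every new row; a plain value is reused.
        return self.default() if callable(self.default) else self.default


class BuggyModel:
    # str(uuid.uuid4()) runs once, when the class body executes,
    # so every row created by this process gets the same id.
    id = Column(default=str(uuid.uuid4()))


class FixedModel:
    # Passing a callable defers the call until each row is created.
    id = Column(default=lambda: str(uuid.uuid4()))


buggy_ids = [BuggyModel.id.next_default() for _ in range(3)]
fixed_ids = [FixedModel.id.next_default() for _ in range(3)]

print(len(set(buggy_ids)))  # 1
print(len(set(fixed_ids)))  # 3
```

Because the two spellings differ by only a few characters and both are accepted silently, a reviewer who does not know the convention has little chance of spotting the problem by eye.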

Testing, Logging, and Monitoring Failures

Repeated criticisms:
- No tests created multiple rows in the same table in one run.
- Logs and alerts for DB constraint errors were absent or unused; this should have been a 5‑minute diagnosis from “duplicate key” errors.
- Deploying directly to production, at night, with 10–20 commits/day and no observability is called reckless.
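The first two criticisms can be illustrated with a small sketch: a check that inserts several rows in one run and logs constraint errors. It uses Python's built-in sqlite3 module rather than the team's actual database, and the table name subscriptions and the helper names are assumptions made for this example.

```python
import logging
import sqlite3
import uuid

logger = logging.getLogger("db")


def new_db():
    # In-memory database with a primary key, standing in for the real table.
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE subscriptions (id TEXT PRIMARY KEY)")
    return conn


def insert_rows(conn, make_id, count):
    """Insert `count` rows; return how many hit a constraint violation."""
    failures = 0
    for _ in range(count):
        try:
            with conn:  # commits on success, rolls back on error
                conn.execute(
                    "INSERT INTO subscriptions (id) VALUES (?)", (make_id(),)
                )
        except sqlite3.IntegrityError as exc:
            failures += 1
            # Surfacing this message is what makes the diagnosis quick.
            logger.error("constraint violation: %s", exc)
    return failures


FROZEN_ID = str(uuid.uuid4())  # mimics a default evaluated only once

buggy_failures = insert_rows(new_db(), lambda: FROZEN_ID, 3)
fixed_failures = insert_rows(new_db(), lambda: str(uuid.uuid4()), 3)

print(buggy_failures, fixed_failures)  # 2 0
```

A test that creates only one row per run would pass in both cases, which is why commenters stressed inserting multiple rows in the same process.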

Architecture and Stack Choices

Many question rewriting a working NextJS/TS backend to Python/FastAPI before having real traction, especially in a stack the team was weak in.
Overprovisioning (“8 ECS tasks × 5 instances” for tiny traffic) is seen as symptomatic of credit-fueled, wasteful startup culture.

LLMs in Production Code

Some see LLMs as useful, but only if their output is handled like code from a junior engineer or intern and is thoroughly understood.
Others are more pessimistic: LLMs are inherently non-deterministic “word generators,” so relying on them for business logic is seen as irresponsible.
A minority notes that the same bug could have been written by humans; the key is process (tests, review, observability), not the tool.

Meta: Postmortem and Reputation

Mixed reactions to publishing the story: some praise the honesty and see it as a useful cautionary tale; others say it harms the company’s credibility without offering deep or actionable takeaways.
