2025-01-29

Exposed DeepSeek database leaking sensitive information, including chat history

Breach severity & logging practices

Commenters are struck that a “production-grade” database for a #1 app-store service was exposed to the internet with no authentication, allowing full SQL control and access to plaintext logs containing chat history, keys, and backend details.
Some see plaintext logs as sadly normal (except for passwords); others argue that at this scale, lack of encryption and access control is inexcusable and undermines trust.
People note this wasn’t just chat content: observability data (OpenTelemetry spans) with prompts, completions, and metadata were exposed.

DeepSeek’s maturity, “side project” narrative & funding

One camp says the incident reinforces the “side project of quants” story: impressive models, but weak experience running public, secure services.
Another camp pushes back hard, arguing that DeepSeek reportedly has ~130 ML staff, very large GPU fleets, and costs far beyond the advertised $5.5M training run, which makes it a serious, well‑funded lab rather than a group of evening hobbyists.
Several draw a distinction: DeepSeek may be a “pet project” of the founder of its parent hedge fund, but the ML team is full‑time; the weakness is security/infra, not ML.

Security culture, infra mistakes & ClickHouse specifics

Many emphasize that even experienced companies (auto makers, big tech) have made similar mistakes with open databases and plaintext logs, so the incident doesn’t require a “side project” explanation.
Others insist any team exposing a raw DB to the public internet without auth shows a basic ops failure.
ClickHouse contributors explain the defaults: local‑only access, IP filtering, and a non‑SQL “default” user, so DeepSeek would have had to override several safeguards. Commenters suspect a misconfiguration introduced through Docker/Kubernetes or copied configs.
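As a rough illustration of the kind of pre‑deployment check that could flag this class of misconfiguration, the sketch below classifies a database listen address as loopback‑only or bound to all interfaces. The names `classify_listen_address` and `is_risky` are hypothetical, the check uses only Python's standard `ipaddress` module, and it does not read or model any real ClickHouse configuration.

```python
import ipaddress


def classify_listen_address(addr: str) -> str:
    """Classify an IP literal as 'loopback', 'all-interfaces', or 'specific'.

    Hypothetical helper for illustration; not part of ClickHouse.
    """
    ip = ipaddress.ip_address(addr)
    if ip.is_loopback:       # 127.0.0.0/8 or ::1
        return "loopback"
    if ip.is_unspecified:    # 0.0.0.0 or ::, i.e. bind on every interface
        return "all-interfaces"
    return "specific"


def is_risky(addr: str, auth_enabled: bool) -> bool:
    """Flag a listener that is reachable beyond localhost and has no auth."""
    return classify_listen_address(addr) != "loopback" and not auth_enabled
```

A deployment script could call `is_risky(listen_addr, auth_enabled)` and refuse to start when it returns `True`. This only inspects the bind address; whether the host is actually reachable from the internet still depends on firewalls and container port mappings.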

Responsible disclosure & legality

Initial comments accuse Wiz of irresponsible disclosure for publishing host/port details; later replies point to the article itself, which says the issue was disclosed privately and fixed before publication.
Some raise CFAA‑style legal concerns about probing systems without explicit permission; others cite updated DOJ policy that protects good‑faith security research.

Geopolitics, propaganda, and market impact

Some see the write‑up as part of a broader campaign by Western incumbents to tarnish a disruptive Chinese competitor; others say any rapidly popular app would draw intense scrutiny.
There’s discussion of NVIDIA’s stock drop and whether DeepSeek’s efficiency meaningfully changes GPU demand; opinions diverge between “overreaction/FUD” and “evidence of AI‑hardware bubble stress.”
Several warn about CCP data access and censorship, while others argue US tech firms and governments already collect comparable data, so moral high ground is limited.

User trust, privacy, and self‑hosting

Multiple commenters treat this as a strong argument for local models, self‑hosting, or at least never sending sensitive data (secrets, configs, personal info) to public LLM APIs.
Commenters caution against reusing passwords on such services and recommend password managers and unique credentials.
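The advice about never sending secrets to public LLM APIs can be approximated with a client‑side scrub before a prompt leaves the machine. The following is a minimal sketch under stated assumptions: the `redact` function and the two regular expressions are illustrative only, the `sk-` token shape is an assumed format, and real secrets take many more forms than these patterns catch.

```python
import re

# Illustrative patterns only (assumptions, not an exhaustive secret scanner).
SECRET_PATTERNS = [
    # API-key-like tokens; the "sk-" prefix and length are an assumed format.
    re.compile(r"sk-[A-Za-z0-9]{20,}"),
    # key=value or key: value pairs whose key suggests a credential.
    re.compile(r"(?i)(password|passwd|secret|token)\s*[=:]\s*\S+"),
]


def redact(text: str, placeholder: str = "[REDACTED]") -> str:
    """Replace anything matching a known secret pattern with a placeholder."""
    for pattern in SECRET_PATTERNS:
        text = pattern.sub(placeholder, text)
    return text
```

For example, `redact("password=hunter2 and more")` returns `"[REDACTED] and more"`. Pattern‑based scrubbing reduces accidental leaks but is not a guarantee, which is why commenters favor local models for anything sensitive.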
