NIST's DeepSeek "evaluation" is a hit piece

Overall Shape of the Debate

  • Thread centers on whether the NIST/CAISI DeepSeek report is a legitimate risk assessment or a politically driven “hit piece” against a Chinese open‑weight model.
  • Many commenters admit others are reacting to the blog post, not the actual 70‑page report; several urge people to read the report first.

Views That the Report Is Propaganda / Xenophobic

  • Critics argue the report:
    • Frames an open‑weight, self‑hostable model as a national security threat while ignoring similar issues in U.S. models.
    • Compares DeepSeek primarily to closed, frontier APIs (GPT‑5, Opus) instead of comparable open‑weight models, making cost and performance findings look skewed.
    • Treats censorship of CCP‑sensitive topics and CCP‑aligned narratives as a national‑security issue in a way they see as Sinophobic and politically motivated.
  • Some see it as part of a broader U.S. pattern: fear‑mongering about Chinese tech (Huawei, TikTok) to protect domestic incumbents and manufacture consent for confrontation.

Defenses of the NIST Report and Critiques of the Blog Post

  • Others say the report is dry, heavily footnoted, and not “demonizing”; they see the blog post as misrepresenting key claims (e.g., implying NIST alleged secret exfiltration).
  • They emphasize the main findings:
    • DeepSeek lags top U.S. models on many benchmarks.
    • For similar quality, end‑to‑end task cost can be higher despite low per‑token prices.
    • DeepSeek is far more vulnerable to hijacking/jailbreaking than both U.S. frontier and a U.S. open‑weights comparator (gpt‑oss).
    • Models advance CCP‑aligned narratives and omit or refuse some sensitive topics.
  • These commenters argue it’s reasonable for a standards body to quantify such risks, even if one disagrees with the framing or priorities.

Security, Backdoors, and Abuse Scenarios

  • Multiple people note that all LLMs are susceptible to prompt injection, hijacking, and jailbreaking; weaker models will typically be more vulnerable.
  • Some discuss more subtle threat models:
    • Training‑time backdoors (e.g., behaving securely unless a hidden trigger like a year or phrase appears).
    • Using LLMs to triage submitted code for espionage targets rather than overtly generating insecure code.
    • Indirect prompt injection via data sources and obfuscated training data poisoning.
  • Others counter that open‑weight models are easier to audit in aggregate behavior, even if inspecting raw weights isn’t straightforward.

Bias, Censorship, and Ideological Alignment

  • Several comments contrast:
    • Chinese models that hard‑censor topics like Tiananmen or criticism of the CCP.
    • U.S. models that refuse various political/NSFW topics or embed liberal‑democratic assumptions, but are not legally required to praise a ruling party.
  • Some argue any state will eventually tune models for ideological or strategic purposes; the real defense is plurality of models and user awareness, not trusting one side.

Open-Weight vs Closed and Geopolitical Context

  • Many see DeepSeek and other Chinese open‑weight models as crucial for academia, startups, and non‑U.S. regions, given U.S. labs’ high prices and strict API control.
  • There’s frustration that a rare high‑quality open‑weight release is being framed primarily as a security problem instead of a public‑goods advance.
  • Others note that “open weights” ≠ full transparency: training data, filters, and potential backdoors remain hard to inspect.

Trust in Governments, Double Standards, and Whataboutism

  • Long subthreads debate:
    • Whether distrust of the CCP without equal criticism of U.S. abuses is rational or hypocritical.
    • Whether U.S. agencies routinely act beyond legal authority, making “they can’t legally do that” arguments weak.
    • Whether the DeepSeek report is genuine security work distorted by a politicized AI agency under the current administration, versus straightforward Sinophobic propaganda.
  • Some point out that both U.S. and Chinese establishments have strong incentives to weaponize LLMs and narratives; focusing exclusively on one side’s abuses is seen as naïve.