Anthropic announces proof of distillation at scale by MiniMax, DeepSeek, and Moonshot
Scale and Feasibility of Distillation
- Key data point: ~16M Claude chat sessions (via ~24k accounts) were enough to substantially distill its behavior; commenters see this as a surprisingly low barrier and evidence that Anthropic’s moat is thin.
- People infer that future “industrial-scale” distillation is practically unavoidable as long as high-end models are exposed via public APIs.
- Some wonder whether this data volume could train not just alignment and formatting behavior but parts of a base model; rough numbers (~0.5T tokens if sessions average long contexts) make this plausible but unproven (see the back-of-envelope sketch after this list).
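
The ~0.5T-token estimate is easy to sanity-check. A minimal back-of-envelope sketch: only the ~16M session count comes from the announcement; the per-session token averages below are illustrative assumptions.

```python
# Back-of-envelope: token volume implied by ~16M chat sessions.
# Only the ~16M session count is from the announcement; the
# per-session token averages are illustrative assumptions.

SESSIONS = 16_000_000

for tokens_per_session in (1_000, 10_000, 30_000, 100_000):
    total_tokens = SESSIONS * tokens_per_session
    print(f"{tokens_per_session:>7,} tokens/session -> {total_tokens / 1e12:.2f}T tokens")
```

The corpus only reaches the ~0.5T-token range if sessions average roughly 30k tokens each, i.e. consistently long contexts, which is why commenters treat the base-model-training scenario as plausible but unconfirmed.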
IP, Scraping, and Hypocrisy
- Dominant reaction: Anthropic is accused of "living by the sword, dying by the sword": having trained on scraped, often copyrighted human content, it now objects when others scrape and distill its outputs.
- Many say they feel no sympathy for a lab that benefited from broad, often non-consensual data use and now wants its own outputs treated as protected IP.
- Several note this tweet will likely be cited in future lawsuits as evidence that Anthropic believes unauthorized use of IP meaningfully harms rights-holders.
Competition, Business Models, and “Prisoner’s Dilemma”
- One camp: distillation undermines the incentive to invest hundreds of millions in frontier training, pushing labs to lock down models or seek regulation, a "prisoner's dilemma" that could slow progress.
- The counter-camp: Chinese and other distilled or open-weight models have already forced US labs to improve faster and cut prices; competition is working, not breaking.
- Some ask why Anthropic doesn’t release its own distilled open-weight models if it truly cares about broad access.
Geopolitics, Regulation, and National Security Framing
- Many see the announcement as political messaging aimed at regulators, not customers: tying Chinese distillation to export controls, national security, and bans on “foreign AI.”
- There’s discussion of emerging US bills to restrict Chinese models for government contractors, and speculation about broader domestic bans.
- Others note that US labs also rely on scraping and question why Chinese labs should respect US IP when export controls try to hold them back.
Broader Debates: Safety, Environment, and IP Philosophy
- Some agree that if distillation cuts energy/compute by ~100×, it is ethically preferable to repeated huge training runs.
- Safety concerns surface around distilled models losing safeguards and being used as agents on the open web; others dismiss this as overblown or solvable via tooling.
- A long subthread debates whether modern copyright meaningfully serves individual creators versus large corporations, with some arguing for radically weakening or abolishing IP altogether.