Meta AI: "The Future of AI Is Open Source and Decentralized"

What “open” means for AI models

  • Many argue Meta’s models are “open weights,” not open source, due to restrictive licenses and closed training data.
  • Some see this as “openwashing”: leveraging the positive image of open source while retaining control and offloading liability.
  • Others counter that releasing weights plus tooling is, in practice, close to open source: the models can be fine‑tuned and extended without reverse‑engineering.

Centralized training vs. decentralized use

  • Training is seen as inherently centralized: it requires capital, compute, data cleaning, and RLHF budgets on a scale most open communities can’t match.
  • Inference and fine‑tuning can be decentralized on consumer or rented hardware; this is viewed as “centralized production, decentralized consumption.”
  • Several note that even if open methods make training 100× cheaper, large closed players can just scale up further and retain an edge.

Compute, data, and hardware constraints

  • Disagreement over whether compute or data is the main bottleneck; many say compute cost and availability are the biggest constraint.
  • Datasets like FineWeb and synthetic data from existing models help, but still cost money.
  • Hardware scarcity and pricing (Nvidia vs AMD MI300X, VRAM limits, interconnects) are seen as barriers that favor large players.
  • Concern that high training and inference costs may let “giants eat small software,” challenging the classic open‑source model.
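The VRAM point above can be made concrete with back‑of‑envelope arithmetic: memory for the weights alone is parameter count × bytes per parameter, before counting KV cache, activations, or runtime overhead. A minimal sketch (model size and precisions are illustrative, not tied to any specific release):

```python
# Approximate memory needed just to hold model weights at common
# precisions. Ignores KV cache, activations, and framework overhead,
# so real requirements are higher.
BYTES_PER_PARAM = {"fp16": 2.0, "int8": 1.0, "int4": 0.5}

def weight_vram_gib(params_billions: float, precision: str) -> float:
    """GiB of memory for the weights alone at the given precision."""
    total_bytes = params_billions * 1e9 * BYTES_PER_PARAM[precision]
    return total_bytes / 2**30

if __name__ == "__main__":
    # A hypothetical 70B-parameter model as the example:
    for p in ("fp16", "int8", "int4"):
        print(f"70B @ {p}: ~{weight_vram_gib(70, p):.0f} GiB")
```

Even aggressively quantized to 4 bits, a 70B model needs roughly 33 GiB for weights alone, which is why consumer cards with 24 GiB of VRAM push hobbyists toward smaller models or multi‑GPU setups, while data‑center parts (H100, MI300X) absorb large models comfortably.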

Motives and strategy of Meta

  • Widespread skepticism that Meta’s stance is principled; many see it as:
    • A way to commoditize AI (the complement to their ad/content business).
    • A competitive move to cap the advantage of stronger players.
    • A talent magnet for researchers who want to publish and work on “open” models.
  • Some note Meta’s long history of releasing ML infrastructure (e.g., frameworks and vision models), arguing this is consistent behavior.

Privacy, data use, and liability

  • Intense criticism of Meta’s use of user data for AI training, opt‑out friction, and attempts to broaden legal permissions, especially under GDPR.
  • Debate over whether current AI teams actually have access to user data vs. just preparing legal groundwork to get it.
  • On copyright and harmful content, some say liability should rest with deployers, treating models as neutral tools (a crayon’s maker isn’t liable for what is drawn); others argue that if a model is effectively a compressed copy of infringing data, its creators and hosts also bear responsibility.
  • Concern that open‑weight releases shift safety and legal burdens (CSAM, misuse, copyright) onto smaller developers who lack resources.

Decentralization schemes and future outlook

  • Ideas like BOINC‑style training and crypto‑incentivized networks (e.g., Bittensor) are mentioned; bandwidth and coordination limits are seen as unsolved.
  • Some are cautiously optimistic that costs will drop and models will shrink or specialize, enabling more distributed innovation.
  • Others remain pessimistic, viewing Meta’s messaging as another iteration of “embrace, extend, extinguish” and warning of future “enshittification.”