2024-08-29

Update on Llama adoption

LLaVA and Local Tooling

Several commenters praise LLaVA (a vision-capable LLaMA variant) and note that it is easy to run locally with tools such as llama.cpp, Ollama, and various UIs.
Use cases mentioned include image description, accessibility (alt-text generation), and experimentation with multimodal models.
Cloud options (e.g., Cloudflare, Replicate) are mentioned, but many commenters emphasize that self-hosting is straightforward.
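
As a rough illustration of the local alt-text use case, here is a minimal sketch that sends one image to an Ollama server over its HTTP API. It assumes Ollama is installed and running on its default port (11434) and that a LLaVA model has already been pulled under the name "llava"; the prompt wording and the helper names build_payload and describe_image are made up for this example, not something the commenters posted.

```python
import base64
import json
import urllib.request

# Assumption: Ollama is running locally on its default port.
OLLAMA_URL = "http://localhost:11434/api/generate"


def build_payload(image_bytes: bytes, model: str = "llava") -> dict:
    """Build a non-streaming generate request carrying one base64-encoded image."""
    return {
        "model": model,  # assumed model name; use whatever `ollama list` shows
        "prompt": "Write one sentence of alt text for this image.",
        "images": [base64.b64encode(image_bytes).decode("ascii")],
        "stream": False,  # ask for a single JSON object instead of a stream
    }


def describe_image(path: str, model: str = "llava") -> str:
    """Send the image to the local Ollama server and return the model's reply."""
    with open(path, "rb") as f:
        payload = build_payload(f.read(), model)
    request = urllib.request.Request(
        OLLAMA_URL,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(request) as response:
        return json.loads(response.read())["response"]
```

Calling describe_image("photo.jpg") (a hypothetical file name) would return the generated description as a string, provided the server is up. Nothing leaves the machine, which is the confidentiality point several commenters make about self-hosting.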

How “Open” Is Llama? Weights, Data, and EULAs

The main debate centers on Meta calling Llama “open source” even though:
- Weights are downloadable only after accepting a custom license/EULA.
- Training data and full training pipeline details are not released.
Critics compare weights to compiled binaries: useful but not “source,” so this is at best “open weights” or “source-available,” not open source.
Others argue that for most users, weights plus inference and fine-tuning code are enough in practice, and that full training reproducibility is impractical anyway.

Definitions of Open Source and Language Drift

One camp insists on OSI-style definitions: no use restrictions, availability of the “preferred form for modification,” and clear licensing. In this view, anything else is misuse of the term or “open-washing.”
Another camp claims the meaning of “open source” for AI is still unsettled, and that for LLMs, weights available with some restrictions may become the de facto meaning.
There is also a meta-debate over whether redefining “open source” (especially when large corporations do it) is manipulative marketing or natural language evolution.

Meta’s Motives and Ecosystem Strategy

Supporters highlight Meta’s large contributions to developer tooling (frameworks, infra) and argue Llama is far more open than proprietary rivals, enabling local, offline, and confidential use.
Skeptics see a strategic “dumping” move: commoditize the model layer, erode competitors’ business models, and centralize ecosystem control around Meta’s stack.

Licensing, Enforcement, and Risk

Some argue licenses are toothless because it’s hard to prove which model produced an output, especially after finetuning or merging.
Others counter that subpoenas, discovery, and leaks (employees or hackers) make willful violations risky, especially for larger entities.

Open Data, Copyright, and Fully Open Models

Several comments note that truly open models (including training data) are likely impossible under current copyright regimes.
There is frustration that copyright and proprietary datasets block transparent, fully reproducible “open AI,” and concern that this permanently handicaps genuinely open alternatives.

Regulation (California SB 1047)

There is a brief side discussion of SB 1047. Some fear it will chill open releases and entrench a few large, regulated players; others argue that regulation can be updated and that big markets like California can dictate compliance.
