2024-07-23

Llama 3.1

Model capabilities & benchmark results

Commenters highlight the 405B model as roughly competitive with GPT‑4o on several public benchmarks (MMLU, coding, math), and near top-tier in some user-run tests (e.g., NYT Connections, coding leaderboards).
The 8B and 70B variants show notable gains over Llama 3, especially on MMLU, and are seen as more practical for most users.
Some users report that GPT‑4o and Claude 3.5 still feel better in real coding and math tasks despite benchmark parity.
Benchmarks are widely treated with caution; LMSYS Elo ratings are mentioned as more reflective of “real world” usage, but they have their own limitations.
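For readers unfamiliar with Elo-style ratings, the sketch below shows how pairwise "which answer was better" votes can be turned into a rating, using the textbook Elo update. It is an illustration only: the function names and the K-factor of 32 are assumptions, and the leaderboard discussed in the thread may compute its scores differently.

```python
def expected_score(rating_a: float, rating_b: float) -> float:
    """Textbook Elo: probability-like expected score of A against B."""
    return 1.0 / (1.0 + 10 ** ((rating_b - rating_a) / 400.0))


def elo_update(rating_a: float, rating_b: float, score_a: float, k: float = 32.0):
    """Return updated (rating_a, rating_b) after one comparison.

    score_a is 1.0 if A's answer was preferred, 0.0 if B's was, 0.5 for a tie.
    k (assumed here to be 32) controls how far a single vote moves the ratings.
    """
    exp_a = expected_score(rating_a, rating_b)
    delta = k * (score_a - exp_a)
    # Whatever A gains, B loses, so the total rating is conserved.
    return rating_a + delta, rating_b - delta


if __name__ == "__main__":
    # Two hypothetical models starting from the same rating; A wins one vote.
    a, b = elo_update(1000.0, 1000.0, score_a=1.0)
    print(f"model A: {a:.1f}, model B: {b:.1f}")
```

Because each rating depends on who votes and on which prompts they submit, a sketch like this also makes the thread's caveat concrete: the score reflects the preferences of the voting population, not performance on any fixed task.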

Hardware requirements & running locally

405B is considered essentially out of reach for typical home hardware, even with 4–8 bit quantization; estimates include 200 GB or more of VRAM and multi‑GPU setups costing around $10k or more.
Suggestions include multi‑GPU PC builds, Mac Studio clusters over Thunderbolt with tools like Exo, and CPU-only options that would be extremely slow.
8B and 70B models are commonly run locally via tools like Ollama, llama.cpp, and other frontends on single GPUs or high‑RAM Macs.
There is ongoing work, and some friction, around support for architecture changes (e.g., changes to RoPE).
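The 200 GB-plus figure quoted above for 405B follows from simple arithmetic: parameter count times bits per weight, divided by eight. The sketch below works that out for the three model sizes. It is a rough lower bound under stated assumptions: the parameter counts are the nominal ones from the model names, the function and variable names are illustrative, and the estimate covers weights only, ignoring KV cache, activations, and quantization overhead.

```python
def weight_memory_gb(n_params: float, bits_per_weight: float) -> float:
    """Approximate memory for the weights alone, in decimal gigabytes."""
    return n_params * bits_per_weight / 8 / 1e9


# Nominal parameter counts taken from the model names (an assumption;
# exact counts differ slightly).
MODEL_SIZES = {"8B": 8e9, "70B": 70e9, "405B": 405e9}


def estimate_table(bit_widths=(16, 8, 4)):
    """Map each model name to {bits_per_weight: estimated GB}."""
    return {
        name: {bits: weight_memory_gb(n, bits) for bits in bit_widths}
        for name, n in MODEL_SIZES.items()
    }


if __name__ == "__main__":
    for name, row in estimate_table().items():
        cells = ", ".join(f"{bits}-bit: {gb:.1f} GB" for bits, gb in row.items())
        print(f"{name}: {cells}")
```

Under these assumptions 405B needs about 202.5 GB at 4 bits and about 405 GB at 8 bits, while 70B at 4 bits comes to about 35 GB, which is consistent with the thread's view that 70B is reachable on a high-memory machine and 405B is not.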

Hosting, pricing & ecosystem

For serious use of 405B, commenters point to cloud providers (AWS, GCP, Azure), specialized inference platforms (Groq, Hyperbolic, Bedrock), and APIs embedded in products (WhatsApp, Meta AI, Poe, VS Code extensions).
The discussion notes that open models don’t automatically mean cheap inference; hosted Llama pricing is often compared with that of proprietary models.

“Open source” vs “open weights” debate

Commenters strongly disagree over whether Llama should be called “open source.”
Critics note license restrictions (certain commercial users, military/nuclear use, acceptable‑use clauses) and the absence of training datasets, arguing this breaks with traditional open‑source and open‑science norms.
Others argue that releasing weights plus code is still a major positive, and that strict semantic policing may discourage companies from opening anything.
Several propose “open weights” or “nearly-open source” as more accurate terms.

Meta’s strategy & competitive landscape

Multiple comments frame Meta’s releases as a “scorched earth” play to undercut proprietary labs by collapsing the base‑model moat.
There is debate over whether any training “secret sauce” exists or whether compute scale plus open weights will commoditize base models, shifting profit to applications and compute.
Some see heavy use of synthetic data for fine‑tuning as a key ingredient and a broader industry trend.

Regulation & regional access

Europeans report that Meta’s chat product isn’t available in the EU, likely because of GDPR and upcoming AI Act and DMA rules.
Opinions split between viewing this as justified consumer protection vs. evidence that EU regulation slows access to new tech.
