Llama-3.3-70B-Instruct

Performance and Benchmarks

  • Multiple commenters say Llama‑3.3‑70B‑Instruct performs on par with, or slightly better than, Llama‑3.1‑405B and close to GPT‑4o on several shared benchmarks.
  • Some note it does unexpectedly well on independent evals (e.g., Kagi).
  • Others give concrete counterexamples, e.g., D&D 5e rules questions, where it hallucinates; another model (Claude) reportedly did better there.
  • Overall sentiment: very strong free / “open weight” model, but not universally superior across tasks.

Quantization, Local Use, and Tooling

  • Users run 70B locally via Q3–Q5 quantization on consumer GPUs and CPUs, reporting ~2–16 tokens/s depending on hardware (4090, 3090s, M‑series Macs, older GPUs).
  • Q4 is widely seen as the practical sweet spot; heavier quantization degrades coherence, prompt adherence, and precision, especially for complex conditionals and chain-of-thought.
  • Model size vs VRAM: a rough rule of thumb is VRAM in GB ≈ parameter count in billions at 8‑bit; halve that for 4‑bit, then add overhead for context and the OS.
  • Many recommend smaller models (8–20B, or ~32B) as better fits for 12–24 GB cards.
  • Tools mentioned: Ollama, LM Studio, llama.cpp, Open WebUI, plus Unsloth for more efficient fine‑tuning.
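The VRAM rule of thumb above can be sketched as a quick calculation. This is a minimal, illustrative estimate only; the ~2 GB overhead figure for context and the OS is an assumption, not a measured value, and real usage varies with context length and runtime.

```python
def estimate_vram_gb(params_billion: float, bits: int, overhead_gb: float = 2.0) -> float:
    """Rough VRAM estimate: weight memory (params * bits / 8 bytes)
    plus a flat overhead for KV cache, activations, and the OS.
    The 2 GB overhead default is an illustrative assumption."""
    weights_gb = params_billion * bits / 8
    return weights_gb + overhead_gb

# 70B at 8-bit: ~70 GB of weights alone (the "params ≈ GB" rule)
print(estimate_vram_gb(70, 8))  # 72.0 with the assumed overhead
# 70B at 4-bit quantization: roughly half the weight memory
print(estimate_vram_gb(70, 4))  # 37.0
```

By this estimate, even a 4‑bit 70B model exceeds a single 24 GB consumer card, which is consistent with commenters recommending 8–32B models for that hardware class.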

Base Model and “Openness”

  • Some are disappointed there’s no separately released base model; others say the base is effectively the earlier 3.1‑70B.
  • Strong debate over terminology: several insist this is not “open source” but “open weight” freeware due to license restrictions and lack of full reproducible training pipeline/datasets.
  • Others argue practical rebuildability is moot at current training costs, but critics point to the importance of true open source definitions and training‑data legality.

Safety, Censorship, and Uncensoring

  • Interest in uncensored/unaligned variants; reports that many “decensored” models feel noticeably dumber or still refuse content.
  • One line of work (abliteration) is criticized as often “lobotomizing” models; straightforward fine‑tuning on uncensored data is reported to preserve quality better.
  • Some describe manual tricks (editing refusals mid‑response) as local workarounds.
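The core idea behind abliteration is to estimate a "refusal direction" in activation space and project it out. The toy sketch below uses random vectors rather than real transformer residual streams; the prompt sets, dimensions, and activations are all invented for illustration, not taken from any actual implementation.

```python
import numpy as np

# Toy stand-ins for hidden activations collected on two prompt sets.
# In real abliteration these come from a model's residual stream.
rng = np.random.default_rng(0)
refused_acts = rng.normal(size=(100, 64)) + 1.5   # prompts the model refuses
normal_acts = rng.normal(size=(100, 64))          # prompts it answers

# Estimate the refusal direction as the difference of mean activations,
# normalized to unit length.
refusal_dir = refused_acts.mean(axis=0) - normal_acts.mean(axis=0)
refusal_dir /= np.linalg.norm(refusal_dir)

def ablate(h: np.ndarray, direction: np.ndarray) -> np.ndarray:
    """Remove the component of activation h along `direction`."""
    return h - (h @ direction) * direction

h = rng.normal(size=64)
h_ablated = ablate(h, refusal_dir)
# The ablated activation has ~zero component along the refusal direction.
print(abs(h_ablated @ refusal_dir) < 1e-9)  # True
```

Projecting out a single direction this way is blunt, which is one intuition for why commenters report abliterated models feeling "lobotomized": any capability whose representation overlaps that direction is damaged along with the refusals.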

Meta’s Strategy and Ecosystem Impact

  • Broad agreement that releasing strong free models pressures proprietary providers (OpenAI/Anthropic), drives prices down, and benefits developers.
  • Several frame this as classic “commoditize your complement”: Meta strengthens its social/ads core business while commoditizing general‑purpose AI.
  • Views on Meta’s motives and ethics are mixed: some see a “redemption arc” and founder‑driven ambition; others stress past privacy/ads scandals and warn about more powerful ad‑targeting and AI “slop” in feeds.