Llama-3.3-70B-Instruct
Performance and Benchmarks
- Multiple commenters say Llama‑3.3‑70B‑Instruct performs on par with, or slightly better than, Llama‑3.1‑405B and close to GPT‑4o on several shared benchmarks.
- Some note it does unexpectedly well on independent evals (e.g., Kagi).
- Others cite concrete counterexamples (e.g., D&D 5e rules questions) where it hallucinates; Claude reportedly did better there.
- Overall sentiment: very strong free / “open weight” model, but not universally superior across tasks.
Quantization, Local Use, and Tooling
- Users run the 70B locally at Q3–Q5 quantization on consumer GPUs and CPUs, reporting ~2–16 tokens/s depending on hardware (4090, 3090s, M‑series Macs, older GPUs).
- Q4 is widely seen as the practical sweet spot; heavier quantization degrades coherence, prompt adherence, and precision, especially for complex conditionals and chain-of-thought.
- Discussion of model size vs VRAM: a rough rule is that parameter count in billions ≈ GB of memory at 8‑bit; halve that for 4‑bit, then add overhead for context (KV cache) and the OS.
- Many recommend smaller models (8–20B, ~32B) as “sweet spots” for 12–24GB cards.
- Tools mentioned: Ollama, LM Studio, llama.cpp, Open WebUI, plus Unsloth for more efficient fine‑tuning.
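The VRAM rule of thumb above can be sketched as a quick estimator. This is a minimal illustration; the 20% overhead factor for KV cache and OS is an assumed ballpark, not a figure from the thread:

```python
def estimate_vram_gb(params_b: float, bits: int, overhead: float = 0.2) -> float:
    """Rough memory estimate: params (in billions) times bytes per weight,
    plus a fudge factor for KV cache, activations, and OS overhead."""
    weights_gb = params_b * bits / 8  # at 8-bit, ~1 GB per billion params
    return weights_gb * (1 + overhead)

# 70B at common quantization levels:
for bits in (8, 4):
    print(f"{bits}-bit: ~{estimate_vram_gb(70, bits):.0f} GB")
# → 8-bit: ~84 GB
# → 4-bit: ~42 GB
```

On this estimate, a 4‑bit 70B wants roughly two 24GB cards or a high‑memory Mac, which is consistent with the hardware setups commenters report.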
Base Model and “Openness”
- Some are disappointed there’s no separately released base model; others say the base is effectively the earlier 3.1‑70B.
- Strong debate over terminology: several insist this is not “open source” but “open weight” freeware, given the license restrictions and the lack of a fully reproducible training pipeline and datasets.
- Others argue that practical rebuildability is moot at current training costs; critics counter that true open‑source definitions and training‑data legality still matter.
Safety, Censorship, and Uncensoring
- Interest in uncensored/unaligned variants; reports that many “decensored” models feel noticeably dumber or still refuse content.
- One line of work (abliteration) is criticized as often “lobotomizing” models; straightforward fine‑tuning on uncensored data is reported to preserve quality better.
- Some describe manual tricks (editing refusals mid‑response) as local workarounds.
Meta’s Strategy and Ecosystem Impact
- Broad agreement that releasing strong free models pressures proprietary providers (OpenAI/Anthropic), drives prices down, and benefits developers.
- Several frame this as classic “commoditize your complement”: Meta strengthens its social/ads core business while commoditizing general‑purpose AI.
- Views on Meta’s motives and ethics are mixed: some see a “redemption arc” and founder‑driven ambition; others stress past privacy/ads scandals and warn about more powerful ad‑targeting and AI “slop” in feeds.