Llama-3.3-70B-Instruct

Performance and Benchmarks

  • Multiple commenters say Llama‑3.3‑70B‑Instruct performs on par with, or slightly better than, Llama‑3.1‑405B and close to GPT‑4o on several shared benchmarks.
  • Some note it does unexpectedly well on independent evals (e.g., Kagi).
  • Others give concrete counterexamples, e.g., D&D 5e rules questions, where it hallucinates; another model (Claude) reportedly did better there.
  • Overall sentiment: very strong free / “open weight” model, but not universally superior across tasks.

Quantization, Local Use, and Tooling

  • Users run 70B locally via Q3–Q5 quantization on consumer GPUs and CPUs, reporting ~2–16 tokens/s depending on hardware (4090, 3090s, M‑series Macs, older GPUs).
  • Q4 is widely seen as the practical sweet spot; heavier quantization degrades coherence, prompt adherence, and precision, especially for complex conditionals and chain-of-thought.
  • Model size vs VRAM: a rough rule of thumb is VRAM in GB ≈ parameter count in billions at 8‑bit; halve that for 4‑bit, then add overhead for context and the OS.
  • Many recommend smaller models (8–20B, or ~32B) as better fits for 12–24 GB cards.
  • Tools mentioned: Ollama, LM Studio, llama.cpp, Open WebUI, plus Unsloth for more efficient fine‑tuning.
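The VRAM rule of thumb above can be sketched as a quick calculation. This is a minimal, illustrative estimate only; the ~2 GB overhead figure for context and the OS is an assumption, not a measured value, and real usage varies with context length and runtime.

```python
def estimate_vram_gb(params_billion: float, bits: int, overhead_gb: float = 2.0) -> float:
    """Rough VRAM estimate: weight memory (params * bits / 8 bytes)
    plus a flat overhead for KV cache, activations, and the OS.
    The 2 GB overhead default is an illustrative assumption."""
    weights_gb = params_billion * bits / 8
    return weights_gb + overhead_gb

# 70B at 8-bit: ~70 GB of weights alone (the "params ≈ GB" rule)
print(estimate_vram_gb(70, 8))  # 72.0 with the assumed overhead
# 70B at 4-bit quantization: roughly half the weight memory
print(estimate_vram_gb(70, 4))  # 37.0
```

By this estimate, even a 4‑bit 70B model exceeds a single 24 GB consumer card, which is consistent with commenters recommending 8–32B models for that hardware class.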

Base Model and “Openness”

  • Some are disappointed there’s no separately released base model; others say the base is effectively the earlier 3.1‑70B.
  • Strong debate over terminology: several insist this is not “open source” but “open weight” freeware due to license restrictions and lack of full reproducible training pipeline/datasets.
  • Others argue practical rebuildability is moot at current training costs, but critics point to the importance of true open source definitions and training‑data legality.

Safety, Censorship, and Uncensoring

  • Interest in uncensored/unaligned variants; reports that many “decensored” models feel noticeably dumber or still refuse content.
  • One line of work (abliteration) is criticized as often “lobotomizing” models; straightforward fine‑tuning on uncensored data is reported to preserve quality better.
  • Some describe manual tricks (editing refusals mid‑response) as local workarounds.
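The core idea behind abliteration is to estimate a "refusal direction" in activation space and project it out. The toy sketch below uses random vectors rather than real transformer residual streams; the prompt sets, dimensions, and activations are all invented for illustration, not taken from any actual implementation.

```python
import numpy as np

# Toy stand-ins for hidden activations collected on two prompt sets.
# In real abliteration these come from a model's residual stream.
rng = np.random.default_rng(0)
refused_acts = rng.normal(size=(100, 64)) + 1.5   # prompts the model refuses
normal_acts = rng.normal(size=(100, 64))          # prompts it answers

# Estimate the refusal direction as the difference of mean activations,
# normalized to unit length.
refusal_dir = refused_acts.mean(axis=0) - normal_acts.mean(axis=0)
refusal_dir /= np.linalg.norm(refusal_dir)

def ablate(h: np.ndarray, direction: np.ndarray) -> np.ndarray:
    """Remove the component of activation h along `direction`."""
    return h - (h @ direction) * direction

h = rng.normal(size=64)
h_ablated = ablate(h, refusal_dir)
# The ablated activation has ~zero component along the refusal direction.
print(abs(h_ablated @ refusal_dir) < 1e-9)  # True
```

Projecting out a single direction this way is blunt, which is one intuition for why commenters report abliterated models feeling "lobotomized": any capability whose representation overlaps that direction is damaged along with the refusals.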

Meta’s Strategy and Ecosystem Impact

  • Broad agreement that releasing strong free models pressures proprietary providers (OpenAI/Anthropic), drives prices down, and benefits developers.
  • Several frame this as classic “commoditize your complement”: Meta strengthens its social/ads core business while commoditizing general‑purpose AI.
  • Views on Meta’s motives and ethics are mixed: some see a “redemption arc” and founder‑driven ambition; others stress past privacy/ads scandals and warn about more powerful ad‑targeting and AI “slop” in feeds.