Llama 3.2: Revolutionizing edge AI and vision with open, customizable models

Access & Licensing Constraints

  • Some users can’t access meta.ai from certain US territories; the site reports “not available in your country.” Others suggest trying Hugging Face or llama.com instead.
  • Multimodal Llama 3.2 models are explicitly not licensed to individuals or companies domiciled in the EU, which several commenters link to EU AI/data regulations.
  • Debate over whether the EU exclusion is a cost/benefit compliance choice or deliberate pressure on regulators.

Model Lineup & Capabilities

  • New text-only 1B and 3B models impress many: high coherence, good instruction following, and a 128K-token context window. The 1B runs on low-end hardware (e.g., a Raspberry Pi 5); the 3B is seen as superior to earlier small models (e.g., Gemma-2-2B, Phi-3.5-mini) on some tasks.
  • Skepticism about how much knowledge 3B parameters can store; users report good factual recall but weak reasoning (e.g., simple weight/decimal comparisons).
  • Multilingual: solid German, usable in some smaller languages, but confusions and code-mixing for Greek and others.

Vision & Multimodal Performance

  • 11B and 90B vision models: some find them “legit good” for OCR and screenshots/flowcharts; others say Qwen2-VL (7B/72B) and Molmo outperform them on visual tasks and handwriting.
  • Word-search and spatial puzzles remain hard; models find some left-to-right words but mostly fail on diagonals and more complex patterns.
  • Some users report strong refusals and safety filters on 90B for practical image-to-HTML or similar tasks.

Local / Edge Deployment

  • Many run models locally via Ollama, llama.cpp, KoboldCPP, LM Studio, Jan, Open WebUI, etc.
  • 1B/3B are popular for on-device use; people report good performance on M1/M2 Macs, modest desktops, and possibly Android/Termux, with quantization.
  • Vision models aren’t yet fully integrated into all common local stacks (e.g., llama.cpp/Ollama support is “coming soon”).
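The quantization mentioned above is what makes 1B/3B-class models fit on phones and Raspberry Pi-class hardware. A toy sketch of symmetric 4-bit quantization is below; this is illustrative only, not the block-wise scheme any particular runtime (GGUF, llama.cpp, etc.) actually uses, and the helper names are made up.

```python
# Toy sketch of symmetric 4-bit weight quantization (illustrative only,
# not a real runtime's scheme): store each weight as a signed integer in
# [-7, 7] plus one shared float scale, cutting memory roughly 8x vs fp32.

def quantize_q4(weights):
    """Map float weights to signed 4-bit integers [-7, 7] plus a scale."""
    scale = max(abs(w) for w in weights) / 7.0 or 1.0  # avoid zero scale
    q = [max(-7, min(7, round(w / scale))) for w in weights]
    return q, scale

def dequantize_q4(q, scale):
    """Recover approximate float weights from the 4-bit codes."""
    return [v * scale for v in q]

weights = [0.12, -0.7, 0.33, 0.05]
q, scale = quantize_q4(weights)
restored = dequantize_q4(q, scale)
# Each restored weight is within half a quantization step of the original.
```

The accuracy loss per weight is bounded by half the scale step, which is why small models tolerate 4-bit weights well enough for on-device use.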

Function / Tool Calling & JSON

  • Both the text-only and vision models support tool/function calling on text inputs; tool calling with mixed text+image prompts is not yet supported.
  • Several note that constraining decoding to valid JSON can yield reliable function-calling without extra fine-tuning.
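The constrained-decoding point above splits into two halves: the grammar that forces the decoder to emit only valid JSON lives in the inference engine (e.g., llama.cpp's GBNF grammars or Ollama's JSON mode), while the application still checks that the parsed call names a real tool with the required arguments. A minimal sketch of that application-side half follows; the tool name and schema are hypothetical.

```python
import json

# Hypothetical tool registry: each tool declares its required arguments.
TOOLS = {
    "get_weather": {"required": {"city"}},
}

def parse_tool_call(raw: str):
    """Parse a model's JSON tool call and verify it names a known tool
    with all required arguments present; return None on any failure."""
    try:
        call = json.loads(raw)
    except json.JSONDecodeError:
        return None  # grammar-constrained decoding should prevent this
    if not isinstance(call, dict):
        return None
    spec = TOOLS.get(call.get("name"))
    if spec is None:
        return None  # model named a tool we never declared
    args = call.get("arguments", {})
    if not spec["required"] <= set(args):
        return None  # a required argument is missing
    return call["name"], args

parse_tool_call('{"name": "get_weather", "arguments": {"city": "Berlin"}}')
# -> ("get_weather", {"city": "Berlin"})
```

With the decoder constrained to valid JSON, the `JSONDecodeError` branch should rarely fire, which is the thread's point: the grammar does most of the reliability work without any fine-tuning.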

Bias, Alignment & Training Data

  • Thread contains extended debate over “bias” and alignment:
    • Some want transparency on curated alignment data and worry about political/religious skew and censorship.
    • Others argue “bias” mostly reflects training corpora and market focus (e.g., US/English-centric); full “de-biasing” is expensive and low-ROI.
  • Meta is praised for technical openness and model releases, but criticized for opaque training data and for opting users into data collection by default.

Comparisons to Other Models & Services

  • Qwen2/Qwen2-VL, Pixtral, Gemma 2, Phi-3.5, Molmo, GPT-4o, Claude 3.5, Gemini, and Mistral are repeatedly used as baselines.
  • Consensus: Llama 3.2 small text models are very strong at their size; vision models are competitive but not clearly best-in-class.

Reliability, Hallucinations & Evaluation

  • Users report both good practical results and glaring hallucinations (e.g., inventing fictional histories of software frameworks).
  • A hallucination leaderboard entry claims ~4–5% hallucination rates for the new vision models.
  • Commenters distrust cherry-picked vendor benchmarks and want fresher, multi-metric leaderboards that include newer open models.