Llama 3.2: Revolutionizing edge AI and vision with open, customizable models
Access & Licensing Constraints
- Some users can’t access meta.ai from certain US territories; the site reports it is “not available in your country.” Others suggest trying Hugging Face or llama.com instead.
- Multimodal Llama 3.2 models are explicitly not licensed to individuals or companies domiciled in the EU, which several commenters link to EU AI/data regulations.
- Debate over whether the EU exclusion is a cost/benefit compliance decision or deliberate pressure on regulators.
Model Lineup & Capabilities
- New text-only 1B and 3B models impress many: high coherence, good instruction following, and a 128K context window. The 1B runs on low-end hardware (e.g., Raspberry Pi 5); the 3B is seen as superior to earlier small models (e.g., Gemma-2-2B, Phi-3.5-mini) on some tasks.
- Skepticism about how much knowledge 3B parameters can store; users report good factual recall but weak reasoning (e.g., failing simple weight and decimal comparisons).
- Multilingual: solid German and usable output in some smaller languages, but confusion and code-mixing in Greek and others.
Vision & Multimodal Performance
- 11B and 90B vision models: some find them “legit good” for OCR and screenshots/flowcharts; others say Qwen2-VL (7B/72B) and Molmo outperform them on visual tasks and handwriting.
- Word-search and spatial puzzles remain hard; models find some left‑to‑right words but mostly fail on diagonals/complex patterns.
- Some users report strong refusals and safety filtering from the 90B on practical image-to-HTML and similar tasks.
Local / Edge Deployment
- Many run the models locally via Ollama, llama.cpp, KoboldCPP, LM Studio, Jan, Open WebUI, etc. (a minimal sketch follows this list).
- The 1B/3B are popular for on-device use; people report good performance, with quantization, on M1/M2 Macs, modest desktops, and possibly Android via Termux.
- Vision models aren’t yet fully integrated into all common local stacks (e.g., llama.cpp/Ollama support is “coming soon”).
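As a concrete illustration of the workflow above, here is a minimal sketch that sends one prompt to a locally served Llama 3.2 3B model through Ollama's HTTP API. It assumes Ollama is installed, `ollama pull llama3.2` has been run, and the server is listening on its default port; the model tag and prompt are illustrative, not taken from the thread.

```python
# Minimal sketch: one-shot prompt to a local Llama 3.2 3B served by Ollama.
# Assumes `ollama pull llama3.2` has already been run (tag is an assumption).
import requests

resp = requests.post(
    "http://localhost:11434/api/generate",   # Ollama's default endpoint
    json={
        "model": "llama3.2",
        "prompt": "In two sentences, why run a small LLM on-device?",
        "stream": False,   # return a single JSON object instead of a stream
    },
    timeout=120,
)
resp.raise_for_status()
print(resp.json()["response"])   # the generated completion text
```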
Function / Tool Calling & JSON
- Text-only and vision models support tool/function calling for text inputs, but not yet for mixed text+image prompts.
- Several note that constraining decoding to valid JSON can yield reliable function calling without extra fine-tuning (a sketch follows this list).
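A hedged sketch of that constrained-decoding idea: Ollama's documented `format: "json"` option restricts generation to syntactically valid JSON, which the caller can then parse as a tool invocation. The `get_weather` tool, its schema, and the prompts are hypothetical, invented here for illustration.

```python
# Sketch of JSON-constrained "function calling" against a local Ollama server.
# Only the `format: "json"` option is Ollama's own; the tool is hypothetical.
import json
import requests

SYSTEM = (
    "You can call one tool: get_weather(city: str). "
    'Respond ONLY with JSON of the form '
    '{"tool": "get_weather", "arguments": {"city": "..."}}.'
)

resp = requests.post(
    "http://localhost:11434/api/chat",
    json={
        "model": "llama3.2",
        "format": "json",   # decoder is constrained to emit valid JSON
        "stream": False,
        "messages": [
            {"role": "system", "content": SYSTEM},
            {"role": "user", "content": "What's the weather in Lisbon?"},
        ],
    },
    timeout=120,
)
resp.raise_for_status()

# Parsing cannot fail on syntax, but the schema still needs validating
# before dispatching to a real function.
call = json.loads(resp.json()["message"]["content"])
print(call["tool"], call["arguments"])
```

Note that the constraint only guarantees well-formed JSON; checking that the object matches the expected tool schema (and re-prompting on a mismatch) remains the caller's job.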
Bias, Alignment & Training Data
- The thread contains an extended debate over “bias” and alignment:
  - Some want transparency on curated alignment data and worry about political/religious skew and censorship.
  - Others argue “bias” mostly reflects training corpora and market focus (e.g., US/English-centric), and that full “de-biasing” is expensive and low-ROI.
- Meta is praised for technical openness and model releases, but criticized for opaque training data and for defaulting users into data collection.
Comparisons to Other Models & Services
- Qwen2/Qwen2-VL, Pixtral, Gemma 2, Phi‑3.5, Molmo, GPT‑4o, Claude 3.5, Gemini, and Mistral are repeatedly used as baselines.
- Consensus: the Llama 3.2 small text models are very strong for their size; the vision models are competitive but not clearly best-in-class.
Reliability, Hallucinations & Evaluation
- Users report both good practical results and glaring hallucinations (e.g., inventing fictional histories of software frameworks).
- A hallucination leaderboard entry claims ~4–5% hallucination rates for the new vision models.
- Commenters distrust cherry-picked vendor benchmarks and want fresher, multi-metric leaderboards that include newer open models.