Llama 3.2: Revolutionizing edge AI and vision with open, customizable models
Access & Licensing Constraints
- Some users can’t access meta.ai from certain US territories; the site reports it is “not available in your country.” Others suggest trying Hugging Face or llama.com instead.
- Multimodal Llama 3.2 models are explicitly not licensed to individuals or companies domiciled in the EU, which several commenters link to EU AI/data regulations.
- Debate over whether the EU exclusion is a cost/benefit compliance decision or deliberate pressure on regulators.
Model Lineup & Capabilities
- New text-only 1B and 3B models impress many: high coherence, good instruction following, and a 128K context window. The 1B runs on low-end hardware (e.g., Raspberry Pi 5); the 3B is seen as superior to earlier small models (e.g., Gemma-2-2B, Phi-3.5-mini) on some tasks.
- Skepticism about how much knowledge 3B parameters can store; users report good factual recall but weak reasoning (e.g., failing simple weight and decimal comparisons).
- Multilingual: solid German and usable output in some smaller languages, but confusion and code-mixing in Greek and others.
Vision & Multimodal Performance
- 11B and 90B vision models: some find them “legit good” for OCR and screenshots/flowcharts; others say Qwen2-VL (7B/72B) and Molmo outperform them on visual tasks and handwriting.
- Word-search and spatial puzzles remain hard; models find some left‑to‑right words but mostly fail on diagonals/complex patterns.
- Some users report strong refusals and safety filtering from the 90B on practical image-to-HTML and similar tasks.
Local / Edge Deployment
- Many run the models locally via Ollama, llama.cpp, KoboldCPP, LM Studio, Jan, Open WebUI, etc. (a minimal sketch follows this list).
- The 1B/3B are popular for on-device use; people report good performance, with quantization, on M1/M2 Macs, modest desktops, and possibly Android via Termux.
- Vision models aren’t yet fully integrated into all common local stacks (e.g., llama.cpp/Ollama support is “coming soon”).
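As a concrete illustration of the workflow above, here is a minimal sketch that sends one prompt to a locally served Llama 3.2 3B model through Ollama's HTTP API. It assumes Ollama is installed, `ollama pull llama3.2` has been run, and the server is listening on its default port; the model tag and prompt are illustrative, not taken from the thread.

```python
# Minimal sketch: one-shot prompt to a local Llama 3.2 3B served by Ollama.
# Assumes `ollama pull llama3.2` has already been run (tag is an assumption).
import requests

resp = requests.post(
    "http://localhost:11434/api/generate",   # Ollama's default endpoint
    json={
        "model": "llama3.2",
        "prompt": "In two sentences, why run a small LLM on-device?",
        "stream": False,   # return a single JSON object instead of a stream
    },
    timeout=120,
)
resp.raise_for_status()
print(resp.json()["response"])   # the generated completion text
```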
Function / Tool Calling & JSON
- Text-only and vision models support tool/function calling for text inputs, but not yet for mixed text+image prompts.
- Several note that constraining decoding to valid JSON can yield reliable function calling without extra fine-tuning (a sketch follows this list).
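A hedged sketch of that constrained-decoding idea: Ollama's documented `format: "json"` option restricts generation to syntactically valid JSON, which the caller can then parse as a tool invocation. The `get_weather` tool, its schema, and the prompts are hypothetical, invented here for illustration.

```python
# Sketch of JSON-constrained "function calling" against a local Ollama server.
# Only the `format: "json"` option is Ollama's own; the tool is hypothetical.
import json
import requests

SYSTEM = (
    "You can call one tool: get_weather(city: str). "
    'Respond ONLY with JSON of the form '
    '{"tool": "get_weather", "arguments": {"city": "..."}}.'
)

resp = requests.post(
    "http://localhost:11434/api/chat",
    json={
        "model": "llama3.2",
        "format": "json",   # decoder is constrained to emit valid JSON
        "stream": False,
        "messages": [
            {"role": "system", "content": SYSTEM},
            {"role": "user", "content": "What's the weather in Lisbon?"},
        ],
    },
    timeout=120,
)
resp.raise_for_status()

# Parsing cannot fail on syntax, but the schema still needs validating
# before dispatching to a real function.
call = json.loads(resp.json()["message"]["content"])
print(call["tool"], call["arguments"])
```

Note that the constraint only guarantees well-formed JSON; checking that the object matches the expected tool schema (and re-prompting on a mismatch) remains the caller's job.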
Bias, Alignment & Training Data
- The thread contains an extended debate over “bias” and alignment:
  - Some want transparency on curated alignment data and worry about political/religious skew and censorship.
  - Others argue “bias” mostly reflects training corpora and market focus (e.g., US/English-centric), and that full “de-biasing” is expensive and low-ROI.
- Meta is praised for technical openness and model releases, but criticized for opaque training data and for defaulting users into data collection.
Comparisons to Other Models & Services
- Qwen2/Qwen2-VL, Pixtral, Gemma 2, Phi‑3.5, Molmo, GPT‑4o, Claude 3.5, Gemini, and Mistral are repeatedly used as baselines.
- Consensus: the Llama 3.2 small text models are very strong for their size; the vision models are competitive but not clearly best-in-class.
Reliability, Hallucinations & Evaluation
- Users report both good practical results and glaring hallucinations (e.g., inventing fictional histories of software frameworks).
- A hallucination leaderboard entry claims ~4–5% hallucination rates for the new vision models.
- Commenters distrust cherry-picked vendor benchmarks and want fresher, multi-metric leaderboards that include newer open models.