Mistral Small 3
Position in the AI landscape
- Seen as Mistral’s move to stay relevant against OpenAI, DeepSeek, Qwen, and Llama; several commenters argue Mistral’s earlier models had fallen behind Llama.
- Several comments compare it to GPT‑4o‑mini; some say performance is “on par or better,” while others dismiss that tier as only good for casual “fun” chat use.
- Google’s Gemini line is repeatedly brought up as a quiet but very strong competitor; some claim Gemini 2.0 / exp models are now leading, others report regressions on long-context comprehension.
Model size, performance & hardware
- At 24B parameters it hits a “sweet spot” for local use: quantized, it fits on 24GB GPUs such as the RTX 4090 and on high‑RAM Macs.
- Reported speeds (quantized): ~14 tok/s on M2 Max 64GB, ~16 tok/s on 4090 laptop, ~20 tok/s on 7900 XTX, lower on M1 Pro.
- Discussion of VRAM vs. system RAM: many users can’t fit larger models at all; some would accept slower inference in exchange for running bigger models, while others argue memory bandwidth, not capacity, is the real bottleneck for decode speed.
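The capacity-vs-bandwidth point above can be made concrete with a back-of-envelope sketch. The bandwidth figures below are nominal spec-sheet numbers, not measurements, and the model ignores KV cache, activations, and runtime overhead:

```python
# Back-of-envelope sketch: weight memory for a 24B model at common
# quantization widths, and the bandwidth-bound ceiling on decode speed.
# Assumes decoding is memory-bandwidth bound: each generated token
# streams all weights once.

PARAMS = 24e9  # 24B parameters

def weight_gb(bits_per_param: float) -> float:
    """Approximate weight memory in GB at a given quantization width."""
    return PARAMS * bits_per_param / 8 / 1e9

def max_tokens_per_sec(bits_per_param: float, bandwidth_gb_s: float) -> float:
    """Upper bound on decode tok/s if each token reads all weights once."""
    return bandwidth_gb_s / weight_gb(bits_per_param)

for bits in (16, 8, 4.5):  # fp16, int8, ~Q4_K_M
    gb = weight_gb(bits)
    # Nominal bandwidths: RTX 4090 ~1008 GB/s, M2 Max ~400 GB/s (assumptions).
    print(f"{bits:>4} bits: {gb:5.1f} GB weights, "
          f"4090 ceiling ~{max_tokens_per_sec(bits, 1008):4.0f} tok/s, "
          f"M2 Max ceiling ~{max_tokens_per_sec(bits, 400):4.0f} tok/s")
```

At ~4.5 bits the weights come to about 13.5 GB, so they fit in 24GB of VRAM with room for context, and the bandwidth ceilings sit comfortably above the ~14–20 tok/s users report, which is consistent with real-world overheads eating the difference.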
Training choices & synthetic data
- Mistral states the model was trained without RL and without synthetic data; some find the absence of synthetic data “strange,” while others note complaints that synthetic‑heavy models overfit to STEM and struggle with fuzzier tasks.
- People speculate about later RL-style reasoning finetunes (à la DeepSeek) on top of this base.
Licensing, “open source” and copyright
- Announcement that general‑purpose models are moving back to Apache 2.0 is welcomed as a big win for local and commercial use.
- Thread stresses that the license applies only to the weights; training code and datasets remain closed.
- Long debate over whether model weights are copyrightable, and whether calling such releases “open source” is misleading:
  - One side: weights‑only releases are akin to binaries and should be called “open weights,” not FOSS.
  - Other side: open weights are already hugely valuable (self‑hosting, fine‑tuning, commercialization) even without full data pipelines.
Use cases for “small” models
- Suggested uses: local assistants, automated workflows, RAG, classification/tagging, ETL entity extraction, sentiment/feedback analysis, fraud detection, triage, on‑device control, coding assistance, structured JSON/tool calling.
- Several practitioners say recent instruction-following improvements make small LLMs viable for many classification and extraction tasks, often after prompt tuning and benchmarking against traditional ML baselines.
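For the structured JSON / extraction use cases above, most of the engineering sits on the validation side: the model is prompted to emit JSON with fixed keys, and the output is parsed and checked before anything downstream trusts it. A minimal sketch, where the schema and the sample response are illustrative assumptions rather than any real API:

```python
import json

# Validation side of a structured-extraction pipeline: parse the model's
# raw text output, tolerate a markdown code-fence wrapper (a common
# small-model habit), and reject responses missing required keys.

EXPECTED_KEYS = {"sentiment", "entities"}

def parse_extraction(raw: str) -> dict:
    """Parse model output into a dict, enforcing the expected keys."""
    text = raw.strip()
    if text.startswith("```"):
        # Strip a ```json ... ``` wrapper around the payload.
        text = text.split("```")[1]
        text = text.removeprefix("json").strip()
    data = json.loads(text)
    missing = EXPECTED_KEYS - data.keys()
    if missing:
        raise ValueError(f"missing keys: {missing}")
    return data

# Hypothetical model response for a feedback-analysis prompt:
sample = '```json\n{"sentiment": "negative", "entities": ["shipping", "refund"]}\n```'
result = parse_extraction(sample)
print(result["sentiment"])  # negative
```

In practice this sits behind a retry loop: if parsing or key validation fails, the request is re-sent (often with the error message appended to the prompt), which is cheap enough with a small local model.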
Benchmarks & evaluations
- One external evaluation on the MATH (hard) benchmark reports ~45% accuracy with multi‑sampling.
- Users informally compare it favorably against Qwen 2.5 32B and some earlier Mistral / local models, especially for code and local knowledge tasks.
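The “multi-sampling” in the MATH result above is not further specified in the thread; one common reading is self-consistency, i.e. sampling several answers per problem and taking the majority vote. A minimal sketch under that assumption, with made-up sampled answers:

```python
from collections import Counter

# Self-consistency sketch: sample N final answers per problem (at
# nonzero temperature) and return the most frequent one. The sampled
# answers below are illustrative, not real model output.

def majority_vote(samples: list[str]) -> str:
    """Return the most common final answer among sampled generations."""
    return Counter(samples).most_common(1)[0][0]

# Five hypothetical sampled final answers for one problem:
print(majority_vote(["42", "41", "42", "42", "7"]))  # 42
```

Majority voting typically lifts accuracy on math benchmarks relative to single-sample (greedy) decoding, at the cost of N× inference, which is one reason reported numbers vary with the sampling setup.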