Mistral 3 family of models released
Benchmarks and Model Positioning
- Many wanted direct comparisons against OpenAI/Anthropic/Google; others argued such comparisons would be pointless marketing, since Mistral clearly trails the frontier closed models and targets a different segment.
- Mistral mostly compares against recent open‑weight models (DeepSeek, Qwen, Gemma). Some see this as an “open‑weights first” stance; others read it as evidence they’d look weak against top proprietary models.
- LM Arena rankings show Mistral Large 3 behind the major SOTA models but within a modest Elo gap (see the worked example after this list); several commenters warn that Arena is style-biased and easily “optimized” via tone and emoji.
- There’s broad skepticism about benchmarks in general: accusations of benchmark-gaming (especially at Google/Gemini), concern about overfitting to public test sets, and repeated advice to build task-specific internal benchmarks (a minimal harness is sketched after this list).
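For scale on what a “modest Elo gap” means in head-to-head votes, the standard Elo expected-score formula gives the lower-rated model’s win probability directly. The gap values below are illustrative, not the leaderboard’s actual numbers:

```typescript
// Standard Elo expected score: probability that the lower-rated
// model wins a single head-to-head vote, given the rating gap.
function winProbability(eloGap: number): number {
  return 1 / (1 + Math.pow(10, eloGap / 400));
}

// Illustrative gaps only; check the live leaderboard for real numbers.
for (const gap of [25, 50, 100]) {
  console.log(`gap ${gap}: lower-rated model wins ${(winProbability(gap) * 100).toFixed(1)}% of votes`);
}
// gap 25  -> ~46.4%
// gap 50  -> ~42.9%
// gap 100 -> ~36.0%
```

Even a 100-point gap still means the lower-rated model wins roughly a third of matchups, which is part of why small Arena gaps say little on their own.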
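On the advice to build internal benchmarks: a task-specific harness can be very small. This sketch assumes an OpenAI-compatible chat endpoint; the URL, model name, labels, and test cases are all placeholders to swap for your own workload:

```typescript
// Minimal task-specific eval harness: run hand-labeled examples
// through a model endpoint and report exact-match accuracy.
// All names below (endpoint, model, cases) are placeholders.
interface EvalCase { input: string; expected: string; }

const cases: EvalCase[] = [
  { input: "Invoice from ACME for $1,200", expected: "finance" },
  { input: "Server down since 3am", expected: "incident" },
];

async function classify(text: string): Promise<string> {
  const res = await fetch("https://api.example.com/v1/chat/completions", {
    method: "POST",
    headers: {
      "Content-Type": "application/json",
      Authorization: `Bearer ${process.env.API_KEY}`,
    },
    body: JSON.stringify({
      model: "your-model-here", // placeholder
      temperature: 0,
      messages: [
        { role: "system", content: "Reply with exactly one label: finance, incident, or other." },
        { role: "user", content: text },
      ],
    }),
  });
  const data = await res.json();
  return data.choices[0].message.content.trim().toLowerCase();
}

async function runEval(): Promise<void> {
  let correct = 0;
  for (const c of cases) {
    const got = await classify(c.input);
    if (got === c.expected) correct++;
    else console.log(`MISS: "${c.input}" -> got "${got}", want "${c.expected}"`);
  }
  console.log(`accuracy: ${correct}/${cases.length}`);
}

runEval();
```

A few dozen cases like this, drawn from real traffic, usually says more about fit for a given workload than any public leaderboard.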
Open Weights, Privacy, and Business Incentives
- Strong emphasis on demand for local hosting and data privacy, especially in Europe and in regulated industries; many companies will not touch US closed models due to the CLOUD Act, worries about their data being reused for training, or compliance requirements.
- Open weights are seen as:
- A way to attract VC money and prestige.
- A base for paid fine‑tuning/custom training services.
- A “competitive floor” constraining proprietary vendors’ pricing and behavior.
- Some doubt the long‑term business viability of high‑quality open models; others argue there’s “no money” in keeping them closed at Mistral’s tier.
Capabilities, Architectures, and Vision
- All Ministral models reportedly support tool use; structured output is seen as mostly an inference-time grammar/decoding issue rather than a deep capability gap (see the constrained-decoding sketch after this list).
- The small dense models (3B/8B/14B) are widely praised on paper as SOTA for their size, especially on multilingual tasks, with one 3B vision variant reportedly running fully in-browser via WebGPU (a browser-loading sketch also follows this list).
- Mixed reactions on vision claims: some call this the first “really big” open‑weight vision model; others note prior Llama vision models and licensing differences.
- The Large model appears to use a DeepSeek‑V3–like architecture; several note this with some snark, though there’s general agreement that reusing the best open architectures is to be expected.
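On structured output being an inference/grammar issue: the usual mechanism is constrained decoding, where the sampler masks every next token the target grammar would reject, so even a mediocre model cannot emit structurally invalid output. A minimal conceptual sketch, with a toy token whitelist standing in for a real grammar (real systems, such as GBNF grammars or hosted JSON modes, track full grammar state):

```typescript
// Conceptual sketch of constrained decoding: at each step, zero out
// the probability of every token the grammar would reject, then
// renormalize and sample from what remains. The "grammar" here is
// a toy whitelist, not a real JSON grammar.
type TokenId = number;

function maskAndSample(
  probs: number[],                      // model's next-token distribution
  allowed: (id: TokenId) => boolean,    // grammar's verdict per token
): TokenId {
  const masked = probs.map((p, id) => (allowed(id) ? p : 0));
  const total = masked.reduce((a, b) => a + b, 0);
  if (total === 0) throw new Error("grammar dead end: no token allowed");
  let r = Math.random() * total;
  for (let id = 0; id < masked.length; id++) {
    r -= masked[id];
    if (r <= 0) return id;
  }
  return masked.length - 1; // floating-point edge fallback
}

// Toy example: only token ids 1 and 2 are grammar-legal at this step.
const next = maskAndSample([0.5, 0.3, 0.2], (id) => id === 1 || id === 2);
console.log(`sampled token id: ${next}`);
```

This is why structured output is largely a property of the inference stack rather than the weights: the same model produces valid output under a constrained sampler and invalid output without one.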
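On the in-browser vision demo: small models are typically run client-side through a WebGPU-backed runtime such as transformers.js. A rough sketch of what loading such a model looks like; the pipeline task tag and model id here are assumptions for illustration, not details from the release:

```typescript
// Rough sketch of running a small vision model fully client-side with
// transformers.js, which supports a WebGPU backend in v3. The task tag
// and model id are placeholders; check the model card for the actual
// ONNX export and supported pipeline task.
import { pipeline } from "@huggingface/transformers";

async function describeImage(imageUrl: string): Promise<void> {
  const captioner = await pipeline(
    "image-to-text",                 // assumed task tag for this sketch
    "your-org/some-3b-vision-onnx",  // hypothetical model id
    { device: "webgpu" },            // run on the browser's GPU
  );
  const out = await captioner(imageUrl);
  console.log(out);
}

describeImage("https://example.com/cat.jpg");
```

The weights download once and execute locally, which is the appeal for the privacy-minded crowd in this thread: nothing leaves the browser.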
Real‑World Usage Reports
- Multiple users report Mistral 3 Medium and the small models as extremely fast, cheap, and reliable for constrained tasks (formatting, categorization, summarization, language-learning content), outperforming GPT‑5 on those workloads despite weaker benchmark scores.
- Others find Mistral “next to useless” for coding compared to Claude, Gemini, or DeepSeek, with heavy hallucinations and non‑compilable code.
- Consensus: benchmarks are only a rough guide; real value depends heavily on specific workloads, prompting, and cost/latency constraints.
Europe, Funding, and Politics
- Strong symbolic support for an EU‑based AI player, intertwined with confusion over “Europe vs. the EU” and mentions of other European AI companies.
- Debate over how much “European” really means when the company is funded by US VCs and hosted on US clouds, versus where taxes, data, and legal jurisdiction actually land.