Mistral 3 family of models released
Benchmarks and Model Positioning
- Many wanted direct comparisons against OpenAI/Anthropic/Google; others argued such comparisons would be pointless marketing, since Mistral clearly trails the frontier closed models and targets a different segment.
- Mistral mostly compares against recent open‑weight models (DeepSeek, Qwen, Gemma). Some see this as an “open‑weights first” stance; others read it as evidence they’d look weak against top proprietary models.
- LM Arena rankings show Mistral Large 3 behind the major SOTA models but within a modest Elo gap (see the worked example after this list); several commenters warn that Arena is style-biased and easily “optimized” via tone and emoji.
- There’s broad skepticism about benchmarks in general: accusations of benchmark-gaming (especially at Google/Gemini), concern about overfitting to public test sets, and repeated advice to build task-specific internal benchmarks (a minimal harness is sketched after this list).
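For scale on what a “modest Elo gap” means in head-to-head votes, the standard Elo expected-score formula gives the lower-rated model’s win probability directly. The gap values below are illustrative, not the leaderboard’s actual numbers:

```typescript
// Standard Elo expected score: probability that the lower-rated
// model wins a single head-to-head vote, given the rating gap.
function winProbability(eloGap: number): number {
  return 1 / (1 + Math.pow(10, eloGap / 400));
}

// Illustrative gaps only; check the live leaderboard for real numbers.
for (const gap of [25, 50, 100]) {
  console.log(`gap ${gap}: lower-rated model wins ${(winProbability(gap) * 100).toFixed(1)}% of votes`);
}
// gap 25  -> ~46.4%
// gap 50  -> ~42.9%
// gap 100 -> ~36.0%
```

Even a 100-point gap still means the lower-rated model wins roughly a third of matchups, which is part of why small Arena gaps say little on their own.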
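On the advice to build internal benchmarks: a task-specific harness can be very small. This sketch assumes an OpenAI-compatible chat endpoint; the URL, model name, labels, and test cases are all placeholders to swap for your own workload:

```typescript
// Minimal task-specific eval harness: run hand-labeled examples
// through a model endpoint and report exact-match accuracy.
// All names below (endpoint, model, cases) are placeholders.
interface EvalCase { input: string; expected: string; }

const cases: EvalCase[] = [
  { input: "Invoice from ACME for $1,200", expected: "finance" },
  { input: "Server down since 3am", expected: "incident" },
];

async function classify(text: string): Promise<string> {
  const res = await fetch("https://api.example.com/v1/chat/completions", {
    method: "POST",
    headers: {
      "Content-Type": "application/json",
      Authorization: `Bearer ${process.env.API_KEY}`,
    },
    body: JSON.stringify({
      model: "your-model-here", // placeholder
      temperature: 0,
      messages: [
        { role: "system", content: "Reply with exactly one label: finance, incident, or other." },
        { role: "user", content: text },
      ],
    }),
  });
  const data = await res.json();
  return data.choices[0].message.content.trim().toLowerCase();
}

async function runEval(): Promise<void> {
  let correct = 0;
  for (const c of cases) {
    const got = await classify(c.input);
    if (got === c.expected) correct++;
    else console.log(`MISS: "${c.input}" -> got "${got}", want "${c.expected}"`);
  }
  console.log(`accuracy: ${correct}/${cases.length}`);
}

runEval();
```

A few dozen cases like this, drawn from real traffic, usually says more about fit for a given workload than any public leaderboard.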
Open Weights, Privacy, and Business Incentives
- Strong emphasis on demand for local hosting and data privacy, especially in Europe and in regulated industries; many companies will not touch US closed models due to the CLOUD Act, worries about their data being reused for training, or compliance requirements.
- Open weights are seen as:
- A way to attract VC money and prestige.
- A base for paid fine‑tuning/custom training services.
- A “competitive floor” constraining proprietary vendors’ pricing and behavior.
- Some doubt the long‑term business viability of high‑quality open models; others argue there’s “no money” in keeping them closed at Mistral’s tier.
Capabilities, Architectures, and Vision
- All Ministral models reportedly support tool use; structured output is seen as mostly an inference-time grammar/decoding issue rather than a deep capability gap (see the constrained-decoding sketch after this list).
- The small dense models (3B/8B/14B) are widely praised on paper as SOTA for their size, especially on multilingual tasks, with one 3B vision variant reportedly running fully in-browser via WebGPU (a browser-loading sketch also follows this list).
- Mixed reactions on vision claims: some call this the first “really big” open‑weight vision model; others note prior Llama vision models and licensing differences.
- The Large model appears to use a DeepSeek‑V3–like architecture; several note this with some snark, though there’s general agreement that reusing the best open architectures is to be expected.
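On structured output being an inference/grammar issue: the usual mechanism is constrained decoding, where the sampler masks every next token the target grammar would reject, so even a mediocre model cannot emit structurally invalid output. A minimal conceptual sketch, with a toy token whitelist standing in for a real grammar (real systems, such as GBNF grammars or hosted JSON modes, track full grammar state):

```typescript
// Conceptual sketch of constrained decoding: at each step, zero out
// the probability of every token the grammar would reject, then
// renormalize and sample from what remains. The "grammar" here is
// a toy whitelist, not a real JSON grammar.
type TokenId = number;

function maskAndSample(
  probs: number[],                      // model's next-token distribution
  allowed: (id: TokenId) => boolean,    // grammar's verdict per token
): TokenId {
  const masked = probs.map((p, id) => (allowed(id) ? p : 0));
  const total = masked.reduce((a, b) => a + b, 0);
  if (total === 0) throw new Error("grammar dead end: no token allowed");
  let r = Math.random() * total;
  for (let id = 0; id < masked.length; id++) {
    r -= masked[id];
    if (r <= 0) return id;
  }
  return masked.length - 1; // floating-point edge fallback
}

// Toy example: only token ids 1 and 2 are grammar-legal at this step.
const next = maskAndSample([0.5, 0.3, 0.2], (id) => id === 1 || id === 2);
console.log(`sampled token id: ${next}`);
```

This is why structured output is largely a property of the inference stack rather than the weights: the same model produces valid output under a constrained sampler and invalid output without one.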
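On the in-browser vision demo: small models are typically run client-side through a WebGPU-backed runtime such as transformers.js. A rough sketch of what loading such a model looks like; the pipeline task tag and model id here are assumptions for illustration, not details from the release:

```typescript
// Rough sketch of running a small vision model fully client-side with
// transformers.js, which supports a WebGPU backend in v3. The task tag
// and model id are placeholders; check the model card for the actual
// ONNX export and supported pipeline task.
import { pipeline } from "@huggingface/transformers";

async function describeImage(imageUrl: string): Promise<void> {
  const captioner = await pipeline(
    "image-to-text",                 // assumed task tag for this sketch
    "your-org/some-3b-vision-onnx",  // hypothetical model id
    { device: "webgpu" },            // run on the browser's GPU
  );
  const out = await captioner(imageUrl);
  console.log(out);
}

describeImage("https://example.com/cat.jpg");
```

The weights download once and execute locally, which is the appeal for the privacy-minded crowd in this thread: nothing leaves the browser.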
Real‑World Usage Reports
- Multiple users report Mistral 3 Medium and the small models as extremely fast, cheap, and reliable for constrained tasks (formatting, categorization, summarization, language-learning content), outperforming GPT‑5 on those workloads despite weaker benchmark scores.
- Others find Mistral “next to useless” for coding compared to Claude, Gemini, or DeepSeek, with heavy hallucinations and non‑compilable code.
- Consensus: benchmarks are only a rough guide; real value depends heavily on specific workloads, prompting, and cost/latency constraints.
Europe, Funding, and Politics
- Strong symbolic support for an EU‑based AI player, intertwined with confusion over “Europe vs. the EU” and mentions of other European AI companies.
- Debate over how much “European” really means when the company is funded by US VCs and hosted on US clouds, versus where taxes, data, and legal jurisdiction actually land.