Mistral Medium 3.5
Perceived performance vs frontier models
- Many commenters see Mistral Medium 3.5 as “okay but not exceptional” compared to frontier models like GPT‑5.5, Claude Sonnet/Opus, and top Chinese models (DeepSeek, GLM, Qwen, Kimi).
- Some argue that for typical coding and chat tasks the difference from frontier models is small; others say that for complex agentic workflows the gap is now “enormous” and materially affects productivity.
- Several note that Chinese and Google models (e.g., Qwen 3.6 27B, Gemma 4 26–31B) match or beat it despite being much smaller.
Benchmarks and evaluation concerns
- The launch blog leans on SWE‑Bench Verified, which some distrust because of alleged contamination and past disputes between labs.
- Multiple users say the model performs poorly on SVG/HTML/JS generation, especially compared to Gemma and Kimi; others downplay SVG quality as a meaningful metric.
- There is skepticism about claims that it “beats Sonnet”: people report that open‑weight models generally lag Sonnet in practical agent tasks despite benchmark wins.
Pricing and competitiveness
- The model is viewed as expensive: significantly pricier than Mistral Large and Chinese competitors, and pricier than Anthropic’s Haiku and some Sonnet‑tier options.
- Some praise earlier Mistral models (Large, Small 4) as Pareto‑competitive (80–90% of frontier quality at much lower cost); this release is seen as less clearly on that frontier.
Open‑weight, dense design and local deployment
- Medium 3.5 is a 128B‑parameter dense, open‑weight model with a 256k context window (~140 GB for the full weights; ~70–80 GB with Q4 quantization).
- Enthusiasts like that it can, in principle, run locally on high‑end Macs or multi‑GPU rigs and offers sovereignty vs US/Chinese clouds.
- Others point out the physics: a dense 128B model has to read every weight for each generated token, so on consumer hardware it yields very low tokens/sec; MoE alternatives (e.g., DeepSeek V4 Flash, Qwen 35B A3B) activate only a fraction of their parameters per token and, commenters argue, give higher effective capability per byte and far better speeds.
- There is debate over why Mistral chose a large dense model given its own earlier MoE success; some see this as a strategic misstep.
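The memory and speed arguments in this section come down to simple arithmetic, sketched below in Python. This is a rough back‑of‑envelope model, not a benchmark: it assumes single‑stream decoding is limited by memory bandwidth, that Q4 formats cost about 4.5 bits per weight once scales are included, that the machine offers a hypothetical 800 GB/s of memory bandwidth, and that the “A3B” MoE activates about 3B parameters per token. It ignores KV cache, activations, and compute limits, so the speeds are upper bounds at best.

```python
GB = 1e9  # decimal gigabytes


def weight_gb(params: float, bits_per_weight: float) -> float:
    """Approximate size of the weights alone, in GB."""
    return params * bits_per_weight / 8 / GB


def decode_tps_upper_bound(active_params: float, bits_per_weight: float,
                           bandwidth_gb_per_s: float) -> float:
    """Bandwidth-bound ceiling on tokens/sec for single-stream decoding.

    Each generated token requires reading every *active* weight once,
    so the ceiling is bandwidth divided by bytes read per token.
    """
    return bandwidth_gb_per_s / weight_gb(active_params, bits_per_weight)


# Assumed values (not from the thread): 4.5 bits/weight for Q4,
# 800 GB/s memory bandwidth, ~3B active parameters for the MoE.
Q4_BITS = 4.5
BANDWIDTH_GB_S = 800.0

dense_q4_gb = weight_gb(128e9, Q4_BITS)    # 128B dense at Q4
dense_8bit_gb = weight_gb(128e9, 8)        # 128B dense at 8 bits/weight

dense_tps = decode_tps_upper_bound(128e9, Q4_BITS, BANDWIDTH_GB_S)
moe_tps = decode_tps_upper_bound(3e9, Q4_BITS, BANDWIDTH_GB_S)

print(f"dense 128B @ Q4:    {dense_q4_gb:.0f} GB, <= {dense_tps:.1f} tok/s")
print(f"dense 128B @ 8-bit: {dense_8bit_gb:.0f} GB")
print(f"MoE ~3B active @ Q4: <= {moe_tps:.0f} tok/s")
```

Under these assumptions the Q4 footprint lands inside the ~70–80 GB range quoted in the thread, and the dense model's ceiling is more than an order of magnitude below that of the small‑active‑parameter MoE, which is the gap the commenters describe. Note that an MoE still has to hold all of its parameters in memory; only the per‑token reads shrink.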
Use cases, tools, and product experience
- Commenters report positive experiences with older Mistral models for text transformation, document analysis, and on‑prem enterprise deployments.
- Some worry that the new Medium’s higher price may foreshadow deprecation of the cheaper Large.
- Feedback on Mistral Vibe (coding agent) and the CLI is mixed: some like the concept; others report bugs, instability, a strict Content Security Policy (CSP) that prevents easy JS demos, and weak coding and tool‑use behavior compared with Claude Code, Codex, or OpenCode.
Geopolitics and ecosystem
- Strong interest in a credible non‑US, non‑Chinese option for regulatory, political, and “data sovereignty” reasons.
- Some worry that Europe chronically underinvests and lags the US and China, while others argue that efficient training and open weights can still make Mistral strategically important even if it is not strictly SOTA.