Mistral Medium 3.5

Perceived performance vs frontier models

  • Many commenters see Mistral Medium 3.5 as “okay but not exceptional” compared to frontier models like GPT‑5.5, Claude Sonnet/Opus, and top Chinese models (DeepSeek, GLM, Qwen, Kimi).
  • Some argue that for typical coding and chat tasks the difference from frontier models is small; others say that for complex agentic workflows the gap is now “enormous” and materially affects productivity.
  • Several note that Chinese and Google models (e.g., Qwen 3.6 27B, Gemma 4 26–31B) match or beat it despite being much smaller.

Benchmarks and evaluation concerns

  • The launch blog leans on SWE‑Bench Verified, which some distrust because of alleged contamination and past disputes between labs.
  • Multiple users say the model performs poorly on SVG/HTML/JS generation, especially compared to Gemma and Kimi; others downplay SVG quality as a meaningful metric.
  • There’s skepticism about claims that it “beats Sonnet,” with people reporting that open‑weight models generally lag Sonnet in practical agent tasks despite benchmark wins.

Pricing and competitiveness

  • The model is viewed as expensive: priced significantly above Mistral Large and Chinese competitors, and above Anthropic’s Haiku and some Sonnet‑tier options.
  • Some praise earlier Mistral models (Large, Small 4) as Pareto‑competitive (80–90% of frontier quality at much lower cost); this release is seen as less clearly on that frontier.

Open‑weight, dense design and local deployment

  • Medium 3.5 is a 128B dense, open‑weight, 256k‑context model (~140 GB full; ~70–80 GB at Q4 quant).
  • Enthusiasts like that it can, in principle, run locally on high‑end Macs or multi‑GPU rigs and offers sovereignty vs US/Chinese clouds.
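  • Others point out the physics: a dense 128B model has to read every weight for each generated token, so on consumer hardware it yields very low tokens/sec; MoE alternatives (e.g., DeepSeek V4 Flash, Qwen 35B A3B) activate only a fraction of their parameters per token, giving higher effective capability per byte and far better speeds.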
  • Debate over why Mistral chose a large dense model given its own earlier MoE success; some see this as a strategic misstep.
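
The memory and speed arguments above come down to simple arithmetic. The sketch below is a rough back-of-envelope estimate, not a benchmark. It assumes single-stream decoding is memory-bandwidth bound, so each generated token requires reading every active weight once. The 4.5 bits per weight for a Q4-style quant, the 800 GB/s bandwidth figure, and the 3B-active MoE are illustrative assumptions rather than numbers from the discussion, and the function names are hypothetical.

```python
"""Back-of-envelope estimate of weight size and decode speed (illustrative only)."""

GB = 1e9  # decimal gigabyte


def weight_bytes(params: float, bits_per_weight: float) -> float:
    """Bytes needed to store the weights alone (no KV cache, no activations)."""
    return params * bits_per_weight / 8


def decode_tps_upper_bound(active_params: float, bits_per_weight: float,
                           bandwidth_bytes_per_s: float) -> float:
    """Upper bound on tokens/sec, assuming decoding is memory-bandwidth bound
    and every active weight is read once per generated token."""
    return bandwidth_bytes_per_s / weight_bytes(active_params, bits_per_weight)


if __name__ == "__main__":
    q4_bits = 4.5          # assumption: ~4.5 bits/weight for a Q4-style quant
    bandwidth = 800 * GB   # assumption: 800 GB/s memory bandwidth

    dense = 128e9          # 128B dense: all parameters are active per token
    moe_active = 3e9       # assumption: MoE with ~3B active parameters per token

    print(f"dense Q4 weights:   {weight_bytes(dense, q4_bits) / GB:.0f} GB")
    print(f"dense decode bound: {decode_tps_upper_bound(dense, q4_bits, bandwidth):.1f} tok/s")
    print(f"MoE decode bound:   {decode_tps_upper_bound(moe_active, q4_bits, bandwidth):.1f} tok/s")
```

Under these assumptions the dense model's Q4 weights come to about 72 GB, in line with the ~70–80 GB figure quoted above, and the bound on decode speed scales inversely with the number of active parameters. Real throughput would be lower once KV-cache reads and compute overhead are included.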

Use cases, tools, and product experience

  • Positive experiences with older Mistral models for text transformation, document analysis, and on‑prem enterprise deployments.
  • Concerns that the new Medium’s higher price may foreshadow deprecation of the cheaper Large model.
  • Mixed feedback on Mistral Vibe (coding agent) and CLI: some like the concept; others report bugs, instability, a strict Content Security Policy (CSP) that prevents easy JS demos, and weak coding and tool‑use behavior versus Claude Code, Codex, or OpenCode.

Geopolitics and ecosystem

  • Strong interest in a credible non‑US, non‑Chinese option for regulatory, political, and “data sovereignty” reasons.
  • Some worry that Europe chronically underinvests and lags the US and China, while others argue that efficient training and open weights can still make Mistral strategically important even if it is not strictly SOTA.