2026-06-02

MAI-Code-1-Flash

Model positioning and capabilities

MAI-Code-1-Flash is presented as a “flash” coding model competing with Anthropic’s Claude Haiku 4.5, not Sonnet/Opus–class models.
Reported scores: ~51% on SWE-Bench Pro, ~55% on Terminal-Bench 2.0 in Microsoft’s VS Code harness.
Several commenters note that Gemma 4 26B and Qwen 3.6 35B/27B achieve similar or better scores with fewer active parameters.

Model size and architecture

The model is clarified to be a Mixture-of-Experts design with 137B total parameters, of which 5B are “active”.
Some criticize the initial emphasis on “5B” as potentially misleading, given total size and comparisons to smaller strong models.
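To put the “5B” versus “137B” dispute in numbers, here is a minimal sketch of the arithmetic. The two parameter counts come from the discussion; the 2-bytes-per-parameter figure (16-bit weights) is an assumption for illustration only, not something the thread or Microsoft states.

```python
# Parameter counts as reported in the discussion.
TOTAL_PARAMS = 137e9   # all experts combined
ACTIVE_PARAMS = 5e9    # parameters counted as "active"

# Assumption for illustration: weights stored at 16 bits (2 bytes) each.
BYTES_PER_PARAM = 2


def active_fraction(active: float, total: float) -> float:
    """Share of the total parameters that count as active."""
    return active / total


def weight_gb(params: float, bytes_per_param: int = BYTES_PER_PARAM) -> float:
    """Approximate size of the weights in gigabytes (10^9 bytes)."""
    return params * bytes_per_param / 1e9


frac = active_fraction(ACTIVE_PARAMS, TOTAL_PARAMS)
total_gb = weight_gb(TOTAL_PARAMS)
active_gb = weight_gb(ACTIVE_PARAMS)

print(f"active share of parameters: {frac:.1%}")
print(f"weights to store (all experts): ~{total_gb:.0f} GB")
print(f"weights counted as active: ~{active_gb:.0f} GB")
```

Under that assumption, the active count is the better guide to per-token compute, while the total count is the better guide to how much memory is needed to host the model, which is why commenters compare it against total sizes of the smaller models.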

Benchmarks and evaluation concerns

Commenters debate whether the model might have been trained on, or overfit to, SWE-Bench Pro; others point to the technical report’s explicit “decontamination” section and argue this is addressed.
Some feel Microsoft cherry-picks comparisons against Haiku and omits stronger small models (Qwen, Gemma, GPT‑5.4 mini, DeepSeek, etc.).
Others caution that public benchmarks may not reflect real-world performance and that cost-per-task and latency matter as much as pass@1.

Pricing and Copilot context

Copilot documentation lists MAI-Code-1-Flash at $0.75 / $0.075 / $4.50 (input / cached / output), slightly cheaper than Haiku.
Multiple commenters are upset about Copilot’s recent shift from flat/request-based pricing to per-token billing and see this model as part of that strategy.
Some say cheaper/open models via other providers or local deployment now offer better value.
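Since the thread moves from per-request to per-token thinking, a rough cost calculator may help. This is a sketch under stated assumptions: the three prices are the ones quoted above, but the billing unit (per 1,000,000 tokens) is assumed rather than given, and the token counts and the `cost_per_solved_task` helper are hypothetical illustrations, not Copilot’s billing logic.

```python
from dataclasses import dataclass

# Assumption: the quoted prices are per one million tokens.
TOKENS_PER_PRICE_UNIT = 1_000_000


@dataclass(frozen=True)
class Pricing:
    input: float    # price for uncached input tokens
    cached: float   # price for cached input tokens
    output: float   # price for output tokens


# Prices as quoted in the discussion (input / cached / output).
MAI_CODE_1_FLASH = Pricing(input=0.75, cached=0.075, output=4.50)


def task_cost(p: Pricing, input_tokens: int, cached_tokens: int,
              output_tokens: int) -> float:
    """Dollar cost of one task under simple per-token billing."""
    total = (input_tokens * p.input
             + cached_tokens * p.cached
             + output_tokens * p.output)
    return total / TOKENS_PER_PRICE_UNIT


def cost_per_solved_task(cost_per_attempt: float, pass_rate: float) -> float:
    """Expected spend per success if each attempt passes independently."""
    if not 0 < pass_rate <= 1:
        raise ValueError("pass_rate must be in (0, 1]")
    return cost_per_attempt / pass_rate


# Hypothetical agent session: token counts are made up for illustration.
attempt = task_cost(MAI_CODE_1_FLASH,
                    input_tokens=200_000,
                    cached_tokens=800_000,
                    output_tokens=20_000)
solved = cost_per_solved_task(attempt, pass_rate=0.5)  # hypothetical pass rate

print(f"cost per attempt: ${attempt:.2f}")
print(f"expected cost per solved task: ${solved:.2f}")
```

Dividing by the pass rate is one simple way to fold benchmark accuracy into price, which is the comparison several commenters say matters more than pass@1 alone.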

Use cases, workflows, and small vs large models

Many describe workflows where a powerful model plans/architects and smaller models execute, review, or handle narrow tasks.
Opinions diverge: some find smaller models “good enough” and significantly cheaper; others say they still spend more time fixing them than they save, especially on complex codebases.
Several see this model as potentially useful as a Haiku-class backend in multi-agent systems rather than a primary coding assistant.
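The planner/executor split that commenters describe can be sketched as follows. Everything here is hypothetical: `plan_then_execute`, `fake_planner`, and `fake_executor` are stand-ins for real model calls, and no actual Copilot or model API is used.

```python
from typing import Callable, List

# A "model" here is just a function from a prompt to a text reply.
Model = Callable[[str], str]


def plan_then_execute(task: str, planner: Model, executor: Model) -> List[str]:
    """Ask the large model for a plan, then hand each step to the small model."""
    plan = planner(task)
    # One non-empty line of the plan is treated as one step.
    steps = [line.strip() for line in plan.splitlines() if line.strip()]
    return [executor(step) for step in steps]


def fake_planner(task: str) -> str:
    """Stand-in for a large, expensive planning model."""
    return "write a failing test\nimplement the fix\nrun the test suite"


def fake_executor(step: str) -> str:
    """Stand-in for a small, cheap flash-class model."""
    return f"done: {step}"


results = plan_then_execute("fix the date parsing bug", fake_planner, fake_executor)
for line in results:
    print(line)
```

In this arrangement the expensive model is called once per task and the cheap model once per step, which is the cost argument made for using a Haiku-class model as a backend rather than as the primary assistant.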

Training data and openness

Microsoft claims the model was trained on “clean, appropriately licensed data” with AI-generated content filtered out; some see this as a major differentiator, while others remain skeptical without a dataset list.
Disappointment that the model is not open-weights, especially compared to prior Phi models and emerging large open models.

Website and branding feedback

Strong negative reaction to the product page’s scroll hijacking and janky UX.
Some view the MAI branding and design language as derivative and over-marketed relative to the actual performance gains.
