MAI-Code-1-Flash
Model positioning and capabilities
- MAI-Code-1-Flash is presented as a “flash” coding model competing with Anthropic’s Claude Haiku 4.5, not Sonnet/Opus–class models.
- Reported scores: ~51% on SWE-Bench Pro, ~55% on Terminal-Bench 2.0 in Microsoft’s VS Code harness.
- Several commenters note that Gemma 4 26B and Qwen 3.6 35B/27B achieve similar or better scores with fewer active parameters.
Model size and architecture
- The model was later clarified to be a Mixture-of-Experts design with 137B total parameters, of which 5B are “active”.
- Some criticize the initial emphasis on “5B” as potentially misleading, given the total size and the comparisons to smaller strong models.
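To make the “5B vs 137B” point concrete, here is a minimal sketch of the arithmetic. The parameter counts come from the discussion. The 2 bytes per parameter figure (16-bit weights) is an assumption for illustration, not something the thread states, and the helper names are hypothetical.

```python
# Parameter counts quoted in the discussion (in billions).
TOTAL_PARAMS_B = 137.0
ACTIVE_PARAMS_B = 5.0

# ASSUMPTION: 16-bit weights, i.e. 2 bytes per parameter.
BYTES_PER_PARAM = 2


def active_fraction(active_b: float, total_b: float) -> float:
    """Share of the weights counted as active, out of all weights."""
    if total_b <= 0 or not 0 <= active_b <= total_b:
        raise ValueError("expected 0 <= active_b <= total_b and total_b > 0")
    return active_b / total_b


def weight_memory_gb(params_b: float, bytes_per_param: int) -> float:
    """Weight storage in decimal GB: billions of params times bytes each."""
    return params_b * bytes_per_param


frac = active_fraction(ACTIVE_PARAMS_B, TOTAL_PARAMS_B)
total_gb = weight_memory_gb(TOTAL_PARAMS_B, BYTES_PER_PARAM)
active_gb = weight_memory_gb(ACTIVE_PARAMS_B, BYTES_PER_PARAM)

print(f"active share: {frac:.1%}")
print(f"weights to store: {total_gb:.0f} GB, of which active: {active_gb:.0f} GB")
```

Under these assumptions, per-step compute tracks the active count while storage tracks the total, which is why commenters object to comparing the “5B” figure against dense models of similar nominal size.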
Benchmarks and evaluation concerns
- Debate over whether the model might be trained on or overfit to SWE-Bench Pro; others point to the technical report’s explicit “decontamination” section and argue this is addressed.
- Some feel Microsoft cherry-picks comparisons against Haiku and omits stronger small models (Qwen, Gemma, GPT‑5.4 mini, DeepSeek, etc.).
- Others caution that public benchmarks may not reflect real-world performance and that cost-per-task and latency matter as much as pass@1.
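The cost-per-task point can be illustrated with a rough sketch. It assumes each attempt succeeds independently with probability equal to pass@1 and that a failed task is simply retried, so the expected number of attempts is 1 / pass@1. The dollar figures and the second model are hypothetical; only the ~51% pass rate is taken from the discussion.

```python
def expected_cost_per_solved_task(cost_per_attempt: float, pass_rate: float) -> float:
    """Expected spend to get one success, assuming independent retries.

    With success probability p per attempt, the number of attempts until the
    first success is geometric with mean 1 / p.
    """
    if not 0 < pass_rate <= 1:
        raise ValueError("pass_rate must be in (0, 1]")
    if cost_per_attempt < 0:
        raise ValueError("cost_per_attempt must be non-negative")
    return cost_per_attempt / pass_rate


# HYPOTHETICAL per-attempt costs; pass rate 0.51 is the reported SWE-Bench Pro score.
small_model = expected_cost_per_solved_task(cost_per_attempt=0.10, pass_rate=0.51)
# HYPOTHETICAL stronger, pricier model for comparison.
large_model = expected_cost_per_solved_task(cost_per_attempt=0.40, pass_rate=0.80)

print(f"small model: ${small_model:.3f} per solved task")
print(f"large model: ${large_model:.3f} per solved task")
```

The sketch ignores latency and the human time spent reviewing failed attempts, both of which commenters argue can dominate in practice.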
Pricing and Copilot context
- Copilot documentation lists MAI-Code-1-Flash at $0.75 / $0.075 / $4.50 (input / cached / output), slightly cheaper than Haiku.
- Multiple commenters are upset about Copilot’s recent shift from flat/request-based pricing to per-token billing and see this model as part of that strategy.
- Some say cheaper/open models via other providers or local deployment now offer better value.
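For a sense of what per-token billing means for a single request, here is a minimal sketch using the three listed prices. It assumes the prices are USD per one million tokens, which the summary does not state, and the token counts in the example are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class TokenPrices:
    """Prices per 1M tokens (ASSUMED unit; the thread lists only the numbers)."""
    input_per_m: float
    cached_per_m: float
    output_per_m: float


# Figures quoted from the Copilot documentation in the discussion.
MAI_CODE_1_FLASH = TokenPrices(input_per_m=0.75, cached_per_m=0.075, output_per_m=4.50)


def request_cost(prices: TokenPrices, input_tokens: int,
                 cached_tokens: int, output_tokens: int) -> float:
    """Cost of one request: each token bucket billed at its own rate."""
    total = (input_tokens * prices.input_per_m
             + cached_tokens * prices.cached_per_m
             + output_tokens * prices.output_per_m)
    return total / 1_000_000


# HYPOTHETICAL agent turn: 20k fresh input, 80k cached context, 2k output.
cost = request_cost(MAI_CODE_1_FLASH, input_tokens=20_000,
                    cached_tokens=80_000, output_tokens=2_000)
print(f"${cost:.4f} per request")
```

Under these assumptions the request costs about three cents, so long agent sessions with many turns are where the shift away from flat or request-based pricing becomes noticeable.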
Use cases, workflows, and small vs large models
- Many describe workflows where a powerful model plans/architects and smaller models execute, review, or handle narrow tasks.
- Opinions diverge: some find smaller models “good enough” and significantly cheaper; others say they still spend more time fixing the smaller models’ output than the models save them, especially on complex codebases.
- Several see this model as potentially useful as a Haiku-class backend in multi-agent systems rather than a primary coding assistant.
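The plan-then-execute workflow commenters describe can be sketched as follows. This is an illustrative outline only: the `Model` type, the function name, and the stub models are all hypothetical stand-ins, not any vendor's API. In practice each callable would wrap a real model call.

```python
from typing import Callable, Dict, List

# HYPOTHETICAL interface: a model is any function from prompt text to reply text.
Model = Callable[[str], str]


def run_plan_execute_review(task: str, planner: Model,
                            executor: Model, reviewer: Model) -> List[Dict]:
    """A large model plans; a small model executes each step; a reviewer checks it."""
    plan = planner(f"Break this task into steps, one per line:\n{task}")
    steps = [line.strip() for line in plan.splitlines() if line.strip()]

    results = []
    for step in steps:
        output = executor(step)
        verdict = reviewer(f"Step: {step}\nOutput: {output}\nReply OK or REJECT.")
        results.append({
            "step": step,
            "output": output,
            "approved": verdict.strip().upper().startswith("OK"),
        })
    return results


# Stub models so the sketch runs without any network access.
def stub_planner(prompt: str) -> str:
    return "add parser\nwrite tests"


def stub_executor(prompt: str) -> str:
    return f"done: {prompt}"


def stub_reviewer(prompt: str) -> str:
    return "OK"


results = run_plan_execute_review("Add CSV export", stub_planner,
                                  stub_executor, stub_reviewer)
for r in results:
    print(r["step"], "->", "approved" if r["approved"] else "rejected")
```

Keeping the three roles as separate callables makes it easy to swap a Haiku-class model in as executor or reviewer while leaving the planner untouched, which matches the backend role several commenters suggest for this model.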
Training data and openness
- Microsoft claims “clean, appropriately licensed data” and filters for AI-generated content; some see this as a major differentiator, others remain skeptical without a dataset list.
- Commenters express disappointment that the model is not open-weights, especially compared to prior Phi models and emerging large open models.
Website and branding feedback
- Strong negative reaction to the product page’s scroll hijacking and janky UX.
- Some view the MAI branding and design language as derivative and over-marketed relative to the actual performance gains.