MAI-Code-1-Flash

Model positioning and capabilities

  • MAI-Code-1-Flash is presented as a “flash” coding model competing with Anthropic’s Claude Haiku 4.5, not Sonnet/Opus–class models.
  • Reported scores: ~51% on SWE-Bench Pro, ~55% on Terminal-Bench 2.0 in Microsoft’s VS Code harness.
  • Several commenters note that Gemma 4 26B and Qwen 3.6 35B/27B achieve similar or better scores with fewer active parameters.

Model size and architecture

  • The model was clarified to be a Mixture-of-Experts design with 137B total parameters, of which 5B are “active.”
  • Some criticize the initial emphasis on “5B” as potentially misleading, given total size and comparisons to smaller strong models.
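
To make the "5B vs. 137B" point concrete, the arithmetic behind the criticism can be sketched as below. The parameter counts are the ones quoted in the discussion; the 2 bytes per parameter (16-bit weights) and the premise that every expert has to be held in memory are illustrative assumptions, not figures from Microsoft.

```python
# Figures quoted in the discussion, in billions of parameters.
TOTAL_PARAMS_B = 137
ACTIVE_PARAMS_B = 5


def active_fraction(active_b: float, total_b: float) -> float:
    """Return the share of all parameters that count as 'active'."""
    return active_b / total_b


def weight_memory_gb(total_b: float, bytes_per_param: float) -> float:
    """Rough size of the weights alone, in GB.

    Assumption: all experts are kept resident, so the total parameter
    count (not the active count) determines the footprint.
    """
    # (total_b * 1e9 params) * bytes_per_param / (1e9 bytes per GB)
    return total_b * bytes_per_param


if __name__ == "__main__":
    share = active_fraction(ACTIVE_PARAMS_B, TOTAL_PARAMS_B)
    print(f"active share of parameters: {share:.1%}")
    # bytes_per_param=2 is an assumed 16-bit weight format.
    print(f"weights at 2 bytes/param: {weight_memory_gb(TOTAL_PARAMS_B, 2):.0f} GB")
```

Under these assumptions the active share is under 4% of the total, which is why commenters treat "5B" as a statement about compute per forward pass rather than about model size.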

Benchmarks and evaluation concerns

  • Debate over whether the model might be trained on or overfit to SWE-Bench Pro; others point to the technical report’s explicit “decontamination” section and argue this is addressed.
  • Some feel Microsoft cherry-picks comparisons against Haiku and omits stronger small models (Qwen, Gemma, GPT‑5.4 mini, DeepSeek, etc.).
  • Others caution that public benchmarks may not reflect real-world performance and that cost-per-task and latency matter as much as pass@1.
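
One way to read the "cost-per-task vs. pass@1" argument is as a simple expected-cost calculation. The sketch below assumes a task is retried until it passes and that attempts are independent with a fixed pass rate, so the expected number of attempts is 1 / pass_rate. Both models and all numbers in the example are hypothetical placeholders, not measurements of any real model.

```python
def expected_cost_per_solved_task(cost_per_attempt: float, pass_rate: float) -> float:
    """Expected spend to get one passing result.

    Assumption: independent attempts with a constant pass rate, retried
    until success, so the expected attempt count is 1 / pass_rate.
    """
    if not 0 < pass_rate <= 1:
        raise ValueError("pass_rate must be in (0, 1]")
    return cost_per_attempt / pass_rate


if __name__ == "__main__":
    # Hypothetical models: (cost per attempt in USD, pass rate).
    candidates = {
        "cheap-model": (0.02, 0.50),
        "strong-model": (0.09, 0.75),
    }
    for name, (cost, rate) in candidates.items():
        per_solved = expected_cost_per_solved_task(cost, rate)
        print(f"{name}: ${per_solved:.3f} per solved task")
```

In this made-up example the cheaper model still wins on cost per solved task despite the lower pass rate; the calculation ignores latency and the human time spent reviewing failed attempts, which commenters also raise.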

Pricing and Copilot context

  • Copilot documentation lists MAI-Code-1-Flash at $0.75 / $0.075 / $4.50 (input / cached / output), slightly cheaper than Haiku.
  • Multiple commenters are upset about Copilot’s recent shift from flat/request-based pricing to per-token billing and see this model as part of that strategy.
  • Some say cheaper/open models via other providers or local deployment now offer better value.
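
For a sense of what per-token billing means for a single request, here is a minimal sketch using the three listed rates. It assumes the prices are in USD per 1 million tokens (the summary does not state the unit), and the token counts in the example are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Pricing:
    """Prices in USD per 1M tokens (the unit is an assumption)."""
    input_per_m: float
    cached_per_m: float
    output_per_m: float


# Rates as listed in the discussion: input / cached / output.
MAI_CODE_1_FLASH = Pricing(input_per_m=0.75, cached_per_m=0.075, output_per_m=4.50)


def request_cost(
    pricing: Pricing,
    fresh_input_tokens: int,
    cached_input_tokens: int,
    output_tokens: int,
) -> float:
    """Cost of one request in USD, billing each token class at its own rate."""
    total = (
        fresh_input_tokens * pricing.input_per_m
        + cached_input_tokens * pricing.cached_per_m
        + output_tokens * pricing.output_per_m
    )
    return total / 1_000_000


if __name__ == "__main__":
    # Hypothetical agent turn: mostly cached context, a short reply.
    cost = request_cost(MAI_CODE_1_FLASH, 20_000, 80_000, 2_000)
    print(f"estimated cost: ${cost:.4f}")
```

With these hypothetical counts the turn comes to about three cents. Output tokens are six times the input rate, so they account for nearly a third of that despite being 2% of the volume.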

Use cases, workflows, and small vs large models

  • Many describe workflows where a powerful model plans/architects and smaller models execute, review, or handle narrow tasks.
  • Opinions diverge: some find smaller models “good enough” and significantly cheaper; others say they still spend more time fixing the smaller models’ output than they save, especially on complex codebases.
  • Several see this model as potentially useful as a Haiku-class backend in multi-agent systems rather than a primary coding assistant.
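
The plan-then-execute workflow commenters describe could be wired up roughly as follows. This is only a sketch of the control flow: `Model` is a hypothetical stand-in (any function from prompt text to response text), the prompt wording and the "ok" approval convention are invented for illustration, and the stub models at the bottom replace real API calls.

```python
from typing import Callable, Dict, List

# Hypothetical interface: a model is any callable from prompt to response text.
Model = Callable[[str], str]


def plan_then_execute(
    task: str, planner: Model, executor: Model, reviewer: Model
) -> List[Dict[str, object]]:
    """Have a large model plan, then let small models execute and review each step."""
    plan = planner(f"Break this task into small steps, one per line:\n{task}")
    steps = [line.strip() for line in plan.splitlines() if line.strip()]

    results: List[Dict[str, object]] = []
    for step in steps:
        patch = executor(step)
        verdict = reviewer(f"Step: {step}\nPatch: {patch}")
        # Invented convention: a verdict starting with "ok" means approved.
        approved = verdict.strip().lower().startswith("ok")
        results.append({"step": step, "patch": patch, "approved": approved})
    return results


# Stub models so the sketch runs without any network access.
def fake_planner(prompt: str) -> str:
    return "add input validation\nwrite unit test\n"


def fake_executor(step: str) -> str:
    return f"# patch for: {step}"


def fake_reviewer(prompt: str) -> str:
    return "needs work" if "unit test" in prompt else "ok"


if __name__ == "__main__":
    for result in plan_then_execute(
        "harden the parser", fake_planner, fake_executor, fake_reviewer
    ):
        print(result)
```

Steps that come back unapproved would typically be retried or escalated to the larger model, which is where the "time spent fixing" trade-off shows up.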

Training data and openness

  • Microsoft claims to use “clean, appropriately licensed data” and to filter for AI-generated content; some see this as a major differentiator, while others remain skeptical without a published dataset list.
  • Disappointment that the model is not open-weights, especially compared to prior Phi models and emerging large open models.

Website and branding feedback

  • Strong negative reaction to the product page’s scroll hijacking and janky UX.
  • Some view the MAI branding and design language as derivative and over-marketed relative to the actual performance gains.