Xiaomi MiMo-v2.5 Series API Permanent Price Reduction Up to 99%

Price cut scope and mechanics

  • Headline “up to 99%” reduction mainly applies to cached input tokens; non‑cached (cache miss) reductions are smaller (some say closer to 50%).
  • Several comments note that other providers historically “overcharge” for cache hits, which are much cheaper to serve than fresh tokens.
  • Off‑peak pricing (00:00–08:00 Beijing) conveniently overlaps with North American daytime, which some see as strategically favorable for Western users.

Cost drivers and hardware

  • Explanations for low prices: cheap Chinese electricity, domestically produced GPUs/NPUs (e.g., Huawei Ascend), in‑house inference chips, cheap RAM, and heavy efficiency research.
  • Some argue US export controls pushed Chinese firms to invest in a full domestic stack, now paying off.

Competition with Western labs

  • Many see this as part of a “race to zero” in inference costs, directly undercutting US labs whose prices have recently increased.
  • Some speculate Western firms may respond via lobbying or pushing restrictions on Chinese and open‑source models.

Model quality and use cases

  • Users report MiMo 2.5/Pro and DeepSeek V4‑Flash/Pro are “good enough” for most coding and light work, though not at the level of top frontier models (Claude Opus, GPT‑5.5).
  • Opinions differ: some find DeepSeek superior to Western mid‑tier models; others see it roughly comparable to Sonnet‑class, clearly below Opus.
  • Benchmarks are viewed skeptically; repeated advice is to test with real workloads.

Adoption, trust, and geopolitics

  • Debate over whether Western enterprises will ever widely adopt Chinese models, even self‑hosted, due to trust, optics, and regulatory concerns.
  • Some worry about Chinese surveillance via AI APIs; others note similar or worse US practices and emphasize that open‑weight Chinese models can be run locally.

Market dynamics and sustainability

  • One view: Chinese labs cut prices because usage and revenue lag far behind OpenAI/Anthropic/Google; another: they are aggressively subsidizing to gain market share and data, similar to EVs.
  • Disagreement over token statistics and what they really say about global usage.

Developer experience and billing

  • Mixed reports on MiMo reliability (looping outputs, tool‑use issues) vs alternatives tuned for “agentic” workflows.
  • Token/credit plans and unit conversions are seen as confusing; some users burn a large chunk of monthly budget in a single coding session.
  • Overall trend highlighted: industry shifting from “best model wins” to “good‑enough model at lowest cost.”