Xiaomi MiMo-v2.5 Series API Permanent Price Reduction Up to 99%
Price cut scope and mechanics
- The headline “up to 99%” reduction applies mainly to cached input tokens; reductions for non-cached (cache-miss) tokens are smaller, with some commenters putting them closer to 50%.
- Several commenters note that other providers have historically “overcharged” for cache hits, which are much cheaper to serve than fresh tokens.
- Off-peak pricing runs 00:00–08:00 Beijing time (UTC+8), which is roughly 9 a.m.–5 p.m. Pacific or noon–8 p.m. Eastern during daylight saving time. It therefore overlaps with the North American working day, which some see as strategically favorable for Western users.
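
To make the gap between the headline figure and the realized saving concrete, here is a minimal sketch of the blended input-price reduction as a function of cache hit rate. The old per-million-token prices (`OLD_HIT`, `OLD_MISS`) are hypothetical placeholders, not Xiaomi's published rates. The 99% and 50% cuts are the figures quoted in the discussion, and output-token pricing is ignored.

```python
def blended_input_price(hit_rate, hit_price, miss_price):
    """Average price per input token for a cache hit rate between 0.0 and 1.0."""
    if not 0.0 <= hit_rate <= 1.0:
        raise ValueError("hit_rate must be between 0 and 1")
    return hit_rate * hit_price + (1.0 - hit_rate) * miss_price


def effective_reduction(hit_rate, old_hit, old_miss, hit_cut=0.99, miss_cut=0.50):
    """Fractional drop in blended input price after the two separate cuts."""
    old = blended_input_price(hit_rate, old_hit, old_miss)
    new = blended_input_price(
        hit_rate,
        old_hit * (1.0 - hit_cut),    # cached tokens get the large cut
        old_miss * (1.0 - miss_cut),  # cache misses get the smaller cut
    )
    return 1.0 - new / old


# Hypothetical pre-cut prices in USD per million input tokens (assumptions).
OLD_HIT, OLD_MISS = 0.20, 1.00

for rate in (0.0, 0.5, 0.8, 1.0):
    saving = effective_reduction(rate, OLD_HIT, OLD_MISS)
    print(f"cache hit rate {rate:.0%}: input cost falls by {saving:.1%}")
```

Under these assumptions, a workload with no cache hits sees only the 50% cut, and only a fully cached workload reaches 99%. Anything in between depends on both the hit rate and the pre-cut price ratio.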
Cost drivers and hardware
- Proposed explanations for the low prices include cheap Chinese electricity, domestically produced GPUs/NPUs (e.g., Huawei Ascend), in-house inference chips, cheap RAM, and heavy investment in efficiency research.
- Some argue US export controls pushed Chinese firms to invest in a full domestic stack, now paying off.
Competition with Western labs
- Many see this as part of a “race to zero” in inference costs, directly undercutting US labs whose prices have recently increased.
- Some speculate Western firms may respond via lobbying or pushing restrictions on Chinese and open‑source models.
Model quality and use cases
- Users report that MiMo 2.5/Pro and DeepSeek V4-Flash/Pro are “good enough” for most coding and light work, though not at the level of top frontier models (Claude Opus, GPT-5.5).
- Opinions differ: some find DeepSeek superior to Western mid‑tier models; others see it roughly comparable to Sonnet‑class, clearly below Opus.
- Benchmarks are viewed skeptically; repeated advice is to test with real workloads.
Adoption, trust, and geopolitics
- Debate over whether Western enterprises will ever widely adopt Chinese models, even self‑hosted, due to trust, optics, and regulatory concerns.
- Some worry about Chinese surveillance via AI APIs; others note similar or worse US practices and emphasize that open‑weight Chinese models can be run locally.
Market dynamics and sustainability
- One view is that Chinese labs cut prices because their usage and revenue lag far behind OpenAI, Anthropic, and Google. Another is that they are aggressively subsidizing prices to gain market share and data, a strategy commenters liken to the one seen with EVs.
- Disagreement over token statistics and what they really say about global usage.
Developer experience and billing
- Reports on MiMo reliability are mixed (looping outputs, tool-use issues), especially compared with alternatives tuned for “agentic” workflows.
- Token/credit plans and unit conversions are seen as confusing; some users burn a large chunk of their monthly budget in a single coding session.
- Overall trend highlighted: the industry is shifting from “best model wins” to “good-enough model at lowest cost.”
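
The budget complaint in the list above is easier to see with numbers. The sketch below assumes an agentic coding session in which every turn resends the full, growing conversation as input, so billed input tokens grow roughly quadratically with the number of turns. All figures (turn count, context sizes, monthly allowance) are hypothetical and do not come from any MiMo plan.

```python
def session_input_tokens(turns, base_context, tokens_added_per_turn):
    """Total input tokens billed when each turn resends the whole history."""
    total = 0
    context = base_context
    for _ in range(turns):
        total += context                  # this turn is billed for the full context
        context += tokens_added_per_turn  # history grows before the next turn
    return total


# Hypothetical session and plan parameters (assumptions for illustration only).
TURNS = 50
BASE_CONTEXT = 20_000          # system prompt plus repository excerpts
ADDED_PER_TURN = 4_000         # tool output and model replies appended each turn
MONTHLY_ALLOWANCE = 20_000_000

used = session_input_tokens(TURNS, BASE_CONTEXT, ADDED_PER_TURN)
share = used / MONTHLY_ALLOWANCE
print(f"session input tokens: {used:,} ({share:.1%} of the monthly allowance)")
```

Because most of those resent tokens are repeats of earlier turns, the cache-hit price rather than the headline per-token rate largely determines what such a session actually costs.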