2026-05-26

Xiaomi MiMo-v2.5 Series API Permanent Price Reduction Up to 99%

Price cut scope and mechanics

The headline “up to 99%” reduction applies mainly to cached input tokens; reductions for non‑cached (cache‑miss) tokens are smaller, with some commenters putting them closer to 50%.
Several commenters note that other providers have historically “overcharged” for cache hits, which are much cheaper to serve than fresh tokens.
Off‑peak pricing (00:00–08:00 Beijing time, UTC+8) overlaps with the North American daytime, which some see as strategically favorable for Western users.
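
To make the gap between the headline figure and a real bill concrete, here is a minimal sketch of the blended input-token reduction for a given cache hit ratio. The function name and the example prices are hypothetical, chosen only for illustration; the 99% and 50% defaults are the approximate figures quoted in the discussion, not official rates, and output tokens are ignored.

```python
def blended_input_reduction(hit_ratio, old_hit_price, old_miss_price,
                            hit_cut=0.99, miss_cut=0.50):
    """Fractional reduction in input-token spend after a price cut.

    hit_ratio:      share of input tokens served from cache (0.0 to 1.0)
    old_hit_price:  previous price per token for cache hits (any unit)
    old_miss_price: previous price per token for cache misses (same unit)
    hit_cut:        fractional cut on cache-hit price (assumed 0.99)
    miss_cut:       fractional cut on cache-miss price (assumed 0.50)
    """
    if not 0.0 <= hit_ratio <= 1.0:
        raise ValueError("hit_ratio must be between 0 and 1")
    if old_hit_price <= 0 or old_miss_price <= 0:
        raise ValueError("prices must be positive")

    miss_ratio = 1.0 - hit_ratio
    # Spend per input token before the cut, weighted by cache behavior.
    old_cost = hit_ratio * old_hit_price + miss_ratio * old_miss_price
    # Spend per input token after applying each cut to its own bucket.
    new_cost = (hit_ratio * old_hit_price * (1.0 - hit_cut)
                + miss_ratio * old_miss_price * (1.0 - miss_cut))
    return 1.0 - new_cost / old_cost


if __name__ == "__main__":
    # Hypothetical workload: 80% cache hits, with hits previously priced
    # at one fifth of misses.
    reduction = blended_input_reduction(0.8, old_hit_price=0.2, old_miss_price=1.0)
    print(f"Blended input reduction: {reduction:.1%}")
```

Under these made-up prices, a workload with 80% cache hits sees roughly a 72% cut in input spend rather than 99%, because the cache-miss tokens still dominate the remaining bill. A workload with no cache hits would see only the cache-miss reduction.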

Cost drivers and hardware

Explanations offered for the low prices include cheap Chinese electricity, domestically produced GPUs/NPUs (e.g., Huawei Ascend), in‑house inference chips, cheap RAM, and heavy investment in efficiency research.
Some argue US export controls pushed Chinese firms to invest in a full domestic stack, now paying off.

Competition with Western labs

Many see this as part of a “race to zero” in inference costs, directly undercutting US labs whose prices have recently increased.
Some speculate Western firms may respond via lobbying or pushing restrictions on Chinese and open‑source models.

Model quality and use cases

Users report MiMo 2.5/Pro and DeepSeek V4‑Flash/Pro are “good enough” for most coding and light work, though not at the level of top frontier models (Claude Opus, GPT‑5.5).
Opinions differ: some find DeepSeek superior to Western mid‑tier models; others see it roughly comparable to Sonnet‑class, clearly below Opus.
Benchmarks are viewed skeptically; repeated advice is to test with real workloads.

Adoption, trust, and geopolitics

Debate over whether Western enterprises will ever widely adopt Chinese models, even self‑hosted, due to trust, optics, and regulatory concerns.
Some worry about Chinese surveillance via AI APIs; others note similar or worse US practices and emphasize that open‑weight Chinese models can be run locally.

Market dynamics and sustainability

One view is that Chinese labs are cutting prices because their usage and revenue lag far behind OpenAI, Anthropic, and Google; another is that they are aggressively subsidizing prices to gain market share and data, a strategy some compare to EVs.
Disagreement over token statistics and what they really say about global usage.

Developer experience and billing

Reports on MiMo reliability are mixed: some users describe looping outputs and tool‑use issues, and compare it unfavorably with alternatives tuned for “agentic” workflows.
Token/credit plans and unit conversions are seen as confusing; some users burn a large chunk of monthly budget in a single coding session.
Overall trend highlighted: industry shifting from “best model wins” to “good‑enough model at lowest cost.”