Xiaomi MiMo Reasoning Model

Early user impressions and access

  • Some tried MiMo via an unofficial Hugging Face Space; responses can be slow and chat turn-taking is buggy.
  • Qualitative feedback: “not great, not terrible” — decent code generation but struggles to fix its own mistakes over multiple rounds.
  • Others report it feels “pretty solid” but has long thinking times compared to some recent MoE models.

Benchmarks and realism

  • Several doubt the reported benchmark numbers for a 7B model; suspicion that benchmarks or closely related data were in training/RL, especially given RFT.
  • Broader sentiment that LLM benchmarks are heavily gamed, often contaminated, and poorly mapped to real-world use.
  • Others counter that small models (e.g., 4B–12B) have been quietly getting much better, and that similarly strong scores exist for other small models (e.g., Qwen 3 4B).

Small local models and emerging workflows

  • Many see small, fast local models as increasingly “good enough” for everyday coding, business, and productivity tasks, with privacy and cost advantages.
  • Some describe building numerous bespoke LLM-powered apps (email summarization, an irrigation planner, a meal planner) and preferring local models for control and flexibility around safety guardrails (a minimal sketch of the pattern follows this list).
  • Trade-off noted: smaller models often require more careful problem decomposition vs. large cloud models that “just work” more often.
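
For concreteness, here is a minimal sketch of the kind of bespoke local-model tool described above. It assumes Ollama (or LM Studio) is serving its OpenAI-compatible API on localhost and that some small model has already been pulled; the `mimo-7b` tag and the prompt are illustrative placeholders, not anything from the thread.

```python
# Minimal sketch: an email-summarization helper backed by a local model.
# Assumes Ollama is serving its OpenAI-compatible API on localhost:11434.
from openai import OpenAI

client = OpenAI(base_url="http://localhost:11434/v1", api_key="unused")  # key is ignored locally

def summarize_email(body: str) -> str:
    """Ask the local model for a three-bullet summary of one email."""
    response = client.chat.completions.create(
        model="mimo-7b",  # hypothetical local tag; substitute whatever model you pulled
        messages=[
            {"role": "system",
             "content": "Summarize the email in at most three bullet points."},
            {"role": "user", "content": body},
        ],
        temperature=0.2,  # keep summaries terse and fairly deterministic
    )
    return response.choices[0].message.content

if __name__ == "__main__":
    print(summarize_email("Hi team, the Q3 irrigation report is attached..."))
```

Because both Ollama and LM Studio expose OpenAI-compatible endpoints, the same few lines work against either, which keeps small tools like this portable between local and cloud backends.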

Open weights, licensing, and ecosystem

  • Clarification that MiMo is MIT-licensed with open weights, not fully open-source in the classic “code” sense.
  • Discussion that most major players now release at least some open-weight models, with a few exceptions (notably Anthropic, and OpenAI, though the latter's stance may be changing).

GGUF, Ollama, and deployment tips

  • GGUF builds appeared quickly on Hugging Face; users eager to run MiMo via LM Studio/Ollama.
  • Explanation of how Ollama’s Modelfile system works, how to pull GGUF models from Hugging Face, and how to override parameters without duplicating large blobs (see the sketch after this list).
  • Some frustration that Ollama reintroduces the multi-file complexity GGUF was designed to avoid, but acceptance that it simplifies getting started.
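
As a concrete illustration of the Modelfile discussion above, here is a minimal sketch; the Hugging Face repo path and quant tag are placeholders, not an official MiMo GGUF. Because Ollama's model store is content-addressed, a Modelfile whose FROM points at an already-pulled model only adds a small manifest layer rather than copying the multi-gigabyte GGUF blob.

```
# Pull a GGUF directly from Hugging Face first (repo path is hypothetical):
#   ollama pull hf.co/someuser/MiMo-7B-RL-GGUF:Q4_K_M
#
# Then override parameters by layering a Modelfile on top of that model.
FROM hf.co/someuser/MiMo-7B-RL-GGUF:Q4_K_M
PARAMETER temperature 0.6
PARAMETER num_ctx 8192
SYSTEM You are a concise coding assistant.
```

`ollama create mimo-local -f Modelfile` registers the variant, and `ollama show --modelfile mimo-local` prints the resulting Modelfile for inspection.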

Language focus and data

  • Debate on why many Chinese models appear “English-first”:
    • CommonCrawl and similar corpora are English-dominated.
    • Chinese web data is fragmented into closed, app-centric platforms that are harder to crawl.
  • Counterpoints:
    • Inside China, models and usage are largely Mandarin-based; outside, English is the natural choice and needed for benchmarks.
    • Some Chinese efforts (e.g., DeepSeek, 01.ai) reportedly emphasize Chinese tokens and Chinese-first models, but those get less Western visibility.

RL, reasoning, and coding

  • Interest in MiMo’s RL-heavy design: pretrained from scratch on a large token count and then RL-tuned for reasoning, rather than purely distilled from a larger teacher.
  • Coding eval scores are seen as very strong for a 7B, close to well-regarded mid-size proprietary models.
  • Curiosity about the RL setup for code: MiMo uses unit-test–based rewards on curated, hard-but-solvable problems, with an online judge to parallelize test execution (a toy sketch follows this list).
  • Some complain the README’s “RL” label is too vague; others note the technical report provides more detail (e.g., modified GRPO) and that README-level shorthand is common.
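
To make the reward discussion concrete: below is a toy sketch, not MiMo's actual pipeline, of how unit-test pass rates can serve as rewards and how a GRPO-style update turns them into group-relative advantages (each sampled solution is scored against its own group's mean and spread, so no separate value model is needed). The judge here is a plain in-process `exec`; a real setup would sandbox and parallelize it, as the report's online judge does.

```python
# Toy sketch of unit-test-based rewards plus a GRPO-style group advantage.
# Not MiMo's actual code: run_candidate_against_tests stands in for the
# sandboxed online judge described in the technical report.
from statistics import mean, pstdev
from typing import Callable, Dict, List

def safe(test: Callable[[Dict], bool], namespace: Dict) -> bool:
    """Run one unit test, treating any exception as a failure."""
    try:
        return bool(test(namespace))
    except Exception:
        return False

def run_candidate_against_tests(code: str, tests: List[Callable[[Dict], bool]]) -> float:
    """Reward = fraction of unit tests the candidate solution passes."""
    namespace: Dict = {}
    try:
        exec(code, namespace)  # never do this outside a sandbox
    except Exception:
        return 0.0             # non-compiling code earns zero reward
    passed = sum(1 for test in tests if safe(test, namespace))
    return passed / len(tests)

def group_relative_advantages(rewards: List[float]) -> List[float]:
    """GRPO-style advantage: normalize each reward against its own group."""
    mu, sigma = mean(rewards), pstdev(rewards) or 1.0
    return [(r - mu) / sigma for r in rewards]

# Example: score a group of sampled completions for one problem.
tests = [
    lambda ns: ns["add"](2, 3) == 5,
    lambda ns: ns["add"](-1, 1) == 0,
]
candidates = [
    "def add(a, b): return a + b",  # passes both tests
    "def add(a, b): return a - b",  # passes neither
]
rewards = [run_candidate_against_tests(c, tests) for c in candidates]
print(rewards, group_relative_advantages(rewards))
```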

Naming and Xiaomi context

  • Multiple folk-etymologies: “MiMo” as “Xiaomi model,” “Millet Model,” “Rice Model,” or a Chinese character abbreviation.
  • Xiaomi’s own “Little Rice” branding and a Buddhist quote about a grain of rice holding vast significance are mentioned as background flavor.