Xiaomi MiMo Reasoning Model
Early user impressions and access
- Some tried MiMo via an unofficial Hugging Face Space; responses can be slow and chat turn-taking is buggy.
- Qualitative feedback: “not great, not terrible” — decent code generation but struggles to fix its own mistakes over multiple rounds.
- Others report it feels “pretty solid” but has long thinking times compared to some recent MoE models.
Benchmarks and realism
- Several doubt the reported benchmark numbers for a 7B model; suspicion that benchmarks or closely related data were in training/RL, especially given RFT.
- Broader sentiment that LLM benchmarks are heavily gamed, often contaminated, and poorly mapped to real-world use.
- Others counter that small models (e.g., 4B–12B) have been quietly getting much better, and that similarly strong scores exist for other small models (e.g., Qwen 3 4B).
Small local models and emerging workflows
- Many see small, fast local models as increasingly “good enough” for everyday coding, business, and productivity tasks, with privacy and cost advantages.
- Some describe building numerous bespoke LLM-powered apps (email summarization, an irrigation planner, a meal planner) and preferring local models for control and flexibility around safety guardrails; a minimal local-model call is sketched after this list.
- Trade-off noted: smaller models often require more careful problem decomposition vs. large cloud models that “just work” more often.
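As a concrete illustration of the bespoke-app pattern above, here is a minimal sketch that calls a locally running model through Ollama's REST API to summarize an email. The model tag and prompt are placeholders, not anything a commenter specified.

```python
import json
import urllib.request

# Placeholder tag for whatever small local model is pulled into Ollama.
MODEL = "mimo-7b"

def summarize_email(body: str) -> str:
    """Send one prompt to the local Ollama server and return the completion."""
    payload = {
        "model": MODEL,
        "prompt": f"Summarize this email in two sentences:\n\n{body}",
        "stream": False,  # ask for a single JSON object instead of a token stream
    }
    req = urllib.request.Request(
        "http://localhost:11434/api/generate",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req) as resp:
        return json.loads(resp.read())["response"]

if __name__ == "__main__":
    print(summarize_email("Hi team, the Q3 review moved to Friday at 10am..."))
```

Swapping the prompt turns the same scaffold into a meal planner or any other small single-purpose tool, which is much of the appeal of cheap, private local inference.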
Open weights, licensing, and ecosystem
- Clarification that MiMo is MIT-licensed with open weights, not fully open-source in the classic “code” sense.
- Discussion that most major players now release at least some open-weight models, with a few exceptions (notably Anthropic; OpenAI may be changing its stance).
GGUF, Ollama, and deployment tips
- GGUF builds appeared quickly on Hugging Face; users eager to run MiMo via LM Studio/Ollama.
- Explanation of how Ollama's Modelfile system works, how to pull GGUF models from Hugging Face, and how to override parameters without duplicating large blobs (see the sketch after this list).
- Some frustration that Ollama reintroduces the multi-file complexity GGUF was designed to avoid, but acceptance that it simplifies getting started.
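A minimal sketch of that Modelfile workflow, assuming a hypothetical Hugging Face GGUF repo name (substitute whichever MiMo GGUF build you actually use):

```
# Pull a GGUF build straight from Hugging Face (repo name here is illustrative):
#   ollama pull hf.co/example-user/MiMo-7B-RL-GGUF:Q4_K_M
#
# Modelfile: derive a variant with different parameters. Because FROM points at
# the already-pulled model, Ollama reuses the existing GGUF blob rather than
# copying it.
FROM hf.co/example-user/MiMo-7B-RL-GGUF:Q4_K_M
PARAMETER temperature 0.6
PARAMETER num_ctx 8192

# Register the variant (shares the underlying blob):
#   ollama create mimo-custom -f Modelfile
```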
Language focus and data
- Debate on why many Chinese models appear “English-first”:
  - CommonCrawl and similar corpora are English-dominated.
  - Chinese web data is fragmented into closed, app-centric platforms that are harder to crawl.
- Counterpoints:
  - Inside China, models and usage are largely Mandarin-based; outside, English is the natural choice and needed for benchmarks.
  - Some Chinese efforts (e.g., DeepSeek, 01.ai) reportedly emphasize Chinese tokens and Chinese-first models, but those get less Western visibility.
RL, reasoning, and coding
- Interest in MiMo’s RL-heavy design: trained from scratch with large token counts and RL for reasoning, rather than pure distillation from a larger teacher.
- Coding eval scores are seen as very strong for a 7B, close to well-regarded mid-size proprietary models.
- Curiosity about the RL setup for code: MiMo uses unit-test–based rewards on curated, hard-but-solvable problems, with an online judge to parallelize test execution (a rough sketch follows this list).
- Some complain the README’s “RL” label is too vague; others note the technical report provides more detail (e.g., modified GRPO) and that README-level shorthand is common.
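To make the unit-test-reward and GRPO shorthand above concrete, here is a rough, self-contained sketch of the general recipe (binary pass/fail reward plus group-relative advantages). It illustrates the idea only and is not MiMo's actual implementation; the technical report describes a modified GRPO and more elaborate reward handling.

```python
import statistics
import subprocess
import tempfile
from pathlib import Path

def unit_test_reward(candidate_code: str, tests: str, timeout_s: float = 10.0) -> float:
    """Verifiable reward: 1.0 if the candidate passes all unit tests, else 0.0.
    A production setup would sandbox execution and farm candidates out to an
    online judge so many rollouts are checked in parallel."""
    with tempfile.TemporaryDirectory() as tmp:
        path = Path(tmp) / "candidate.py"
        path.write_text(candidate_code + "\n\n" + tests)
        try:
            result = subprocess.run(
                ["python", str(path)], capture_output=True, timeout=timeout_s,
            )
        except subprocess.TimeoutExpired:
            return 0.0
        return 1.0 if result.returncode == 0 else 0.0

def group_advantages(rewards: list[float]) -> list[float]:
    """GRPO-style group-relative advantage: normalize each rollout's reward
    by the mean and std of its own sampling group (no learned value model)."""
    mean = statistics.mean(rewards)
    std = statistics.pstdev(rewards) or 1.0  # guard against an all-equal group
    return [(r - mean) / std for r in rewards]

# Example: four sampled completions for one problem, scored against its tests.
tests = "assert add(2, 3) == 5\nassert add(-1, 1) == 0"
candidates = [
    "def add(a, b):\n    return a + b",
    "def add(a, b):\n    return a - b",   # wrong answer -> assertion fails
    "def add(a, b):\n    return a + b",
    "def add(a, b)\n    return a + b",    # syntax error -> fails
]
rewards = [unit_test_reward(c, tests) for c in candidates]
print(rewards, group_advantages(rewards))
```

The group normalization is what lets GRPO drop a learned value model: each rollout is scored only against the other samples drawn for the same problem.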
Naming and Xiaomi context
- Multiple folk-etymologies: “MiMo” as “Xiaomi model,” “Millet Model,” “Rice Model,” or a Chinese character abbreviation.
- Xiaomi’s own “Little Rice” branding and a Buddhist quote about a grain of rice holding vast significance are mentioned as background flavor.