Xiaomi MiMo Reasoning Model

Early user impressions and access

  • Some tried MiMo via an unofficial Hugging Face Space; responses can be slow and chat turn-taking is buggy.
  • Qualitative feedback: “not great, not terrible” — decent code generation but struggles to fix its own mistakes over multiple rounds.
  • Others report it feels “pretty solid” but has long thinking times compared to some recent MoE models.

Benchmarks and realism

  • Several doubt the reported benchmark numbers for a 7B model; suspicion that benchmarks or closely related data were in training/RL, especially given RFT.
  • Broader sentiment that LLM benchmarks are heavily gamed, often contaminated, and poorly mapped to real-world use.
  • Others counter that small models (e.g., 4B–12B) have been quietly getting much better, and that similarly strong scores exist for other small models (e.g., Qwen 3 4B).

Small local models and emerging workflows

  • Many see small, fast local models as increasingly “good enough” for everyday coding, business, and productivity tasks, with privacy and cost advantages.
  • Some describe building numerous bespoke LLM-powered apps (email summarization, an irrigation planner, a meal planner) and preferring local models for control and flexibility around safety guardrails (a minimal sketch of the pattern follows this list).
  • Trade-off noted: smaller models often require more careful problem decomposition vs. large cloud models that “just work” more often.
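
For concreteness, here is a minimal sketch of the kind of bespoke local-model tool described above. It assumes Ollama (or LM Studio) is serving its OpenAI-compatible API on localhost and that some small model has already been pulled; the `mimo-7b` tag and the prompt are illustrative placeholders, not anything from the thread.

```python
# Minimal sketch: an email-summarization helper backed by a local model.
# Assumes Ollama is serving its OpenAI-compatible API on localhost:11434.
from openai import OpenAI

client = OpenAI(base_url="http://localhost:11434/v1", api_key="unused")  # key is ignored locally

def summarize_email(body: str) -> str:
    """Ask the local model for a three-bullet summary of one email."""
    response = client.chat.completions.create(
        model="mimo-7b",  # hypothetical local tag; substitute whatever model you pulled
        messages=[
            {"role": "system",
             "content": "Summarize the email in at most three bullet points."},
            {"role": "user", "content": body},
        ],
        temperature=0.2,  # keep summaries terse and fairly deterministic
    )
    return response.choices[0].message.content

if __name__ == "__main__":
    print(summarize_email("Hi team, the Q3 irrigation report is attached..."))
```

Because both Ollama and LM Studio expose OpenAI-compatible endpoints, the same few lines work against either, which keeps small tools like this portable between local and cloud backends.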

Open weights, licensing, and ecosystem

  • Clarification that MiMo is MIT-licensed with open weights, not fully open-source in the classic “code” sense.
  • Discussion that most major players now release at least some open-weight models, with a few exceptions (notably Anthropic, and OpenAI, though the latter's stance may be changing).

GGUF, Ollama, and deployment tips

  • GGUF builds appeared quickly on Hugging Face; users eager to run MiMo via LM Studio/Ollama.
  • Explanation of how Ollama’s Modelfile system works, how to pull GGUF models from Hugging Face, and how to override parameters without duplicating large blobs (see the sketch after this list).
  • Some frustration that Ollama reintroduces the multi-file complexity GGUF was designed to avoid, but acceptance that it simplifies getting started.
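
As a concrete illustration of the Modelfile discussion above, here is a minimal sketch; the Hugging Face repo path and quant tag are placeholders, not an official MiMo GGUF. Because Ollama's model store is content-addressed, a Modelfile whose FROM points at an already-pulled model only adds a small manifest layer rather than copying the multi-gigabyte GGUF blob.

```
# Pull a GGUF directly from Hugging Face first (repo path is hypothetical):
#   ollama pull hf.co/someuser/MiMo-7B-RL-GGUF:Q4_K_M
#
# Then override parameters by layering a Modelfile on top of that model.
FROM hf.co/someuser/MiMo-7B-RL-GGUF:Q4_K_M
PARAMETER temperature 0.6
PARAMETER num_ctx 8192
SYSTEM You are a concise coding assistant.
```

`ollama create mimo-local -f Modelfile` registers the variant, and `ollama show --modelfile mimo-local` prints the resulting Modelfile for inspection.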

Language focus and data

  • Debate on why many Chinese models appear “English-first”:
    • CommonCrawl and similar corpora are English-dominated.
    • Chinese web data is fragmented into closed, app-centric platforms that are harder to crawl.
  • Counterpoints:
    • Inside China, models and usage are largely Mandarin-based; outside, English is the natural choice and needed for benchmarks.
    • Some Chinese efforts (e.g., DeepSeek, 01.ai) reportedly emphasize Chinese tokens and Chinese-first models, but those get less Western visibility.

RL, reasoning, and coding

  • Interest in MiMo’s RL-heavy design: pretrained from scratch on a large token count and then RL-tuned for reasoning, rather than purely distilled from a larger teacher.
  • Coding eval scores are seen as very strong for a 7B, close to well-regarded mid-size proprietary models.
  • Curiosity about the RL setup for code: MiMo uses unit-test–based rewards on curated, hard-but-solvable problems, with an online judge to parallelize test execution (a toy sketch follows this list).
  • Some complain the README’s “RL” label is too vague; others note the technical report provides more detail (e.g., modified GRPO) and that README-level shorthand is common.
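
To make the reward discussion concrete: below is a toy sketch, not MiMo's actual pipeline, of how unit-test pass rates can serve as rewards and how a GRPO-style update turns them into group-relative advantages (each sampled solution is scored against its own group's mean and spread, so no separate value model is needed). The judge here is a plain in-process `exec`; a real setup would sandbox and parallelize it, as the report's online judge does.

```python
# Toy sketch of unit-test-based rewards plus a GRPO-style group advantage.
# Not MiMo's actual code: run_candidate_against_tests stands in for the
# sandboxed online judge described in the technical report.
from statistics import mean, pstdev
from typing import Callable, Dict, List

def safe(test: Callable[[Dict], bool], namespace: Dict) -> bool:
    """Run one unit test, treating any exception as a failure."""
    try:
        return bool(test(namespace))
    except Exception:
        return False

def run_candidate_against_tests(code: str, tests: List[Callable[[Dict], bool]]) -> float:
    """Reward = fraction of unit tests the candidate solution passes."""
    namespace: Dict = {}
    try:
        exec(code, namespace)  # never do this outside a sandbox
    except Exception:
        return 0.0             # non-compiling code earns zero reward
    passed = sum(1 for test in tests if safe(test, namespace))
    return passed / len(tests)

def group_relative_advantages(rewards: List[float]) -> List[float]:
    """GRPO-style advantage: normalize each reward against its own group."""
    mu, sigma = mean(rewards), pstdev(rewards) or 1.0
    return [(r - mu) / sigma for r in rewards]

# Example: score a group of sampled completions for one problem.
tests = [
    lambda ns: ns["add"](2, 3) == 5,
    lambda ns: ns["add"](-1, 1) == 0,
]
candidates = [
    "def add(a, b): return a + b",  # passes both tests
    "def add(a, b): return a - b",  # passes neither
]
rewards = [run_candidate_against_tests(c, tests) for c in candidates]
print(rewards, group_relative_advantages(rewards))
```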

Naming and Xiaomi context

  • Multiple folk-etymologies: “MiMo” as “Xiaomi model,” “Millet Model,” “Rice Model,” or a Chinese character abbreviation.
  • Xiaomi’s own “Little Rice” branding and a Buddhist quote about a grain of rice holding vast significance are mentioned as background flavor.