Kimi Releases Kimi K2.5, an Open-Source Model Claiming SOTA Visual and Agentic Performance
Agent Swarm & Orchestration
- Thread is very interested in the “agent swarm” idea: up to 100 sub-agents and 1,500 tool calls, trained via RL specifically for orchestration.
- Clarified that “tool calls” here are generic interactions, often batched in a single inference; not necessarily 1,500 external API hits.
- Debate whether this is fundamentally new or “just” automated multi‑tool calling / multi‑LLM calls that could already be built in user code.
- Distinction made between (both sketched in the toy code after this list):
  - MoE: expert selection per token inside one model's forward pass, vs.
  - Agent swarms: multiple task‑level agents with different prompts/tools running in parallel, with their outputs aggregated.
- Some see it as a practical engineering hack for decomposing complex tasks and saving context, others as mostly marketing noise.
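To ground the first half of that distinction: in a mixture-of-experts model, a router picks a few experts per token inside a single forward pass. A toy, illustrative sketch (the shapes, k=2, and random weights are assumptions, not K2.5's actual configuration):

```python
# Toy MoE layer: token-level routing to top-k experts within one forward pass.
# All dimensions and weights here are illustrative, not K2.5's real architecture.
import numpy as np

rng = np.random.default_rng(0)
d_model, n_experts, k = 8, 4, 2

tokens = rng.normal(size=(3, d_model))             # 3 tokens in a sequence
router_w = rng.normal(size=(d_model, n_experts))   # router projection
experts = [rng.normal(size=(d_model, d_model)) for _ in range(n_experts)]

def moe_layer(x):
    logits = x @ router_w                           # (tokens, experts) routing scores
    top = np.argsort(logits, axis=-1)[:, -k:]       # top-k expert ids per token
    gates = np.take_along_axis(logits, top, axis=-1)
    gates = np.exp(gates) / np.exp(gates).sum(-1, keepdims=True)  # softmax over the chosen k
    out = np.zeros_like(x)
    for t in range(x.shape[0]):                     # each token only runs its k experts
        for j, e in enumerate(top[t]):
            out[t] += gates[t, j] * (x[t] @ experts[e])
    return out, top

_, chosen = moe_layer(tokens)
print("experts chosen per token:", chosen.tolist())
```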
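An agent swarm, by contrast, operates at the task level: an orchestrator decomposes the job, fans it out to sub-agents that each carry their own prompt and tool set (with tool calls possibly batched inside a single inference), and aggregates the results. A minimal sketch assuming a hypothetical call_llm stand-in for any chat-completion client, not Kimi's actual API:

```python
# Minimal task-level "agent swarm" orchestration sketch. call_llm is a placeholder,
# and the decomposition/aggregation prompts are illustrative, not Kimi's.
import asyncio

async def call_llm(prompt: str, tools: list[str] | None = None) -> str:
    """Stand-in for a real chat-completion call (e.g. any OpenAI-compatible endpoint)."""
    await asyncio.sleep(0)  # pretend network latency
    return f"[answer to {prompt!r} using tools={tools}]"

async def run_subagent(subtask: str, tools: list[str]) -> str:
    # Each sub-agent gets a narrow prompt and its own tool set; a real system would
    # also loop over tool calls here, possibly batching several per inference.
    return await call_llm(f"Solve this subtask: {subtask}", tools)

async def orchestrate(task: str, max_agents: int = 100) -> str:
    # 1) Decompose the task into independent subtasks.
    plan = await call_llm(f"Split into at most {max_agents} independent subtasks: {task}")
    subtasks = [line for line in plan.splitlines() if line.strip()] or [task]

    # 2) Fan out: sub-agents run in parallel, each with a small context of its own.
    results = await asyncio.gather(
        *(run_subagent(s, tools=["search", "python"]) for s in subtasks)
    )

    # 3) Aggregate the partial results into one answer.
    return await call_llm("Combine these partial results:\n" + "\n".join(results))

if __name__ == "__main__":
    print(asyncio.run(orchestrate("Summarize the Kimi K2.5 thread")))
```

Nothing in the loop itself requires RL-trained orchestration; the thread's debate is whether training the model end-to-end on this pattern adds something that user-level glue code like the above does not.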
Capabilities, Benchmarks & Real‑World Quality
- Benchmarks in the blog impress many; people hope it could replace more expensive coding models, though several say only real workflows will tell.
- Kimi is repeatedly praised for writing quality, “human‑like” conversation, and emotional intelligence; some plan to test it on specialized EQ/mafia/social benchmarks.
- Vision SOTA claim is challenged: at least one tester reports it underperforms Gemini 3 Pro on more demanding image‑understanding tasks (e.g., BabyVision).
- Several note that at the top end (Claude, Gemini, GPT, Kimi) benchmark deltas may not matter much for coding; tool integration and prompts dominate.
Openness, Licensing & Business Model
- Model is released as a 1T‑param MoE (32B active) with “MIT + attribution for huge commercial users”; some prefer the branding requirement to a usage fee.
- Strong pushback on calling this “open source”: community prefers “open weights,” noting lack of training data, code, or auditability for contamination/bias.
- Discussion on why such an expensive model is given away: theories include mindshare, “commoditize the complement,” state‑backed strategic investment, and Android/Linux‑style market entry.
Hardware, Local Use & Economics
- Estimated ~600GB of int4 weights; cloud suggestions range from 8× to 16× H100/H200 at high hourly cost, clearly aimed at serious infrastructure (see the back‑of‑the‑envelope arithmetic after this list).
- Long subthread on “can you run this at home?”:
  - Yes, with SSD streaming, huge RAM, multi‑GPU or multi‑Mac setups; community reports 5–30 tokens/s under favorable conditions.
  - But many argue that at those speeds and hardware costs it’s not “practically” local for most users or for agentic workflows.
- Concerns about unit economics: deep agent swarms + large MoE imply heavy compute; margins seen as challenging without subsidies.
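A back-of-the-envelope check on the numbers above: total int4 weight footprint and a memory-bandwidth ceiling on decode speed. The bandwidth figures are assumptions chosen to represent SSD streaming, unified-memory, and multi-GPU setups, not measurements of K2.5:

```python
# Rough arithmetic: int4 weight footprint and bandwidth-bound decode speed.
# Bandwidth numbers below are assumptions, not measured values.
BYTES_PER_PARAM_INT4 = 0.5

total_params = 1e12    # ~1T parameters in the MoE
active_params = 32e9   # ~32B parameters activated per decoded token

weights_gb = total_params * BYTES_PER_PARAM_INT4 / 1e9
active_gb_per_token = active_params * BYTES_PER_PARAM_INT4 / 1e9
print(f"total int4 weights  ~{weights_gb:.0f} GB "
      f"(higher-precision layers and overhead push this toward the ~600GB reported)")
print(f"active weights read ~{active_gb_per_token:.0f} GB per decoded token")

# At batch size 1, decode is roughly memory-bandwidth bound:
# tokens/s <= bandwidth / bytes of active weights streamed per token.
for name, bw_gb_s in [("NVMe SSD streaming (assumed ~12 GB/s)", 12),
                      ("unified-memory Mac-class (assumed ~800 GB/s)", 800),
                      ("multi-GPU HBM aggregate (assumed ~3 TB/s)", 3000)]:
    print(f"{name:42s} <= {bw_gb_s / active_gb_per_token:6.1f} tok/s")
```

These ceilings ignore KV-cache traffic and compute, but they bracket the 5–30 tokens/s reported in the thread; at SSD-streaming speeds, multi-step agentic loops become impractically slow.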
Ecosystem, Tools & Competitive Landscape
- Kimi Code (CLI/terminal agent) and support for an agent protocol are highlighted as useful practical tooling.
- Several note Chinese models (Kimi, DeepSeek, GLM, Qwen, Minimax) are iterating quickly and now benchmark against top proprietary models, with strong price/performance.
- Pointers shared to various community leaderboards and niche benchmarks (ELO battles, vision clocks, OCR, EQ, Mafia) for independent evaluation.