Kimi Releases Kimi K2.5, an Open-Source Model Claiming SOTA Visual and Agentic Performance
Agent Swarm & Orchestration
- Thread is very interested in the “agent swarm” idea: up to 100 sub-agents and 1,500 tool calls, trained via RL specifically for orchestration.
- Clarified that “tool calls” here are generic interactions, often batched in a single inference; not necessarily 1,500 external API hits.
- Debate whether this is fundamentally new or “just” automated multi‑tool calling / multi‑LLM calls that could already be built in user code.
- Distinction made between (both sketched in the toy code after this list):
  - MoE: expert selection per token inside one model's forward pass, vs.
  - Agent swarms: multiple task‑level agents with different prompts/tools running in parallel, with their outputs aggregated.
- Some see it as a practical engineering hack for decomposing complex tasks and saving context, others as mostly marketing noise.
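To ground the first half of that distinction: in a mixture-of-experts model, a router picks a few experts per token inside a single forward pass. A toy, illustrative sketch (the shapes, k=2, and random weights are assumptions, not K2.5's actual configuration):

```python
# Toy MoE layer: token-level routing to top-k experts within one forward pass.
# All dimensions and weights here are illustrative, not K2.5's real architecture.
import numpy as np

rng = np.random.default_rng(0)
d_model, n_experts, k = 8, 4, 2

tokens = rng.normal(size=(3, d_model))             # 3 tokens in a sequence
router_w = rng.normal(size=(d_model, n_experts))   # router projection
experts = [rng.normal(size=(d_model, d_model)) for _ in range(n_experts)]

def moe_layer(x):
    logits = x @ router_w                           # (tokens, experts) routing scores
    top = np.argsort(logits, axis=-1)[:, -k:]       # top-k expert ids per token
    gates = np.take_along_axis(logits, top, axis=-1)
    gates = np.exp(gates) / np.exp(gates).sum(-1, keepdims=True)  # softmax over the chosen k
    out = np.zeros_like(x)
    for t in range(x.shape[0]):                     # each token only runs its k experts
        for j, e in enumerate(top[t]):
            out[t] += gates[t, j] * (x[t] @ experts[e])
    return out, top

_, chosen = moe_layer(tokens)
print("experts chosen per token:", chosen.tolist())
```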
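An agent swarm, by contrast, operates at the task level: an orchestrator decomposes the job, fans it out to sub-agents that each carry their own prompt and tool set (with tool calls possibly batched inside a single inference), and aggregates the results. A minimal sketch assuming a hypothetical call_llm stand-in for any chat-completion client, not Kimi's actual API:

```python
# Minimal task-level "agent swarm" orchestration sketch. call_llm is a placeholder,
# and the decomposition/aggregation prompts are illustrative, not Kimi's.
import asyncio

async def call_llm(prompt: str, tools: list[str] | None = None) -> str:
    """Stand-in for a real chat-completion call (e.g. any OpenAI-compatible endpoint)."""
    await asyncio.sleep(0)  # pretend network latency
    return f"[answer to {prompt!r} using tools={tools}]"

async def run_subagent(subtask: str, tools: list[str]) -> str:
    # Each sub-agent gets a narrow prompt and its own tool set; a real system would
    # also loop over tool calls here, possibly batching several per inference.
    return await call_llm(f"Solve this subtask: {subtask}", tools)

async def orchestrate(task: str, max_agents: int = 100) -> str:
    # 1) Decompose the task into independent subtasks.
    plan = await call_llm(f"Split into at most {max_agents} independent subtasks: {task}")
    subtasks = [line for line in plan.splitlines() if line.strip()] or [task]

    # 2) Fan out: sub-agents run in parallel, each with a small context of its own.
    results = await asyncio.gather(
        *(run_subagent(s, tools=["search", "python"]) for s in subtasks)
    )

    # 3) Aggregate the partial results into one answer.
    return await call_llm("Combine these partial results:\n" + "\n".join(results))

if __name__ == "__main__":
    print(asyncio.run(orchestrate("Summarize the Kimi K2.5 thread")))
```

Nothing in the loop itself requires RL-trained orchestration; the thread's debate is whether training the model end-to-end on this pattern adds something that user-level glue code like the above does not.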
Capabilities, Benchmarks & Real‑World Quality
- Benchmarks in the blog impress many; people hope it could replace more expensive coding models, though several say only real workflows will tell.
- Kimi is repeatedly praised for writing quality, “human‑like” conversation, and emotional intelligence; some plan to test it on specialized EQ/mafia/social benchmarks.
- Vision SOTA claim is challenged: at least one tester reports it underperforms Gemini 3 Pro on more demanding image‑understanding tasks (e.g., BabyVision).
- Several note that at the top end (Claude, Gemini, GPT, Kimi) benchmark deltas may not matter much for coding; tool integration and prompts dominate.
Openness, Licensing & Business Model
- Model is released as a 1T‑param MoE (32B active) with “MIT + attribution for huge commercial users”; some prefer the branding requirement to a usage fee.
- Strong pushback on calling this “open source”: community prefers “open weights,” noting lack of training data, code, or auditability for contamination/bias.
- Discussion on why such an expensive model is given away: theories include mindshare, “commoditize the complement,” state‑backed strategic investment, and Android/Linux‑style market entry.
Hardware, Local Use & Economics
- Estimated ~600GB of int4 weights; cloud suggestions range from 8× to 16× H100/H200 at high hourly cost, clearly aimed at serious infrastructure (see the back‑of‑the‑envelope arithmetic after this list).
- Long subthread on “can you run this at home?”:
  - Yes, with SSD streaming, huge RAM, multi‑GPU or multi‑Mac setups; community reports 5–30 tokens/s under favorable conditions.
  - But many argue that at those speeds and hardware costs it’s not “practically” local for most users or for agentic workflows.
- Concerns about unit economics: deep agent swarms + large MoE imply heavy compute; margins seen as challenging without subsidies.
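A back-of-the-envelope check on the numbers above: total int4 weight footprint and a memory-bandwidth ceiling on decode speed. The bandwidth figures are assumptions chosen to represent SSD streaming, unified-memory, and multi-GPU setups, not measurements of K2.5:

```python
# Rough arithmetic: int4 weight footprint and bandwidth-bound decode speed.
# Bandwidth numbers below are assumptions, not measured values.
BYTES_PER_PARAM_INT4 = 0.5

total_params = 1e12    # ~1T parameters in the MoE
active_params = 32e9   # ~32B parameters activated per decoded token

weights_gb = total_params * BYTES_PER_PARAM_INT4 / 1e9
active_gb_per_token = active_params * BYTES_PER_PARAM_INT4 / 1e9
print(f"total int4 weights  ~{weights_gb:.0f} GB "
      f"(higher-precision layers and overhead push this toward the ~600GB reported)")
print(f"active weights read ~{active_gb_per_token:.0f} GB per decoded token")

# At batch size 1, decode is roughly memory-bandwidth bound:
# tokens/s <= bandwidth / bytes of active weights streamed per token.
for name, bw_gb_s in [("NVMe SSD streaming (assumed ~12 GB/s)", 12),
                      ("unified-memory Mac-class (assumed ~800 GB/s)", 800),
                      ("multi-GPU HBM aggregate (assumed ~3 TB/s)", 3000)]:
    print(f"{name:42s} <= {bw_gb_s / active_gb_per_token:6.1f} tok/s")
```

These ceilings ignore KV-cache traffic and compute, but they bracket the 5–30 tokens/s reported in the thread; at SSD-streaming speeds, multi-step agentic loops become impractically slow.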
Ecosystem, Tools & Competitive Landscape
- Kimi Code (CLI/terminal agent) and support for an agent protocol are highlighted as useful practical tooling.
- Several note Chinese models (Kimi, DeepSeek, GLM, Qwen, Minimax) are iterating quickly and now benchmark against top proprietary models, with strong price/performance.
- Pointers shared to various community leaderboards and niche benchmarks (ELO battles, vision clocks, OCR, EQ, Mafia) for independent evaluation.