Kimi Released Kimi K2.5, Open-Source Visual SOTA-Agentic Model

Agent Swarm & Orchestration

  • Thread is very interested in the “agent swarm” idea: up to 100 sub-agents and 1,500 tool calls, trained via RL specifically for orchestration.
  • Clarified that “tool calls” here are generic interactions, often batched within a single inference pass, not necessarily 1,500 separate external API hits.
  • Debate whether this is fundamentally new or “just” automated multi‑tool calling / multi‑LLM calls that could already be built in user code.
  • Distinction made between:
    • MoE (expert selection per token inside one model) vs
    • Agent swarms (multiple task‑level agents with different prompts/tools running in parallel and aggregated; see the sketch after this list).
  • Some see it as a practical engineering hack for decomposing complex tasks and saving context; others see it as mostly marketing noise.
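Not from the release itself, but a minimal sketch of the task‑level pattern described above: the “swarm” is the same base model called in parallel under different system prompts (and, in practice, different tool sets), with an orchestrator prompt aggregating the results. The `call_model` stub and role prompts below are hypothetical placeholders, not Kimi’s actual API.

```python
import asyncio

# Hypothetical stand-in for a real LLM API call; swap in your provider's client.
async def call_model(system_prompt: str, user_prompt: str) -> str:
    await asyncio.sleep(0.1)  # simulate network/inference latency
    return f"[{system_prompt[:24]}...] answer to: {user_prompt[:40]}"

# Each sub-agent is just the base model with a different system prompt
# (and, in a real setup, a different tool set).
SUB_AGENT_PROMPTS = {
    "researcher": "You gather relevant facts for the subtask.",
    "coder": "You write and review code for the subtask.",
    "critic": "You check the other agents' output for errors.",
}

async def run_sub_agent(role: str, system_prompt: str, task: str) -> tuple[str, str]:
    result = await call_model(system_prompt, task)
    return role, result

async def orchestrate(task: str) -> str:
    # Fan out: run all sub-agents on the same task concurrently.
    results = await asyncio.gather(
        *(run_sub_agent(role, prompt, task) for role, prompt in SUB_AGENT_PROMPTS.items())
    )
    # Aggregate: feed the collected outputs back through an orchestrator prompt.
    digest = "\n".join(f"{role}: {output}" for role, output in results)
    return await call_model("You merge sub-agent outputs into one answer.", digest)

if __name__ == "__main__":
    print(asyncio.run(orchestrate("Summarize the trade-offs of MoE vs. agent swarms.")))
```

The contrast with MoE is that routing here happens at the task level in user‑visible code, while MoE expert selection happens per token inside the model’s forward pass.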

Capabilities, Benchmarks & Real‑World Quality

  • Benchmarks in the blog impress many; people hope it could replace more expensive coding models, though several say only real workflows will tell.
  • Kimi is repeatedly praised for writing quality, “human‑like” conversation, and emotional intelligence; some plan to test it on specialized EQ/mafia/social benchmarks.
  • Vision SOTA claim is challenged: at least one tester reports it underperforms Gemini 3 Pro on more demanding image‑understanding tasks (e.g., BabyVision).
  • Several note that at the top end (Claude, Gemini, GPT, Kimi) benchmark deltas may not matter much for coding; tool integration and prompts dominate.

Openness, Licensing & Business Model

  • Model is released as a 1T‑param MoE (32B active) under “MIT + attribution for huge commercial users”; some prefer the branding requirement to a usage fee.
  • Strong pushback on calling this “open source”: community prefers “open weights,” noting lack of training data, code, or auditability for contamination/bias.
  • Discussion on why such an expensive model is given away: theories include mindshare, “commoditize the complement,” state‑backed strategic investment, and Android/Linux‑style market entry.

Hardware, Local Use & Economics

  • Estimated ~600GB of int4 weights (rough arithmetic in the sketch after this list); cloud suggestions range from 8× to 16× H100/H200 at high hourly cost, clearly aimed at serious infra.
  • Long subthread on “can you run this at home?”:
    • Yes, with SSD streaming, huge RAM, multi‑GPU or multi‑Mac setups; community reports 5–30 tokens/s under favorable conditions.
    • But many argue that at those speeds and hardware costs it’s not “practically” local for most users or for agentic workflows.
  • Concerns about unit economics: deep agent swarms + large MoE imply heavy compute; margins seen as challenging without subsidies.
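A back‑of‑the‑envelope check on the ~600GB figure, using assumed round numbers (1T parameters, int4, ~10% quantization overhead) rather than official specs:

```python
# Rough weight-size estimate; the numbers below are assumptions, not official specs.
total_params = 1.0e12      # ~1T total parameters (MoE; only ~32B active per token)
bits_per_param = 4         # int4 quantization
overhead = 1.10            # allowance for quantization scales and higher-precision layers

weight_gb = total_params * bits_per_param / 8 * overhead / 1e9
print(f"~{weight_gb:.0f} GB of weights")   # ≈ 550 GB, before KV cache and activations
```

At the reported 5–30 tokens/s, a multi‑thousand‑token agentic turn takes minutes, which is the basis of the “not practically local” argument.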

Ecosystem, Tools & Competitive Landscape

  • Kimi Code (CLI/terminal agent) and support for an agent protocol are highlighted as useful practical tooling.
  • Several note Chinese models (Kimi, DeepSeek, GLM, Qwen, Minimax) are iterating quickly and now benchmark against top proprietary models, with strong price/performance.
  • Pointers shared to various community leaderboards and niche benchmarks (ELO battles, vision clocks, OCR, EQ, Mafia) for independent evaluation.