Kimi Trims the Reasoning

Moonshot AI released Kimi K2.7-Code on June 12, putting the weights on HuggingFace under a Modified MIT license. The architecture continues where K2.6 left off: a 1-trillion-parameter mixture-of-experts model with 32 billion active parameters per token, 384 total experts (8 activated per forward pass), a 256K-token context window, and MLA attention with SwiGLU activations. None of that is dramatically new. What is new is the efficiency claim.

K2.7-Code uses roughly 30% fewer reasoning tokens compared to K2.6 while posting benchmark improvements across the board — +21.8% on Kimi Code Bench v2 (50.9 → 62.0), +11.0% on Program Bench, +31.5% on MLS Bench Lite (which nearly matches GPT-5.5 at 35.5 vs. 35.1). The model also scores 81.1% on MCPMark Verified, an independent tool-use benchmark, ahead of Claude Opus 4.8’s 76.4% on the same test.

The overthinking problem in reasoning models deserves some attention. Long chains of thought are not inherently better chains of thought. A model that hedges endlessly, backtracks unnecessarily, and explores irrelevant paths before arriving at an answer is generating wasted tokens that cost money and add latency without improving the answer. The research direction of “get the model to reason more efficiently” — stopping when it has enough, not when the context window runs out — is distinct from getting the model to reason more accurately. Moonshot is claiming both with K2.7-Code, which is a stronger claim than either alone.

The caveats matter here. Kimi Code Bench v2, Program Bench, and MLS Bench Lite are all Moonshot’s own benchmarks. Self-reported benchmark improvements on proprietary tests are worth noting but not treating as settled. The MCPMark Verified score is independently measured and is the more durable number. Third-party evaluations on SWE-Bench Pro and Terminal-Bench 2.0 are pending and will be more informative.

For users who want to run it locally, the model requires around 64 GB of VRAM for FP16 inference, putting it in H100 or multi-GPU A100 territory rather than consumer hardware. API access is available via kimi.com at $0.95 per million input tokens and $4.00 per million output tokens — competitive with similar-tier models. The API is OpenAI-compatible.

A practical feature worth knowing about: the model implements a preserve_thinking mode that retains full reasoning traces across multi-turn interactions. For long debugging sessions where earlier reasoning steps inform later ones, this is useful. Most coding models forget their chain of thought between turns.

The open-source coding model space in mid-2026 is crowded. MiniMax M3 set a high bar at the start of the month with 1M context and strong SWE-Bench Pro numbers. Kimi K2.7-Code is a narrower bet — built specifically for coding workflows, not general frontier use. The reasoning efficiency angle distinguishes it from models that simply scale up token budgets. If the SWE-Bench Pro numbers, when they arrive, match the self-reported benchmark trajectory, it will be a serious competitor in its class.