Kimi-K2.7-Code

reasoning
multimodal
coding

Kimi-K2.7-Code is an open-weight model released by Moonshot AI under a Modified MIT license, specifically optimized for agentic coding workflows in the form of long-horizon coding tasks — multi-step software engineering scenarios where the problem cannot be solved in a single pass.

Architecturally, Kimi-K2.7-Code is a Mixture-of-Experts model with 1 trillion parameters, of which 32 billion are activated per token. The model consists of 61 layers (one dense and 60 MoE layers), uses 384 experts with a selection of 8 per token and one shared expert. The attention mechanism is Multi-head Latent Attention (MLA) — the same scheme used across the entire Kimi K2 family: it compresses the KV-cache into a latent space, dramatically reducing memory usage on long contexts. The model supports a context window of 262,144 tokens. Like its predecessors, the model was developed and is served in native INT4 quantization, meaning the weights are optimized for INT4 during training. This preserves quality while requiring substantially less memory for the weights. A second key feature is native multimodality: along with text, the model accepts images and video through a built-in visual encoder, MoonViT, with 400M parameters.

K2.7-Code operates forcibly in thinking mode with the preserve_thinking flag enabled: the model always reasons step by step and retains the full reasoning content between dialogue turns. This is critical for agentic loops, where the assistant must remember its previous reasoning during multi-step tool calls — for example, which hypotheses it has already ruled out during debugging. Additionally, an Interleaved Thinking and Multi-Step Tool Call mechanism is implemented, inherited from K2-Thinking: the model alternates reasoning and tool calls within a single response, constructing chains of multiple tool calls.

Compared to the previous version, Kimi-K2.6, Kimi K2.7 Code demonstrates significant progress, not only on benchmarks. The model reduces the use of "thinking tokens" by approximately 30%, leading to faster responses in interactive sessions. Unlike the general-purpose K2.6 model, Kimi K2.7 Code is purpose-built for coding tasks, while K2.6 is recommended for general tasks such as text writing, analysis, and dialogue. Consequently, on key programming benchmarks, the model competes with leading proprietary solutions. On Kimi Code Bench v2 — K2.7 Code (62.0) is behind GPT-5.5 (69.0) and Claude Opus 4.8 (67.4) but shows a significant gap over K2.6. On Program Bench — K2.7 Code (53.6) trails GPT-5.5 (69.1) and Opus 4.8 (63.8) yet notably surpasses K2.6 (48.3). On the MCP Mark Verified benchmark, K2.7 Code (81.1) outperforms Claude Opus 4.8 (76.4), only trailing GPT-5.5 (92.9).

Kimi K2.7 Code is ideally suited for developers and engineering teams working on complex software projects: automating refactoring and codebase migrations, implementing multi-file features, debugging in extended sessions, writing code from scratch according to a technical specification, and analyzing and documenting existing code. The model is effective in agentic workflows — for example, as part of CI/CD pipelines for automatic bug fixing, in code review tools, and in systems for autonomous task completion based on specifications. Thanks to image and video support, the model can be used for analyzing visual materials accompanying technical documentation, as well as for working with interfaces and diagrams.


Announce Date: 11.06.2026
Parameters: 2T
Experts: 384
Activated at inference: 32B
Context: 263K
Layers: 61
Attention Type: Multi-head Latent Attention
Developer: Moonshot AI
Transformers Version: 4.56.2
vLLM Version: >=0.19.1
License: MIT

Public endpoint

Use our pre-built public endpoints for free to test inference and explore Kimi-K2.7-Code capabilities. You can obtain an API access token on the token management page after registration and verification.
Model Name Context Type GPU Status Link
There are no public endpoints for this model yet.

Private server

Rent your own physically dedicated instance with hourly or long-term monthly billing.

We recommend deploying private instances in the following scenarios:

  • maximize endpoint performance,
  • enable full context for long sequences,
  • ensure top-tier security for data processing in an isolated, dedicated environment,
  • use custom weights, such as fine-tuned models or LoRA adapters.

Recommended server configurations for hosting Kimi-K2.7-Code

Prices:
Name GPU Price, hour TPS Max Concurrency
h200-6.52.896.960
262,144.0
pipeline
6 $28.39 1.468 Launch
h200-8.52.1024.960
262,144.0
tensor
8 $37.37 3.243 Launch
h200-8.52.1024.960.nvlink
262,144.0
tensor
8 $37.37 3.243 Launch
There are no configurations for this model, context and quantization yet.
There are no configurations for this model, context and quantization yet.

Related models

Need help?

Contact our dedicated neural networks support team at nn@immers.cloud or send your request to the sales department at sale@immers.cloud.