Kimi-K2.7-Code is an open-weight model released by Moonshot AI under a Modified MIT license, specifically optimized for agentic coding workflows in the form of long-horizon coding tasks — multi-step software engineering scenarios where the problem cannot be solved in a single pass.
Architecturally, Kimi-K2.7-Code is a Mixture-of-Experts model with 1 trillion parameters, of which 32 billion are activated per token. The model consists of 61 layers (one dense and 60 MoE layers), uses 384 experts with a selection of 8 per token and one shared expert. The attention mechanism is Multi-head Latent Attention (MLA) — the same scheme used across the entire Kimi K2 family: it compresses the KV-cache into a latent space, dramatically reducing memory usage on long contexts. The model supports a context window of 262,144 tokens. Like its predecessors, the model was developed and is served in native INT4 quantization, meaning the weights are optimized for INT4 during training. This preserves quality while requiring substantially less memory for the weights. A second key feature is native multimodality: along with text, the model accepts images and video through a built-in visual encoder, MoonViT, with 400M parameters.
K2.7-Code operates forcibly in thinking mode with the preserve_thinking flag enabled: the model always reasons step by step and retains the full reasoning content between dialogue turns. This is critical for agentic loops, where the assistant must remember its previous reasoning during multi-step tool calls — for example, which hypotheses it has already ruled out during debugging. Additionally, an Interleaved Thinking and Multi-Step Tool Call mechanism is implemented, inherited from K2-Thinking: the model alternates reasoning and tool calls within a single response, constructing chains of multiple tool calls.
Compared to the previous version, Kimi-K2.6, Kimi K2.7 Code demonstrates significant progress, not only on benchmarks. The model reduces the use of "thinking tokens" by approximately 30%, leading to faster responses in interactive sessions. Unlike the general-purpose K2.6 model, Kimi K2.7 Code is purpose-built for coding tasks, while K2.6 is recommended for general tasks such as text writing, analysis, and dialogue. Consequently, on key programming benchmarks, the model competes with leading proprietary solutions. On Kimi Code Bench v2 — K2.7 Code (62.0) is behind GPT-5.5 (69.0) and Claude Opus 4.8 (67.4) but shows a significant gap over K2.6. On Program Bench — K2.7 Code (53.6) trails GPT-5.5 (69.1) and Opus 4.8 (63.8) yet notably surpasses K2.6 (48.3). On the MCP Mark Verified benchmark, K2.7 Code (81.1) outperforms Claude Opus 4.8 (76.4), only trailing GPT-5.5 (92.9).
Kimi K2.7 Code is ideally suited for developers and engineering teams working on complex software projects: automating refactoring and codebase migrations, implementing multi-file features, debugging in extended sessions, writing code from scratch according to a technical specification, and analyzing and documenting existing code. The model is effective in agentic workflows — for example, as part of CI/CD pipelines for automatic bug fixing, in code review tools, and in systems for autonomous task completion based on specifications. Thanks to image and video support, the model can be used for analyzing visual materials accompanying technical documentation, as well as for working with interfaces and diagrams.
| Model Name | Context | Type | GPU | Status | Link |
|---|
There are no public endpoints for this model yet.
Rent your own physically dedicated instance with hourly or long-term monthly billing.
We recommend deploying private instances in the following scenarios:
| Name | GPU | TPS | Max Concurrency | |||
|---|---|---|---|---|---|---|
262,144.0 pipeline |
6 | $28.39 | 1.468 | Launch | ||
262,144.0 tensor |
8 | $37.37 | 3.243 | Launch | ||
262,144.0 tensor |
8 | $37.37 | 3.243 | Launch | ||
There are no configurations for this model, context and quantization yet.
There are no configurations for this model, context and quantization yet.
Contact our dedicated neural networks support team at nn@immers.cloud or send your request to the sales department at sale@immers.cloud.