Products

Cloud servers

Cloud platform with the latest GPUs, fast onboarding, per‑second billing, and immersion cooling. Isolated resources ensure maximum performance for your project..

GPU servers

Cloud servers with modern RTX and Tesla graphics accelerators for games, rendering, streaming, working with 3D graphics, and artificial intelligence.

H200

H100 NVL

H100

RTX 5090

RTX 4090

RTX 3090

RTX 3080

A100

RTX A5000

A10

RTX 2080 Ti^EOL

A2

Tesla T4^EOL

Tesla V100^EOL

All GPU servers

CPU servers

The cloud servers with high-performance Intel Xeon Gold 2nd, 3rd and 5th generation CPU are available for 100% of the processor time.
SSD serversдо 75К IOPS NVMe serversдо 360К IOPS
All CPU servers

Immers Foundation Models

The largest catalog of vetted open‑source models with automatic configuration selection and tuning for rapid deployment. Launch private endpoints with no token fees, or use public endpoints.

GLM-5.2 Kimi-K2.7-Code NVIDIA-Nemotron-3-Ultra-550B-A55B gemma-4-12B-it MiniMax-M3 DeepSeek-V4-Pro DeepSeek-V4-Flash Qwen3.6-27B Qwen3.6-35B-A3B Kimi-K2.6 GLM-5.1 gemma-4-26B-A4B-it gemma-4-31B-it NVIDIA-Nemotron-3-Super-120B-A12B Qwen3.5-122B-A10B Qwen3.5-397B-A17B gpt-oss-120b gpt-oss-20b

All modelsfrom catalogue

Dedicated servers

Rent a physically dedicated server for a long term with a monthly payment. Configure it using modern components: Intel Xeon Gold 2nd to 5th generation processors and up to 8192 GB of RAM per server, SSD and NVMe disks for data centers.

Select a dedicated serverдо 10 GPU и 2.5M IOPS

Marketplace

Use popular and modern applications as effective tools for organizing your project. Save time with pre-configured images that already have all the necessary components installed.

Forget about manually downloading and installing the software — just deploy a virtual server with a ready-made image.
Neural networks 3D CUDA Docker / NGC For games Windows images Linux images
All pre-installed images
Features
Prices
FAQ
Contact
Login

Kimi-K2.7-Code

reasoning

multimodal

coding

Kimi-K2.7-Code is an open-weight model released by Moonshot AI under a Modified MIT license, specifically optimized for agentic coding workflows in the form of long-horizon coding tasks — multi-step software engineering scenarios where the problem cannot be solved in a single pass.

Architecturally, Kimi-K2.7-Code is a Mixture-of-Experts model with 1 trillion parameters, of which 32 billion are activated per token. The model consists of 61 layers (one dense and 60 MoE layers), uses 384 experts with a selection of 8 per token and one shared expert. The attention mechanism is Multi-head Latent Attention (MLA) — the same scheme used across the entire Kimi K2 family: it compresses the KV-cache into a latent space, dramatically reducing memory usage on long contexts. The model supports a context window of 262,144 tokens. Like its predecessors, the model was developed and is served in native INT4 quantization, meaning the weights are optimized for INT4 during training. This preserves quality while requiring substantially less memory for the weights. A second key feature is native multimodality: along with text, the model accepts images and video through a built-in visual encoder, MoonViT, with 400M parameters.

K2.7-Code operates forcibly in thinking mode with the preserve_thinking flag enabled: the model always reasons step by step and retains the full reasoning content between dialogue turns. This is critical for agentic loops, where the assistant must remember its previous reasoning during multi-step tool calls — for example, which hypotheses it has already ruled out during debugging. Additionally, an Interleaved Thinking and Multi-Step Tool Call mechanism is implemented, inherited from K2-Thinking: the model alternates reasoning and tool calls within a single response, constructing chains of multiple tool calls.

Compared to the previous version, Kimi-K2.6, Kimi K2.7 Code demonstrates significant progress, not only on benchmarks. The model reduces the use of "thinking tokens" by approximately 30%, leading to faster responses in interactive sessions. Unlike the general-purpose K2.6 model, Kimi K2.7 Code is purpose-built for coding tasks, while K2.6 is recommended for general tasks such as text writing, analysis, and dialogue. Consequently, on key programming benchmarks, the model competes with leading proprietary solutions. On Kimi Code Bench v2 — K2.7 Code (62.0) is behind GPT-5.5 (69.0) and Claude Opus 4.8 (67.4) but shows a significant gap over K2.6. On Program Bench — K2.7 Code (53.6) trails GPT-5.5 (69.1) and Opus 4.8 (63.8) yet notably surpasses K2.6 (48.3). On the MCP Mark Verified benchmark, K2.7 Code (81.1) outperforms Claude Opus 4.8 (76.4), only trailing GPT-5.5 (92.9).

Kimi K2.7 Code is ideally suited for developers and engineering teams working on complex software projects: automating refactoring and codebase migrations, implementing multi-file features, debugging in extended sessions, writing code from scratch according to a technical specification, and analyzing and documenting existing code. The model is effective in agentic workflows — for example, as part of CI/CD pipelines for automatic bug fixing, in code review tools, and in systems for autonomous task completion based on specifications. Thanks to image and video support, the model can be used for analyzing visual materials accompanying technical documentation, as well as for working with interfaces and diagrams.

Announce Date: 11.06.2026
Parameters: 2T
Experts: 384
Activated at inference: 32B
Context: 263K
Layers: 61
Attention Type: Multi-head Latent Attention
Developer: Moonshot AI
Transformers Version: 4.56.2
vLLM Version: >=0.19.1
License: MIT

Public endpoint

Use our pre-built public endpoints for free to test inference and explore Kimi-K2.7-Code capabilities. You can obtain an API access token on the token management page after registration and verification.

Model Name	Context	Type	GPU	Status	Link


        There are no public endpoints for this model yet.

Private server

Rent your own physically dedicated instance with hourly or long-term monthly billing.

We recommend deploying private instances in the following scenarios:

maximize endpoint performance,
enable full context for long sequences,
ensure top-tier security for data processing in an isolated, dedicated environment,
use custom weights, such as fine-tuned models or LoRA adapters.

Recommended server configurations for hosting Kimi-K2.7-Code

Prices:

Name	GPU	Price, hour	Price, month	Max Concurrency
h200-6.52.896.960 262,144.0 pipeline	6	$28.39	$20 440.68	1.468	Launch
h200-8.52.1024.960 262,144.0 tensor	8	$37.37	$26 909.72	3.243	Launch
h200-8.52.1024.960.nvlink 262,144.0 tensor	8	$37.37	$26 909.72	3.243	Launch


        There are no configurations for this model, context and quantization yet.


        There are no configurations for this model, context and quantization yet.

Related models

Kimi-K2.5

Kimi-K2.6

Need help?

Contact our dedicated neural networks support team at nn@immers.cloud or send your request to the sales department at sale@immers.cloud.