DeepSeek-V3.2-Exp is an experimental model built on the same foundation as V3.1-Terminus: a Hybrid Reasoning Mode and a Mixture-of-Experts (MoE) design with 256 experts, of which only 8 are activated per token. The model supports a context window of up to 163,840 tokens and uses Multi-head Latent Attention (MLA). The key difference in this experimental release is the DeepSeek Sparse Attention (DSA) mechanism, a fundamentally new approach to attention in transformers. DSA consists of two main components: a lightning indexer and a fine-grained (point-wise) token selection mechanism. The lightning indexer cheaply computes index scores between a query token and the preceding tokens, and for each query token identifies the 2048 most relevant ones. The point-wise selection mechanism then extracts only the key-value pairs corresponding to the top-k index scores, which substantially reduces computational complexity.
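The two DSA stages described above can be sketched in a few lines. This is a simplified illustration, not the actual DeepSeek implementation: it assumes a plain dot-product form for the lightning indexer, single-head attention, and illustrative names (`sparse_attention_topk`, `indexer_q`, `indexer_k`).

```python
import numpy as np

def sparse_attention_topk(q, keys, values, indexer_q, indexer_k, top_k=2048):
    """Sketch of DeepSeek Sparse Attention (DSA): a lightweight indexer
    scores preceding tokens, and full attention runs only over the
    top-k selected key/value pairs (simplified, illustrative shapes)."""
    seq_len = keys.shape[0]
    k = min(top_k, seq_len)
    # 1) Lightning indexer: cheap dot-product scores between the query's
    #    low-dimensional indexer projection and each preceding token's.
    index_scores = indexer_k @ indexer_q              # shape: (seq_len,)
    # 2) Point-wise selection: keep only the top-k scoring positions.
    top_idx = np.argpartition(index_scores, -k)[-k:]
    sel_k, sel_v = keys[top_idx], values[top_idx]
    # 3) Standard softmax attention, restricted to the selected tokens,
    #    so cost scales with top_k instead of the full sequence length.
    logits = sel_k @ q / np.sqrt(q.shape[0])
    weights = np.exp(logits - logits.max())
    weights /= weights.sum()
    return weights @ sel_v
```

With a 4096-token prefix and `top_k=2048`, the softmax attention runs over half the sequence, while the indexer pass stays cheap because it works in a much smaller projection dimension.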
Version V3.2 was released just one month after V3.1-Terminus. According to the developers, this model is an intermediate step towards a next-generation architecture. The model posts strong, stable results on leading benchmarks: AIME 2025 (89.3% accuracy), an international mathematics olympiad; Codeforces (rating 2121), a competitive programming platform; the comprehensive knowledge assessment MMLU-Pro (85.0%); the agentic web-navigation benchmark BrowseComp (40.1%); and SimpleQA (97.1%). On some metrics, however, the experimental model falls slightly short of the base V3.1-Terminus version.
DeepSeek-V3.2-Exp is an experimental version; nevertheless, it is the one serving the official chat and the DeepSeek app. The open-source version, distributed under the MIT license, is therefore well suited to a wide range of tasks that demand a detailed, consistent chain of reasoning and the knowledge base of a language model of this size.
There are no public endpoints for this model yet.
Rent your own physically dedicated instance with hourly or long-term monthly billing.
We recommend deploying private instances in the following scenarios:
Context | vCPU | RAM, MB | Disk, GB | GPU | Price | |
---|---|---|---|---|---|---|
163,840 | 44 | 524288 | 480 | 6 | $15.37 | Launch |
163,840 | 32 | 524288 | 480 | 3 | $21.08 | Launch |
Context | vCPU | RAM, MB | Disk, GB | GPU | Price | |
---|---|---|---|---|---|---|
163,840 | 52 | 917504 | 960 | 6 | $41.82 | Launch |
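Since dedicated instances are billed hourly, a long-running deployment's cost is easy to estimate. A quick sketch, assuming the hourly rates listed above and an average 730-hour month (the helper name is illustrative):

```python
# Average hours in a month (8760 hours per year / 12 months).
HOURS_PER_MONTH = 730

def monthly_cost(hourly_rate_usd: float, hours: int = HOURS_PER_MONTH) -> float:
    """Estimate the cost in USD of running an instance for `hours` hours."""
    return round(hourly_rate_usd * hours, 2)

print(monthly_cost(15.37))  # cheapest 6-GPU config above → 11220.1
print(monthly_cost(41.82))  # larger 6-GPU config above  → 30528.6
```

Long-term monthly billing is also offered, which may differ from this hourly-rate estimate.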
There are no configurations for this combination of model, context, and quantization yet.
Contact our dedicated neural networks support team at nn@immers.cloud or send your request to the sales department at sale@immers.cloud.