DeepSeek-V3.2-Exp

reasoning

DeepSeek-V3.2-Exp is an experimental model based on V3.1-Terminus. Architecturally, it is built on the same foundation as V3.1-Terminus: a Hybrid Reasoning Mode and a Mixture-of-Experts (MoE) design with 256 experts, of which only 8 are activated per token. The model supports a context window of up to 163,840 tokens and uses Multi-head Latent Attention (MLA). The key difference in this experimental model is the DeepSeek Sparse Attention (DSA) mechanism, a fundamentally new approach to attention in transformers. DSA consists of two main components: a lightning indexer and a point-wise token selection mechanism. The lightning indexer quickly computes index scores between a query token and the preceding tokens, identifying the elements most relevant for attention; for each query token, only the 2,048 highest-scoring tokens are kept. The token selection mechanism then extracts only the key-value pairs corresponding to the top-k index scores, which significantly reduces computational complexity.
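
As a rough illustration of how DSA narrows attention, the sketch below scores all preceding tokens with a lightweight indexer, keeps only the top 2,048, and runs ordinary attention over the selected key-value pairs. It is a simplified single-token, single-head PyTorch version; the tensor names, dimensions, and scoring function are illustrative assumptions, not the model's actual implementation.

```python
# Minimal sketch of the DeepSeek Sparse Attention (DSA) idea, not the production kernel.
# The indexer dimensions and top_k=2048 follow the description above; everything else
# is an illustrative assumption.
import torch
import torch.nn.functional as F

def sparse_attention_step(q, keys, values, idx_q, idx_k, top_k=2048):
    """Attention for a single query token over its preceding tokens.

    q:      (d,)       query vector for the current token
    keys:   (T, d)     keys of the preceding tokens
    values: (T, d)     values of the preceding tokens
    idx_q:  (d_i,)     lightweight indexer query for the current token
    idx_k:  (T, d_i)   lightweight indexer keys for the preceding tokens
    """
    T, d = keys.shape

    # 1) Lightning indexer: cheap relevance scores between the query token
    #    and every preceding token.
    index_scores = idx_k @ idx_q                      # (T,)

    # 2) Token selection: keep only the top-k highest-scoring positions.
    k = min(top_k, T)
    top_idx = torch.topk(index_scores, k).indices     # (k,)

    # 3) Dense attention restricted to the selected key-value pairs.
    sel_k, sel_v = keys[top_idx], values[top_idx]     # (k, d)
    attn = F.softmax((sel_k @ q) / d ** 0.5, dim=-1)  # (k,)
    return attn @ sel_v                               # (d,)

# Toy usage: 4096 preceding tokens, only 2048 of them actually attended to.
T, d, d_i = 4096, 64, 32
out = sparse_attention_step(torch.randn(d), torch.randn(T, d), torch.randn(T, d),
                            torch.randn(d_i), torch.randn(T, d_i))
print(out.shape)  # torch.Size([64])
```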

Version V3.2 was released just one month after V3.1-Terminus. According to the developers, this model is an intermediate step towards a next-generation architecture. It posts strong, stable results on leading benchmarks: 89.3% accuracy on AIME 2025 (an international mathematics olympiad), a rating of 2121 on Codeforces (a competitive programming platform), 85.0% on the comprehensive knowledge benchmark MMLU-Pro, 40.1% on the agentic web-navigation task BrowseComp, and 97.1% on SimpleQA. On a number of metrics, however, the experimental model falls slightly short of the base V3.1-Terminus version.

DeepSeek-V3.2-Exp is an experimental version; nevertheless, it is the model behind the official web chat and the DeepSeek app. The open-source version, distributed under the MIT license, is therefore well suited to a wide range of tasks that require detailed, consistent chains of reasoning and the knowledge base of a language model of this size.


Announce Date: 29.09.2025
Parameters: 685B
Experts: 256
Activated at inference: 37B
Context: 164K
Layers: 61
Attention Type: DeepSeek Sparse Attention
VRAM requirements: 334.6 GB with 4-bit quantization (a rough sizing sketch follows this list)
Developer: DeepSeek
Transformers Version: 4.44.2
License: MIT
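
For a rough sense of where a figure like the VRAM requirement comes from, weight memory scales with parameter count times bits per weight; the helper below is a back-of-the-envelope estimate, not the exact accounting behind the published number.

```python
# Rough weight-memory estimate: parameters * bits per weight / 8, with no allowance
# for KV cache, activations, or quantization bookkeeping (per-group scales, layers
# kept in higher precision), so the published 334.6 GB figure differs somewhat.
def estimate_weight_vram_gb(n_params: float, bits_per_weight: float) -> float:
    return n_params * bits_per_weight / 8 / 1e9  # decimal gigabytes

print(f"{estimate_weight_vram_gb(685e9, 4):.1f} GB")  # ~342.5 GB, same ballpark as 334.6 GB
```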

Public endpoint

Use our pre-built public endpoints for free to test inference and explore DeepSeek-V3.2-Exp capabilities. You can obtain an API access token on the token management page after registration and verification.
There are no public endpoints for this model yet.
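
Once an endpoint becomes available, requests are typically sent over HTTP with the access token as a bearer credential. The snippet below is a hypothetical example: the URL and model name are placeholders, and it assumes an OpenAI-compatible chat completions API, which may not match the actual endpoint interface.

```python
# Hypothetical call to a public inference endpoint; replace the URL, model name,
# and token with the values shown on the endpoint and token management pages.
import requests

API_URL = "https://example-endpoint.invalid/v1/chat/completions"  # placeholder
API_TOKEN = "YOUR_ACCESS_TOKEN"                                   # from the token management page

response = requests.post(
    API_URL,
    headers={"Authorization": f"Bearer {API_TOKEN}"},
    json={
        "model": "DeepSeek-V3.2-Exp",
        "messages": [{"role": "user", "content": "Explain DeepSeek Sparse Attention briefly."}],
        "max_tokens": 512,
    },
    timeout=120,
)
response.raise_for_status()
print(response.json()["choices"][0]["message"]["content"])
```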

Private server

Rent your own physically dedicated instance with hourly or long-term monthly billing.

We recommend deploying private instances in the following scenarios:

  • maximize endpoint performance,
  • enable full context for long sequences,
  • ensure top-tier security for data processing in an isolated, dedicated environment,
  • use custom weights, such as fine-tuned models or LoRA adapters (see the sketch after this list).
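
As a rough illustration of the custom-weights scenario, the sketch below loads the base checkpoint with Hugging Face transformers and attaches a LoRA adapter via peft. The adapter ID is a placeholder, serving a model of this size requires a multi-GPU instance and usually a dedicated inference engine, and DSA support through stock transformers is not guaranteed, so treat this as a pattern rather than a deployment recipe.

```python
# Illustrative pattern for serving custom weights: base model + LoRA adapter.
# The adapter repository is a placeholder; trust_remote_code pulls in the custom
# modeling code shipped with the checkpoint.
from transformers import AutoModelForCausalLM, AutoTokenizer
from peft import PeftModel

BASE = "deepseek-ai/DeepSeek-V3.2-Exp"       # base checkpoint
ADAPTER = "your-org/your-lora-adapter"       # placeholder fine-tuned adapter

tokenizer = AutoTokenizer.from_pretrained(BASE, trust_remote_code=True)
model = AutoModelForCausalLM.from_pretrained(
    BASE,
    trust_remote_code=True,
    device_map="auto",        # shard weights across the GPUs of the instance
    torch_dtype="auto",
)
model = PeftModel.from_pretrained(model, ADAPTER)  # attach the LoRA weights

inputs = tokenizer("Hello", return_tensors="pt").to(model.device)
print(tokenizer.decode(model.generate(**inputs, max_new_tokens=32)[0]))
```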

Recommended configurations for hosting DeepSeek-V3.2-Exp

Prices:
Name                           Context  vCPU  RAM, MB  Disk, GB  GPU  Price, hour
teslaa100-6.44.512.480.nvlink  163,840  44    524288   480       6    $15.37
h200-3.32.512.480              163,840  32    524288   480       3    $21.08

Prices:
Name                           Context  vCPU  RAM, MB  Disk, GB  GPU  Price, hour
h200-6.52.896.960              163,840  52    917504   960       6    $41.82
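
The GPU counts above line up with the memory requirement listed earlier, assuming the usual 80 GB per A100 and 141 GB per H200 and that the six-H200 configuration targets a higher-precision variant; the quick arithmetic check below makes that explicit.

```python
# Total GPU memory per configuration versus the ~334.6 GB needed for 4-bit weights.
# Per-GPU capacities are the standard A100 80 GB / H200 141 GB figures; the headroom
# is what's left for KV cache, activations, and runtime buffers.
WEIGHTS_4BIT_GB = 334.6
configs = {
    "teslaa100-6.44.512.480.nvlink": 6 * 80,   # 480 GB
    "h200-3.32.512.480":             3 * 141,  # 423 GB
    "h200-6.52.896.960":             6 * 141,  # 846 GB (room for higher precision)
}
for name, vram in configs.items():
    print(f"{name}: {vram} GB total VRAM, headroom over 4-bit weights: {vram - WEIGHTS_4BIT_GB:.1f} GB")
```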


Need help?

Contact our dedicated neural networks support team at nn@immers.cloud or send your request to the sales department at sale@immers.cloud.