DeepSeek-R1-0528-Qwen3-8B

DeepSeek-R1-0528-Qwen3-8B is a compact 8-billion-parameter model created by distilling the knowledge and reasoning capabilities of the flagship DeepSeek-R1-0528 into the Qwen3-8B base model. Its architecture is identical to that of Qwen3-8B, but it uses the tokenizer from DeepSeek-R1-0528.

It scores 86.0% on AIME 2024, 10 percentage points above the base Qwen3-8B and on par with the much larger Qwen3-235B-Thinking. Together with strong results on other benchmarks, this places it among the leading open-source models in its class.

The model is a good example of a well-executed distillation process: reasoning chains from DeepSeek-R1-0528 have been transferred into a more compact architecture, which opens new possibilities for academic research and for industrial development of small, specialized models. At 8B parameters, it can be deployed on less powerful hardware while retaining high-quality reasoning.

DeepSeek-R1-0528-Qwen3-8B is well suited to educational applications, small-scale research projects, and any scenario that needs a capable reasoning model but where deploying a large one is not feasible.


Announce Date: 28.05.2025
Parameters: 8B
Context: 131K (131,072 tokens)
Layers: 36
Attention Type: Full or Sliding Window Attention
Developer: DeepSeek
Transformers Version: 4.51.0
License: Apache 2.0
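
The layer count and context length above largely determine how much GPU memory a single full-length request needs for its key/value cache. The following is a rough sketch of that arithmetic, not a measurement. The number of KV heads (8), the head dimension (128), and the 16-bit cache precision are assumptions taken from the publicly available Qwen3-8B configuration and are not stated on this page.

```python
# Rough KV-cache size estimate for one sequence at full context.
# LAYERS and CONTEXT_TOKENS come from the spec list above.
# KV_HEADS, HEAD_DIM and BYTES_PER_VALUE are assumptions (see lead-in).

LAYERS = 36
CONTEXT_TOKENS = 131_072
KV_HEADS = 8         # assumption: grouped-query attention with 8 KV heads
HEAD_DIM = 128       # assumption: per-head dimension
BYTES_PER_VALUE = 2  # assumption: FP16/BF16 cache


def kv_cache_bytes(tokens, layers=LAYERS, kv_heads=KV_HEADS,
                   head_dim=HEAD_DIM, bytes_per_value=BYTES_PER_VALUE):
    """Bytes of KV cache needed to hold `tokens` tokens of one sequence."""
    # Factor 2: one key vector and one value vector per KV head,
    # per layer, per token.
    return 2 * layers * kv_heads * head_dim * bytes_per_value * tokens


per_token = kv_cache_bytes(1)
full_context = kv_cache_bytes(CONTEXT_TOKENS)

print(f"{per_token / 1024:.0f} KiB per token")
print(f"{full_context / 1024**3:.1f} GiB for {CONTEXT_TOKENS:,} tokens")
```

Under these assumptions the cache for one full-context sequence comes to about 18 GiB, in addition to the model weights. This may be why the smaller GPUs in the price list appear only in multi-GPU configurations.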

Public endpoint

Use our pre-built public endpoints for free to test inference and explore DeepSeek-R1-0528-Qwen3-8B capabilities. You can obtain an API access token on the token management page after registration and verification.
There are no public endpoints for this model yet.

Private server

Rent your own physically dedicated instance with hourly or long-term monthly billing.

We recommend deploying a private instance when you need to:

  • maximize endpoint performance,
  • enable the full context for long sequences,
  • ensure top-tier security by processing data in an isolated, dedicated environment,
  • use custom weights, such as fine-tuned models or LoRA adapters.

Recommended server configurations for hosting DeepSeek-R1-0528-Qwen3-8B

Prices:
Name                          Context   Parallelism  GPUs  Price, hour  TPS  Max Concurrency
teslat4-3.32.64.160           131,072   pipeline     3     $0.88        -    1.502
teslaa10-2.16.64.160          131,072   tensor       2     $0.93        -    1.708
teslat4-4.16.64.160           131,072   tensor       4     $0.96        -    2.107
teslaa2-3.32.128.160          131,072   pipeline     3     $1.06        -    1.508
rtx2080ti-4.16.32.160         131,072   tensor       4     $1.12        -    1.289
rtxa5000-2.16.64.160.nvlink   131,072   tensor       2     $1.23        -    1.708
teslaa2-4.32.128.160          131,072   tensor       4     $1.26        -    2.115
rtx3090-2.16.64.160           131,072   tensor       2     $1.56        -    1.816
rtx5090-1.16.64.160           131,072   -            1     $1.59        -    1.149
rtx3080-4.16.64.160           131,072   tensor       4     $1.82        -    1.095
rtx4090-2.16.64.160           131,072   tensor       2     $1.92        -    1.812
teslaa100-1.16.64.160         131,072   -            1     $2.37        -    3.597
h100-1.16.64.160              131,072   -            1     $3.83        -    3.594
h100nvl-1.16.96.160           131,072   -            1     $4.11        -    4.305
teslaa100-2.24.96.160.nvlink  131,072   tensor       2     $4.61        -    7.509
h200-1.16.128.160             131,072   -            1     $4.74        -    6.692
h200-2.24.256.160.nvlink      131,072   tensor       2     $9.40        -    13.699
Prices:
Name                          Context   Parallelism  GPUs  Price, hour  TPS  Max Concurrency
teslat4-3.32.64.160           131,072   pipeline     3     $0.88        -    1.328
teslaa10-2.16.64.160          131,072   tensor       2     $0.93        -    1.534
teslat4-4.16.64.160           131,072   tensor       4     $0.96        -    1.933
teslaa2-3.32.128.160          131,072   pipeline     3     $1.06        -    1.334
rtx2080ti-4.16.32.160         131,072   tensor       4     $1.12        -    1.115
rtxa5000-2.16.64.160.nvlink   131,072   tensor       2     $1.23        -    1.534
teslaa2-4.32.128.160          131,072   tensor       4     $1.26        -    1.941
rtx3090-2.16.64.160           131,072   tensor       2     $1.56        -    1.642
rtx4090-2.16.64.160           131,072   tensor       2     $1.92        -    1.638
teslaa100-1.16.64.160         131,072   -            1     $2.37        -    3.423
rtx5090-2.16.64.160           131,072   tensor       2     $2.93        -    2.439
h100-1.16.64.160              131,072   -            1     $3.83        -    3.420
h100nvl-1.16.96.160           131,072   -            1     $4.11        -    4.131
teslaa100-2.24.96.160.nvlink  131,072   tensor       2     $4.61        -    7.335
h200-1.16.128.160             131,072   -            1     $4.74        -    6.518
h200-2.24.256.160.nvlink      131,072   tensor       2     $9.40        -    13.525
Prices:
Name                          Context   Parallelism  GPUs  Price, hour  TPS  Max Concurrency
teslaa10-2.16.64.160          131,072   tensor       2     $0.93        -    1.175
teslat4-4.16.64.160           131,072   tensor       4     $0.96        -    1.574
rtxa5000-2.16.64.160.nvlink   131,072   tensor       2     $1.23        -    1.175
teslaa2-4.32.128.160          131,072   tensor       4     $1.26        -    1.582
rtx3090-2.16.64.160           131,072   tensor       2     $1.56        -    1.283
rtx4090-2.16.64.160           131,072   tensor       2     $1.92        -    1.279
teslaa100-1.16.64.160         131,072   -            1     $2.37        -    3.064
rtx5090-2.16.64.160           131,072   tensor       2     $2.93        -    2.079
h100-1.16.64.160              131,072   -            1     $3.83        -    3.061
h100nvl-1.16.96.160           131,072   -            1     $4.11        -    3.771
teslaa100-2.24.96.160.nvlink  131,072   tensor       2     $4.61        -    6.976
h200-1.16.128.160             131,072   -            1     $4.74        -    6.159
h200-2.24.256.160.nvlink      131,072   tensor       2     $9.40        -    13.165
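
One way to compare these configurations is to divide the hourly price by the concurrency figure, giving a cost per concurrent full-context request. The sketch below does this for four rows copied from the first price list. It assumes that the last number in each row is the Max Concurrency value, and it uses a 730-hour month as a generic approximation rather than the provider's actual billing rule. The function names are illustrative only.

```python
# (name, price per hour in USD, max concurrency) copied from the first
# price list above. Treating the last number of each row as
# "Max Concurrency" is an assumption.
ROWS = [
    ("teslat4-3.32.64.160", 0.88, 1.502),
    ("teslaa100-1.16.64.160", 2.37, 3.597),
    ("h200-1.16.128.160", 4.74, 6.692),
    ("h200-2.24.256.160.nvlink", 9.40, 13.699),
]

HOURS_PER_MONTH = 730  # assumption: average month, not a billing rule


def cost_per_slot(price_per_hour, max_concurrency):
    """Hourly price divided by the number of concurrent full-context requests."""
    return price_per_hour / max_concurrency


def rank_by_slot_cost(rows):
    """Return rows sorted from cheapest to most expensive per concurrent slot."""
    return sorted(rows, key=lambda row: cost_per_slot(row[1], row[2]))


for name, price, concurrency in rank_by_slot_cost(ROWS):
    slot = cost_per_slot(price, concurrency)
    month = price * HOURS_PER_MONTH
    print(f"{name:<28} ${slot:.3f}/slot-hour  ~${month:,.0f}/month")
```

By this measure the cheapest instance per hour is also the cheapest per slot among the four sampled rows, but the ranking ignores generation speed, so it should be read as a starting point rather than a recommendation.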

Need help?

Contact our dedicated neural networks support team at nn@immers.cloud or send your request to the sales department at sale@immers.cloud.