DeepSeek-R1-0528-Qwen3-8B

DeepSeek-R1-0528-Qwen3-8B is a compact 8-billion-parameter model created by distilling the knowledge and reasoning capabilities of the flagship DeepSeek-R1-0528 into the Qwen3-8B base model. The architecture is identical to Qwen3-8B, but the model uses the tokenizer of DeepSeek-R1-0528, keeping it compatible with the teacher model's reasoning format.

It demonstrates outstanding performance, scoring 86.0% on AIME 2024 — about 10 percentage points above the base Qwen3-8B and on par with the much larger Qwen3-235B in thinking mode. Together with strong results across other benchmarks, this places it among the leading open-source models in its class. The model is a textbook example of well-executed distillation: the reasoning chains of DeepSeek-R1-0528 transfer successfully into a far more compact architecture, opening new possibilities for academic research and for industrial development of small, specialized models. At 8B parameters, it can be deployed on modest hardware while retaining high-quality reasoning abilities.
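Like the full DeepSeek-R1, the distilled model emits its reasoning chain between `<think>` and `</think>` tags before the final answer. A minimal helper for separating the two — the tag convention is the R1 output format, while the helper itself is just an illustrative sketch:

```python
import re

def split_reasoning(completion: str) -> tuple[str, str]:
    """Split an R1-style completion into (reasoning, answer).

    R1-family models place their chain of thought between <think> and
    </think>, followed by the final answer. If no tags are present,
    the whole completion is treated as the answer.
    """
    match = re.search(r"<think>(.*?)</think>", completion, flags=re.DOTALL)
    if match is None:
        return "", completion.strip()
    reasoning = match.group(1).strip()
    answer = completion[match.end():].strip()
    return reasoning, answer

reasoning, answer = split_reasoning(
    "<think>2 + 2 is 4 because of the Peano axioms...</think>The answer is 4."
)
print(answer)  # → The answer is 4.
```

This is useful when you want to log or hide the chain of thought and show users only the final answer.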

DeepSeek-R1-0528-Qwen3-8B is ideally suited for educational applications, small-scale research projects, and any scenario that calls for a capable reasoning model but where deploying a full-scale reasoning model is not feasible.


Announce Date: 28.05.2025
Parameters: 8B
Context: 131,072 tokens (128K)
Layers: 36
Attention Type: Full Attention
Developer: DeepSeek
Transformers Version: 4.51.0
License: Apache 2.0

Public endpoint

Use our pre-built public endpoints for free to test inference and explore DeepSeek-R1-0528-Qwen3-8B capabilities. You can obtain an API access token on the token management page after registration and verification.
There are no public endpoints for this model yet.
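Once an endpoint is available and you have an API token, a request can be built in the OpenAI-compatible chat-completions format that most inference servers (e.g. vLLM) expose. The URL and token below are placeholders, and the exact schema of your endpoint may differ — this is a sketch, not the provider's documented API:

```python
import json

API_URL = "https://example.immers.cloud/v1/chat/completions"  # placeholder URL
API_TOKEN = "YOUR_API_TOKEN"  # obtained from the token management page

def build_request(prompt: str, model: str = "DeepSeek-R1-0528-Qwen3-8B") -> dict:
    # OpenAI-style chat-completions payload; adjust fields if your
    # endpoint expects a different schema.
    return {
        "model": model,
        "messages": [{"role": "user", "content": prompt}],
        "temperature": 0.6,   # moderate sampling, commonly suggested for R1-style models
        "max_tokens": 4096,   # leave room for the <think> reasoning chain
    }

payload = build_request("How many primes are there below 100?")
print(json.dumps(payload, indent=2))
# To actually send it (requires the `requests` package and a live endpoint):
# requests.post(API_URL, headers={"Authorization": f"Bearer {API_TOKEN}"}, json=payload)
```

Note that reasoning models consume many output tokens on the chain of thought, so set `max_tokens` generously.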

Private server

Rent your own physically dedicated instance with hourly or long-term monthly billing.

We recommend deploying a private instance when you need to:

  • maximize endpoint performance,
  • enable full context for long sequences,
  • ensure top-tier security by processing data in an isolated, dedicated environment,
  • use custom weights, such as fine-tuned models or LoRA adapters.
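When choosing a configuration, a rough back-of-the-envelope estimate of the VRAM needed for the weights alone can help. This is a generic heuristic (parameter count times bytes per parameter), not the provider's sizing method, and it ignores the KV cache, activations, and runtime overhead, which add several GiB more:

```python
def weight_memory_gib(params_b: float, bytes_per_param: float) -> float:
    """Approximate VRAM (GiB) occupied by model weights alone."""
    return params_b * 1e9 * bytes_per_param / 2**30

# Back-of-the-envelope numbers for an 8B-parameter model:
for precision, bpp in [("FP16/BF16", 2.0), ("INT8", 1.0), ("INT4", 0.5)]:
    print(f"{precision}: ~{weight_memory_gib(8.0, bpp):.1f} GiB")
# FP16 weights come to roughly 15 GiB, which is why two mid-range GPUs
# with tensor parallelism, or one 24+ GB card, are typical choices.
```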

Recommended server configurations for hosting DeepSeek-R1-0528-Qwen3-8B

Prices:

Name | Context | Parallelism | GPUs | Price/hour | TPS
teslat4-2.16.32.160 | 131,072 | tensor | 2 | $0.54 | 0.985
teslaa2-2.16.32.160 | 131,072 | tensor | 2 | $0.57 | 0.985
teslaa10-2.16.64.160 | 131,072 | tensor | 2 | $0.93 | 1.785
rtx2080ti-4.16.32.160 | 131,072 | tensor | 4 | $1.12 | 1.307
teslav100-1.12.64.160 | 131,072 | - | 1 | $1.20 | 1.124
rtxa5000-2.16.64.160.nvlink | 131,072 | tensor | 2 | $1.23 | 1.785
rtx3090-2.16.64.160 | 131,072 | tensor | 2 | $1.56 | 1.785
rtx5090-1.16.64.160 | 131,072 | - | 1 | $1.59 | 1.124
rtx3080-4.16.64.160 | 131,072 | tensor | 4 | $1.82 | 1.107
rtx4090-2.16.64.160 | 131,072 | tensor | 2 | $1.92 | 1.785
teslaa100-1.16.64.160 | 131,072 | - | 1 | $2.37 | 3.524
h100-1.16.64.160 | 131,072 | - | 1 | $3.83 | 3.524
h100nvl-1.16.96.160 | 131,072 | - | 1 | $4.11 | 4.224
h200-1.16.128.160 | 131,072 | - | 1 | $4.74 | 6.574
Prices:

Name | Context | Parallelism | GPUs | Price/hour | TPS
teslat4-3.32.64.160 | 131,072 | pipeline | 3 | $0.88 | 1.495
teslaa10-2.16.64.160 | 131,072 | tensor | 2 | $0.93 | 1.634
teslat4-4.16.64.160 | 131,072 | tensor | 4 | $0.96 | 2.156
teslaa2-3.32.128.160 | 131,072 | pipeline | 3 | $1.06 | 1.495
rtx2080ti-4.16.32.160 | 131,072 | tensor | 4 | $1.12 | 1.156
rtxa5000-2.16.64.160.nvlink | 131,072 | tensor | 2 | $1.23 | 1.634
teslaa2-4.32.128.160 | 131,072 | tensor | 4 | $1.26 | 2.156
rtx3090-2.16.64.160 | 131,072 | tensor | 2 | $1.56 | 1.634
rtx4090-2.16.64.160 | 131,072 | tensor | 2 | $1.92 | 1.634
teslav100-2.16.64.240 | 131,072 | tensor | 2 | $2.22 | 2.434
teslaa100-1.16.64.160 | 131,072 | - | 1 | $2.37 | 3.373
rtx5090-2.16.64.160 | 131,072 | tensor | 2 | $2.93 | 2.434
h100-1.16.64.160 | 131,072 | - | 1 | $3.83 | 3.373
h100nvl-1.16.96.160 | 131,072 | - | 1 | $4.11 | 4.073
h200-1.16.128.160 | 131,072 | - | 1 | $4.74 | 6.423
Prices:

Name | Context | Parallelism | GPUs | Price/hour | TPS
teslat4-3.32.64.160 | 131,072 | pipeline | 3 | $0.88 | 1.560
teslaa10-2.16.64.160 | 131,072 | tensor | 2 | $0.93 | 1.698
teslat4-4.16.64.160 | 131,072 | tensor | 4 | $0.96 | 2.221
teslaa2-3.32.128.160 | 131,072 | pipeline | 3 | $1.06 | 1.560
rtx2080ti-4.16.32.160 | 131,072 | tensor | 4 | $1.12 | 1.221
teslav100-1.12.64.160 | 131,072 | - | 1 | $1.20 | 1.037
rtxa5000-2.16.64.160.nvlink | 131,072 | tensor | 2 | $1.23 | 1.698
teslaa2-4.32.128.160 | 131,072 | tensor | 4 | $1.26 | 2.221
rtx3090-2.16.64.160 | 131,072 | tensor | 2 | $1.56 | 1.698
rtx5090-1.16.64.160 | 131,072 | - | 1 | $1.59 | 1.037
rtx3080-4.16.64.160 | 131,072 | tensor | 4 | $1.82 | 1.021
rtx4090-2.16.64.160 | 131,072 | tensor | 2 | $1.92 | 1.698
teslaa100-1.16.64.160 | 131,072 | - | 1 | $2.37 | 3.437
h100-1.16.64.160 | 131,072 | - | 1 | $3.83 | 3.437
h100nvl-1.16.96.160 | 131,072 | - | 1 | $4.11 | 4.137
h200-1.16.128.160 | 131,072 | - | 1 | $4.74 | 6.487

Need help?

Contact our dedicated neural networks support team at nn@immers.cloud or send your request to the sales department at sale@immers.cloud.