DeepSeek-R1-0528-Qwen3-8B

DeepSeek-R1-0528-Qwen3-8B is a compact 8-billion-parameter model created by distilling knowledge and reasoning capabilities from the flagship DeepSeek-R1-0528 into the Qwen3-8B base model. The model uses an architecture identical to Qwen3-8B, but incorporates the tokenizer from DeepSeek-R1-0528, ensuring compatibility with more advanced reasoning capabilities.

It demonstrates outstanding performance, achieving 86.0% on AIME 2024 — exceeding the base Qwen3-8B by 10% and matching the performance of the much larger Qwen3-235B-Thinking. These results, along with strong benchmark scores across other evaluations, place it among the leading open-source models in its class. The model serves as a great example of a well-implemented distillation process. Reasoning chains from DeepSeek-R1-0528 have been successfully transferred into a more compact architecture, opening new possibilities for academic research and industrial development of small, specialized models. Its compact size of 8B parameters makes it accessible for deployment on less powerful hardware while maintaining high-quality reasoning abilities.

DeepSeek-R1-0528-Qwen3-8B is ideally suited for educational applications, small-scale research projects, and any scenario where a capable reasoning-style answering model is needed, but deploying large reasoning models is not feasible.


Announce Date: 28.05.2025
Parameters: 9B
Context: 132K
Layers: 36
Attention Type: Full or Sliding Window Attention
Developer: DeepSeek
Transformers Version: 4.51.0
License: Apache 2.0

Public endpoint

Use our pre-built public endpoints for free to test inference and explore DeepSeek-R1-0528-Qwen3-8B capabilities. You can obtain an API access token on the token management page after registration and verification.
Model Name Context Type GPU Status Link
There are no public endpoints for this model yet.

Private server

Rent your own physically dedicated instance with hourly or long-term monthly billing.

We recommend deploying private instances in the following scenarios:

  • maximize endpoint performance,
  • enable full context for long sequences,
  • ensure top-tier security for data processing in an isolated, dedicated environment,
  • use custom weights, such as fine-tuned models or LoRA adapters.

Recommended server configurations for hosting DeepSeek-R1-0528-Qwen3-8B

Prices:
Name GPU Price, hour TPS Max Concurrency
teslat4-3.32.64.160
131,072.0
pipeline
3 $0.88 1.368 Launch
teslaa10-2.16.64.160
131,072.0
tensor
2 $0.93 1.619 Launch
teslat4-4.16.64.160
131,072.0
tensor
4 $0.96 1.929 Launch
teslaa2-3.32.128.160
131,072.0
pipeline
3 $1.06 1.374 Launch
rtx2080ti-4.16.32.160
131,072.0
tensor
4 $1.12 1.111 Launch
rtxa5000-2.16.64.160.nvlink
131,072.0
tensor
2 $1.23 1.619 Launch
teslaa2-4.32.128.160
131,072.0
tensor
4 $1.26 1.937 Launch
rtx3090-2.16.64.160
131,072.0
tensor
2 $1.56 1.727 Launch
rtx5090-1.16.64.160
131,072.0
1 $1.59 1.105 Launch
rtx4090-2.16.64.160
131,072.0
tensor
2 $1.92 1.723 Launch
teslaa100-1.16.64.160
131,072.0
1 $2.37 3.553 Launch
h100-1.16.64.160
131,072.0
1 $3.83 3.549 Launch
h100nvl-1.16.96.160
131,072.0
1 $4.11 4.260 Launch
teslaa100-2.24.96.160.nvlink
131,072.0
tensor
2 $4.61 7.420 Launch
h200-1.16.128.160
131,072.0
1 $4.74 6.648 Launch
h200-2.24.256.160.nvlink
131,072.0
tensor
2 $9.40 13.610 Launch
Prices:
Name GPU Price, hour TPS Max Concurrency
teslat4-3.32.64.160
131,072.0
pipeline
3 $0.88 1.194 Launch
teslaa10-2.16.64.160
131,072.0
tensor
2 $0.93 1.445 Launch
teslat4-4.16.64.160
131,072.0
tensor
4 $0.96 1.755 Launch
teslaa2-3.32.128.160
131,072.0
pipeline
3 $1.06 1.200 Launch
rtxa5000-2.16.64.160.nvlink
131,072.0
tensor
2 $1.23 1.445 Launch
teslaa2-4.32.128.160
131,072.0
tensor
4 $1.26 1.763 Launch
rtx3090-2.16.64.160
131,072.0
tensor
2 $1.56 1.553 Launch
rtx4090-2.16.64.160
131,072.0
tensor
2 $1.92 1.549 Launch
teslaa100-1.16.64.160
131,072.0
1 $2.37 3.379 Launch
rtx5090-2.16.64.160
131,072.0
tensor
2 $2.93 2.350 Launch
h100-1.16.64.160
131,072.0
1 $3.83 3.375 Launch
h100nvl-1.16.96.160
131,072.0
1 $4.11 4.086 Launch
teslaa100-2.24.96.160.nvlink
131,072.0
tensor
2 $4.61 7.246 Launch
h200-1.16.128.160
131,072.0
1 $4.74 6.474 Launch
h200-2.24.256.160.nvlink
131,072.0
tensor
2 $9.40 13.436 Launch
Prices:
Name GPU Price, hour TPS Max Concurrency
teslaa10-2.16.64.160
131,072.0
tensor
2 $0.93 1.086 Launch
teslat4-4.16.64.160
131,072.0
tensor
4 $0.96 1.396 Launch
rtxa5000-2.16.64.160.nvlink
131,072.0
tensor
2 $1.23 1.086 Launch
teslaa2-4.32.128.160
131,072.0
tensor
4 $1.26 1.404 Launch
rtx3090-2.16.64.160
131,072.0
tensor
2 $1.56 1.194 Launch
rtx4090-2.16.64.160
131,072.0
tensor
2 $1.92 1.190 Launch
teslaa100-1.16.64.160
131,072.0
1 $2.37 3.020 Launch
rtx5090-2.16.64.160
131,072.0
tensor
2 $2.93 1.990 Launch
h100-1.16.64.160
131,072.0
1 $3.83 3.016 Launch
h100nvl-1.16.96.160
131,072.0
1 $4.11 3.727 Launch
teslaa100-2.24.96.160.nvlink
131,072.0
tensor
2 $4.61 6.887 Launch
h200-1.16.128.160
131,072.0
1 $4.74 6.114 Launch
h200-2.24.256.160.nvlink
131,072.0
tensor
2 $9.40 13.076 Launch

Related models

Need help?

Contact our dedicated neural networks support team at nn@immers.cloud or send your request to the sales department at sale@immers.cloud.