GLM-Z1-32B-0414

reasoning

GLM-Z1-32B-0414 is a specialized reasoning model with deep thinking capabilities, built on GLM-4-32B-0414 through cold-start initialization and extended reinforcement learning. The model has been further trained on tasks involving mathematics, programming, and logic, which significantly improves its ability to solve complex problems. It supports a context window of up to 32K tokens, which can be extended to 128K using YaRN.
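The 32K to 128K extension mentioned above corresponds to a YaRN scaling factor of 4. The following is a minimal sketch of that arithmetic, not the vendor's official procedure. The `rope_scaling` key names follow a convention commonly seen in Hugging Face model configs and are an assumption here, as is the helper name `yarn_rope_scaling`; check the model's own documentation before applying them.

```python
import json

# Values taken from the description above: native window 32K, extended window 128K.
NATIVE_CONTEXT = 32 * 1024    # 32,768 tokens
TARGET_CONTEXT = 128 * 1024   # 131,072 tokens


def yarn_rope_scaling(native: int, target: int) -> dict:
    """Build a YaRN rope-scaling entry (hypothetical helper, assumed key names)."""
    if native <= 0 or target < native:
        raise ValueError("target context must be at least the native context")
    factor = target / native  # how far the native window is stretched
    return {
        "type": "yarn",
        "factor": factor,
        "original_max_position_embeddings": native,
    }


# Fragment that could be merged into a model config, under the assumptions above.
config_patch = {"rope_scaling": yarn_rope_scaling(NATIVE_CONTEXT, TARGET_CONTEXT)}
print(json.dumps(config_patch, indent=2))
```

Scaling the window usually only makes sense when requests actually exceed the native 32K length, since it also increases the memory needed per request.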

During training, comprehensive reinforcement learning based on pairwise ranking feedback was implemented, strengthening the model’s overall capabilities. Compared to the base model, GLM-Z1-32B-0414 demonstrates substantial improvements in mathematical reasoning and complex problem-solving. The model is capable of step-by-step problem analysis with detailed justification at each stage of reasoning.

GLM-Z1-32B-0414 excels in academic research, scientific computing, and educational applications where step-by-step explanations are required. It performs exceptionally well in mathematical proofs, algorithm analysis, logical puzzles, and high-complexity programming tasks. Its capacity for structured thinking also makes it well suited to developing educational materials.


Announce Date: 14.04.2025
Parameters: 32B
Context: 32K
Layers: 61
Attention Type: Full Attention
Developer: Z.ai
Transformers Version: 4.52.0.dev0
License: Apache 2.0

Public endpoint

Use our pre-built public endpoints for free to test inference and explore GLM-Z1-32B-0414 capabilities. You can obtain an API access token on the token management page after registration and verification.
There are no public endpoints for this model yet.

Private server

Rent your own physically dedicated instance with hourly or long-term monthly billing.

We recommend deploying a private instance when you need to:

  • maximize endpoint performance,
  • enable full context for long sequences,
  • ensure top-tier security for data processing in an isolated, dedicated environment,
  • use custom weights, such as fine-tuned models or LoRA adapters.

Recommended server configurations for hosting GLM-Z1-32B-0414

Prices:
Name                          Context  Parallelism  GPUs  Price, hour  TPS  Max Concurrency
teslat4-2.16.32.160           32,768   tensor       2     $0.54        -    1.710
teslaa2-2.16.32.160           32,768   tensor       2     $0.57        -    1.710
teslaa10-2.16.64.160          32,768   tensor       2     $0.93        -    9.264
rtx2080ti-3.16.64.160         32,768   tensor       3     $0.95        -    0.871
teslav100-1.12.64.160         32,768   -            1     $1.20        -    3.022
rtxa5000-2.16.64.160.nvlink   32,768   tensor       2     $1.23        -    9.264
rtx3090-2.16.64.160           32,768   tensor       2     $1.56        -    9.264
rtx5090-1.16.64.160           32,768   -            1     $1.59        -    3.022
rtx3080-4.16.64.160           32,768   tensor       4     $1.82        -    2.864
rtx4090-2.16.64.160           32,768   tensor       2     $1.92        -    9.264
teslaa100-1.16.64.160         32,768   -            1     $2.37        -    25.684
h100-1.16.64.160              32,768   -            1     $3.83        -    25.684
h100nvl-1.16.96.160           32,768   -            1     $4.11        -    32.294
h200-1.16.128.160             32,768   -            1     $4.74        -    54.484
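
One way to compare the configurations above is to divide the hourly price by the number of requests a server can hold at once. The sketch below does this for four rows copied from the first price list. It assumes that the last number in each row is the Max Concurrency value (full-context requests served simultaneously); the class and function names are illustrative and not part of any provider API.

```python
from dataclasses import dataclass
from typing import Optional, Sequence


@dataclass(frozen=True)
class ServerConfig:
    name: str
    gpus: int
    price_per_hour: float   # USD, from the price list
    max_concurrency: float  # assumed meaning: simultaneous full-context requests

    @property
    def price_per_slot(self) -> float:
        """Hourly price divided by the number of concurrent request slots."""
        return self.price_per_hour / self.max_concurrency


# Four rows copied from the first price list above.
CONFIGS = [
    ServerConfig("teslat4-2.16.32.160", 2, 0.54, 1.710),
    ServerConfig("teslaa10-2.16.64.160", 2, 0.93, 9.264),
    ServerConfig("teslaa100-1.16.64.160", 1, 2.37, 25.684),
    ServerConfig("h200-1.16.128.160", 1, 4.74, 54.484),
]


def cheapest_for(required: float, configs: Sequence[ServerConfig]) -> Optional[ServerConfig]:
    """Return the lowest-priced config that reaches the required concurrency."""
    eligible = [c for c in configs if c.max_concurrency >= required]
    return min(eligible, key=lambda c: c.price_per_hour) if eligible else None


# Rank configurations by cost per concurrent slot, cheapest first.
for cfg in sorted(CONFIGS, key=lambda c: c.price_per_slot):
    print(f"{cfg.name:<24} ${cfg.price_per_slot:.3f} per slot-hour")
```

The cheapest server per hour is not necessarily the cheapest per concurrent request, so the ranking depends on how many simultaneous long-context requests the workload actually needs.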
Prices:
Name                          Context  Parallelism  GPUs  Price, hour  TPS  Max Concurrency
teslat4-3.32.64.160           32,768   tensor       3     $0.88        -    3.094
teslaa10-2.16.64.160          32,768   tensor       2     $0.93        -    4.405
teslaa2-3.32.128.160          32,768   tensor       3     $1.06        -    3.094
rtxa5000-2.16.64.160.nvlink   32,768   tensor       2     $1.23        -    4.405
rtx3090-2.16.64.160           32,768   tensor       2     $1.56        -    4.405
rtx4090-2.16.64.160           32,768   tensor       2     $1.92        -    4.405
teslav100-2.16.64.240         32,768   tensor       2     $2.22        -    11.959
teslaa100-1.16.64.160         32,768   -            1     $2.37        -    20.825
rtx5090-2.16.64.160           32,768   tensor       2     $2.93        -    11.959
h100-1.16.64.160              32,768   -            1     $3.83        -    20.825
h100nvl-1.16.96.160           32,768   -            1     $4.11        -    27.435
h200-1.16.128.160             32,768   -            1     $4.74        -    49.625
Prices:
Name                          Context  Parallelism  GPUs  Price, hour  TPS  Max Concurrency
teslaa2-6.32.128.160          32,768   tensor       6     $1.65        -    3.284
teslaa10-4.16.128.160         32,768   tensor       4     $1.75        -    5.907
rtxa5000-4.16.128.160.nvlink  32,768   tensor       4     $2.34        -    5.907
teslaa100-1.16.128.160        32,768   -            1     $2.50        -    2.287
rtx3090-4.16.96.320           32,768   tensor       4     $2.97        -    5.907
rtx4090-4.16.96.320           32,768   tensor       4     $3.68        -    5.907
teslav100-3.64.256.320        32,768   tensor       3     $3.89        -    7.218
h100-1.16.128.160             32,768   -            1     $3.95        -    2.287
h100nvl-1.16.96.160           32,768   -            1     $4.11        -    8.897
rtx5090-3.16.96.160           32,768   tensor       3     $4.34        -    7.218
h200-1.16.128.160             32,768   -            1     $4.74        -    31.087

Need help?

Contact our dedicated neural networks support team at nn@immers.cloud or send your request to the sales department at sale@immers.cloud.