GLM-Z1-32B-0414

reasoning

GLM-Z1-32B-0414 is a specialized reasoning model with deep thinking capabilities, developed based on GLM-4-32B-0414 through cold-start initialization and extended reinforcement learning. The model has been further fine-tuned on tasks involving mathematics, programming, and logic, significantly enhancing its ability to solve complex problems. It supports a context window of up to 32K tokens, which can be extended to 128K using YaRN technology.
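As a rough illustration of the 32K to 128K extension mentioned above, the sketch below derives the YaRN scaling factor from the native and target context lengths. The `rope_scaling` field names follow the convention commonly seen in Hugging Face `config.json` files and are an assumption here, as is the helper name `yarn_rope_scaling`; check the model's own documentation before relying on them.

```python
import json


def yarn_rope_scaling(native_ctx: int, target_ctx: int) -> dict:
    """Build a YaRN-style rope_scaling entry (field names are assumed, not confirmed)."""
    if target_ctx <= native_ctx:
        raise ValueError("target context must exceed the native context")
    return {
        "type": "yarn",
        # Scaling factor is the ratio of the extended window to the native one.
        "factor": target_ctx / native_ctx,
        "original_max_position_embeddings": native_ctx,
    }


# 32K native window extended to 128K, as described in the text.
cfg = yarn_rope_scaling(32 * 1024, 128 * 1024)
print(json.dumps({"rope_scaling": cfg}, indent=2))
```

Extending the window only makes sense when requests actually exceed the native 32K limit, so a deployment would typically leave this setting off by default.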

During training, comprehensive reinforcement learning based on pairwise ranking feedback was implemented, strengthening the model’s overall capabilities. Compared to the base model, GLM-Z1-32B-0414 demonstrates substantial improvements in mathematical reasoning and complex problem-solving. The model is capable of step-by-step problem analysis with detailed justification at each stage of reasoning.
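When consuming step-by-step output of this kind, it is often convenient to separate the reasoning trace from the final answer. The sketch below assumes the reasoning is delimited by `<think>` and `</think>` tags, which is a common convention for reasoning models but is not stated on this page; the function name `split_reasoning` is hypothetical.

```python
def split_reasoning(text: str,
                    open_tag: str = "<think>",
                    close_tag: str = "</think>") -> tuple[str, str]:
    """Return (reasoning, answer); tag names are an assumption, not confirmed here."""
    head, sep, tail = text.partition(close_tag)
    if not sep:
        # No closing tag found: treat the whole text as the answer.
        return "", text.strip()
    # The opening tag may be absent if the chat template already injected it.
    reasoning = head.replace(open_tag, "", 1).strip()
    return reasoning, tail.strip()


sample = "<think>\n2 + 2 = 4\n</think>\nThe answer is 4."
reasoning, answer = split_reasoning(sample)
print("reasoning:", reasoning)
print("answer:", answer)
```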

GLM-Z1-32B-0414 excels in academic research, scientific computing, and educational applications where step-by-step explanations are required. It performs exceptionally well in mathematical proofs, algorithm analysis, logical puzzles, and high-complexity programming tasks. Its ability to engage in structured thinking makes it an indispensable tool for developing educational materials.


Announce Date: 14.04.2025
Parameters: 32B
Context: 32K (32,768 tokens)
Layers: 61
Attention Type: Full Attention
Developer: Z.ai
Transformers Version: 4.52.0.dev0
License: Apache 2.0

Public endpoint

Use our pre-built public endpoints for free to test inference and explore GLM-Z1-32B-0414 capabilities. You can obtain an API access token on the token management page after registration and verification.
There are no public endpoints for this model yet.

Private server

Rent your own physically dedicated instance with hourly or long-term monthly billing.

We recommend deploying a private instance when you need to:

  • maximize endpoint performance,
  • enable full context for long sequences,
  • ensure top-tier security for data processing in an isolated, dedicated environment,
  • use custom weights, such as fine-tuned models or LoRA adapters.

Recommended server configurations for hosting GLM-Z1-32B-0414

Prices:

| Name | Context, tokens | Parallelism | GPU | Price, hour | TPS | Max Concurrency |
|---|---|---|---|---|---|---|
| teslat4-3.32.64.160 | 32,768 | tensor | 3 | $0.88 | - | 3.408 |
| teslaa10-2.16.64.160 | 32,768 | tensor | 2 | $0.93 | - | 7.481 |
| teslaa2-3.32.128.160 | 32,768 | tensor | 3 | $1.06 | - | 3.447 |
| rtx2080ti-4.16.32.160 | 32,768 | tensor | 4 | $1.12 | - | 1.343 |
| rtxa5000-2.16.64.160.nvlink | 32,768 | tensor | 2 | $1.23 | - | 7.481 |
| rtx3090-2.16.64.160 | 32,768 | tensor | 2 | $1.56 | - | 8.504 |
| rtx5090-1.16.64.160 | 32,768 | - | 1 | $1.59 | - | 2.624 |
| rtx4090-2.16.64.160 | 32,768 | tensor | 2 | $1.92 | - | 8.465 |
| teslaa100-1.16.64.160 | 32,768 | - | 1 | $2.37 | - | 25.742 |
| h100-1.16.64.160 | 32,768 | - | 1 | $3.83 | - | 25.708 |
| h100nvl-1.16.96.160 | 32,768 | - | 1 | $4.11 | - | 32.421 |
| teslaa100-2.24.96.160.nvlink | 32,768 | tensor | 2 | $4.61 | - | 62.258 |
| h200-1.16.128.160 | 32,768 | - | 1 | $4.74 | - | 54.964 |
| h200-2.24.256.160.nvlink | 32,768 | tensor | 2 | $9.40 | - | 120.704 |

Prices:

| Name | Context, tokens | Parallelism | GPU | Price, hour | TPS | Max Concurrency |
|---|---|---|---|---|---|---|
| teslaa10-2.16.64.160 | 32,768 | tensor | 2 | $0.93 | - | 2.622 |
| teslat4-4.16.64.160 | 32,768 | tensor | 4 | $0.96 | - | 2.775 |
| rtxa5000-2.16.64.160.nvlink | 32,768 | tensor | 2 | $1.23 | - | 2.622 |
| teslaa2-4.32.128.160 | 32,768 | tensor | 4 | $1.26 | - | 2.813 |
| rtx3090-2.16.64.160 | 32,768 | tensor | 2 | $1.56 | - | 3.645 |
| rtx4090-2.16.64.160 | 32,768 | tensor | 2 | $1.92 | - | 3.606 |
| teslaa100-1.16.64.160 | 32,768 | - | 1 | $2.37 | - | 20.883 |
| rtx5090-2.16.64.160 | 32,768 | tensor | 2 | $2.93 | - | 11.164 |
| h100-1.16.64.160 | 32,768 | - | 1 | $3.83 | - | 20.849 |
| h100nvl-1.16.96.160 | 32,768 | - | 1 | $4.11 | - | 27.562 |
| teslaa100-2.24.96.160.nvlink | 32,768 | tensor | 2 | $4.61 | - | 57.399 |
| h200-1.16.128.160 | 32,768 | - | 1 | $4.74 | - | 50.106 |
| h200-2.24.256.160.nvlink | 32,768 | tensor | 2 | $9.40 | - | 115.845 |

Prices:

| Name | Context, tokens | Parallelism | GPU | Price, hour | TPS | Max Concurrency |
|---|---|---|---|---|---|---|
| teslaa10-4.16.128.240 | 32,768 | tensor | 4 | $1.76 | - | 2.345 |
| teslaa100-1.16.64.240 | 32,768 | - | 1 | $2.38 | - | 4.696 |
| rtx3090-4.16.64.240 | 32,768 | tensor | 4 | $2.89 | - | 3.368 |
| rtx4090-4.16.64.240 | 32,768 | tensor | 4 | $3.61 | - | 3.330 |
| h100-1.16.64.240 | 32,768 | - | 1 | $3.83 | - | 4.662 |
| h100nvl-1.16.96.240 | 32,768 | - | 1 | $4.12 | - | 11.375 |
| rtx5090-3.16.96.240 | 32,768 | tensor | 3 | $4.35 | - | 5.584 |
| h200-1.16.128.240 | 32,768 | - | 1 | $4.74 | - | 33.918 |
| teslaa100-2.24.256.320.nvlink | 32,768 | tensor | 2 | $4.94 | - | 41.212 |
| h200-2.24.256.240.nvlink | 32,768 | tensor | 2 | $9.41 | - | 99.658 |
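
One way to compare the configurations above is hourly price per unit of concurrency. The sketch below uses a few rows from the last price list and assumes that the final figure in each row is the Max Concurrency value (the listing does not label it unambiguously); the metric itself is an illustrative choice, not something the provider publishes.

```python
# (name, price per hour in USD, assumed max concurrency) taken from the last price list.
configs = [
    ("teslaa10-4.16.128.240", 1.76, 2.345),
    ("teslaa100-1.16.64.240", 2.38, 4.696),
    ("h100nvl-1.16.96.240", 4.12, 11.375),
    ("h200-1.16.128.240", 4.74, 33.918),
    ("teslaa100-2.24.256.320.nvlink", 4.94, 41.212),
    ("h200-2.24.256.240.nvlink", 9.41, 99.658),
]


def price_per_slot(price_hour: float, concurrency: float) -> float:
    """Hourly cost of one concurrent full-context request slot."""
    return price_hour / concurrency


# Rank configurations from cheapest to most expensive per concurrent slot.
ranked = sorted(configs, key=lambda c: price_per_slot(c[1], c[2]))
for name, price, conc in ranked:
    print(f"{name:32s} ${price_per_slot(price, conc):.3f} per slot-hour")

best_name = ranked[0][0]
```

The cheapest instance per hour is not necessarily the cheapest per concurrent request, so a ranking like this is mainly useful when the expected load is known in advance.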

Need help?

Contact our dedicated neural networks support team at nn@immers.cloud or send your request to the sales department at sale@immers.cloud.