GLM-4-32B-0414

GLM-4-32B-Base-0414 is a 32-billion-parameter base model from the new GLM-4-0414 series developed by Team GLM. It was pre-trained on 15T tokens of high-quality data, including a substantial amount of synthetic reasoning data. This pre-training provides a solid foundation for subsequent reinforcement learning and alignment with user preferences.

The model was developed under the "all tools" concept, which lets it interact efficiently with external resources such as Python, web search, user APIs, and other services. As a result, it performs well on complex agentic tasks, including code generation, function calling, information retrieval, and report creation.

Its performance is comparable to that of industry leaders such as GPT-4o and DeepSeek-V3-0324 (671B), especially on programming tasks. The model can generate more than 500 lines of working code in various programming languages without follow-up prompts. It natively supports a 32K-token context, extendable to 128K tokens with YaRN, and is convenient to deploy locally, which makes it a versatile option for enterprise applications where predictable, stable results are critical.
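The relationship between the 32K native context and the 128K extended context can be sketched as a YaRN scaling factor. The snippet below is a minimal illustration, not the model's official configuration: the dictionary keys follow the `rope_scaling` convention commonly seen in Hugging Face style `config.json` files and are an assumption here, as is treating 128K as 131,072 tokens. Check the model card and your serving framework before relying on these names.

```python
# Assumed values: 32K = 32,768 tokens (native), 128K = 131,072 tokens (target).
NATIVE_CONTEXT = 32_768
TARGET_CONTEXT = 131_072


def yarn_rope_scaling(target_context: int, native_context: int = NATIVE_CONTEXT) -> dict:
    """Build a rope_scaling-style dict for extending the context with YaRN.

    The key names are an assumption based on a common config convention.
    """
    if target_context <= native_context:
        raise ValueError("target context must exceed the native context")
    return {
        "type": "yarn",
        # Scaling factor = extended length / native length.
        "factor": target_context / native_context,
        "original_max_position_embeddings": native_context,
    }


config = yarn_rope_scaling(TARGET_CONTEXT)
print(config)
```

Under these assumptions the factor is simply the ratio of the two context lengths. Scaling of this kind is usually enabled only when inputs actually exceed the native window.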


Announce Date: 14.04.2025
Parameters: 32B
Context: 32K (32,768 tokens)
Layers: 61
Attention Type: Full Attention
Developer: Z.ai
Transformers Version: 4.52.0.dev0
License: Apache 2.0

Public endpoint

Use our pre-built public endpoints for free to test inference and explore GLM-4-32B-0414 capabilities. You can obtain an API access token on the token management page after registration and verification.
There are no public endpoints for this model yet.

Private server

Rent your own physically dedicated instance with hourly or long-term monthly billing.

We recommend deploying private instances in the following scenarios:

  • maximize endpoint performance,
  • enable full context for long sequences,
  • ensure top-tier security for data processing in an isolated, dedicated environment,
  • use custom weights, such as fine-tuned models or LoRA adapters.

Recommended server configurations for hosting GLM-4-32B-0414

Prices:
Name                           Context  Type    GPU  Price, hour  TPS     Max Concurrency
teslaa2-2.16.32.160            32,768   tensor  2    $0.57        14.050  1.016
teslat4-3.32.64.160            32,768   tensor  3    $0.88        -       4.182
teslaa10-2.16.64.160           32,768   tensor  2    $0.93        -       8.642
rtx2080ti-4.16.32.160          32,768   tensor  4    $1.12        -       1.924
rtxa5000-2.16.64.160.nvlink    32,768   tensor  2    $1.23        -       8.642
rtx3090-2.16.64.160            32,768   tensor  2    $1.56        -       9.665
rtx5090-1.16.64.160            32,768   -       1    $1.59        -       3.785
rtx3080-4.16.64.160            32,768   tensor  4    $1.82        -       1.007
rtx4090-2.16.64.160            32,768   tensor  2    $1.92        -       9.626
teslaa100-1.16.64.160          32,768   -       1    $2.37        -       26.903
h100-1.16.64.160               32,768   -       1    $3.83        -       26.869
h100nvl-1.16.96.160            32,768   -       1    $4.11        -       33.582
teslaa100-2.24.96.160.nvlink   32,768   tensor  2    $4.61        -       63.419
h200-1.16.128.160              32,768   -       1    $4.74        -       56.125
h200-2.24.256.160.nvlink       32,768   tensor  2    $9.40        -       121.865
Prices:
Name                           Context  Type    GPU  Price, hour  TPS     Max Concurrency
teslaa10-2.16.64.160           32,768   tensor  2    $0.93        -       2.622
teslat4-4.16.64.160            32,768   tensor  4    $0.96        -       2.775
rtxa5000-2.16.64.160.nvlink    32,768   tensor  2    $1.23        -       2.622
teslaa2-4.32.128.160           32,768   tensor  4    $1.26        -       2.813
rtx3090-2.16.64.160            32,768   tensor  2    $1.56        -       3.645
rtx4090-2.16.64.160            32,768   tensor  2    $1.92        -       3.606
teslaa100-1.16.64.160          32,768   -       1    $2.37        -       20.883
rtx5090-2.16.64.160            32,768   tensor  2    $2.93        -       11.164
h100-1.16.64.160               32,768   -       1    $3.83        -       20.849
h100nvl-1.16.96.160            32,768   -       1    $4.11        -       27.562
teslaa100-2.24.96.160.nvlink   32,768   tensor  2    $4.61        -       57.399
h200-1.16.128.160              32,768   -       1    $4.74        -       50.106
h200-2.24.256.160.nvlink       32,768   tensor  2    $9.40        -       115.845
Prices:
Name                           Context  Type    GPU  Price, hour  TPS     Max Concurrency
teslaa10-4.16.128.240          32,768   tensor  4    $1.76        -       2.345
teslaa100-1.16.64.240          32,768   -       1    $2.38        -       4.696
rtx3090-4.16.64.240            32,768   tensor  4    $2.89        -       3.368
rtx4090-4.16.64.240            32,768   tensor  4    $3.61        -       3.330
h100-1.16.64.240               32,768   -       1    $3.83        -       4.662
h100nvl-1.16.96.240            32,768   -       1    $4.12        -       11.375
rtx5090-3.16.96.240            32,768   tensor  3    $4.35        -       5.584
h200-1.16.128.240              32,768   -       1    $4.74        -       33.918
teslaa100-2.24.256.320.nvlink  32,768   tensor  2    $4.94        -       41.212
h200-2.24.256.240.nvlink       32,768   tensor  2    $9.41        -       99.658
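
One way to compare the configurations above is to divide the hourly price by the maximum concurrency, giving a rough cost per simultaneous full-context request. The sketch below uses three rows from the last price list. It assumes that the final number in each row is the Max Concurrency value for 32,768-token requests, which is an interpretation of the listing rather than something the page states. The `ServerConfig` class and `cost_per_slot` helper are hypothetical names introduced for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ServerConfig:
    name: str
    price_per_hour: float   # USD, from the "Price, hour" column
    max_concurrency: float  # assumed to be the last number in each row


def cost_per_slot(cfg: ServerConfig) -> float:
    """Hourly cost per concurrent full-context request (assumed metric)."""
    return cfg.price_per_hour / cfg.max_concurrency


# Three rows copied from the last price list above.
CONFIGS = [
    ServerConfig("h200-1.16.128.240", 4.74, 33.918),
    ServerConfig("teslaa100-2.24.256.320.nvlink", 4.94, 41.212),
    ServerConfig("h200-2.24.256.240.nvlink", 9.41, 99.658),
]

# Cheapest cost per concurrent request first.
ranked = sorted(CONFIGS, key=cost_per_slot)
for cfg in ranked:
    print(f"{cfg.name}: ${cost_per_slot(cfg):.3f} per hour per concurrent request")
```

This ignores throughput per request, so it is only a first-pass filter. A configuration with a high hourly price can still be the cheapest per concurrent request when its capacity is large.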

Need help?

Contact our dedicated neural networks support team at nn@immers.cloud or send your request to the sales department at sale@immers.cloud.