GLM-4.6V continues to advance the multimodal direction of the GLM (General Language Model) architecture. The model has 106 billion parameters. Its text component (text encoder) consists of 46 layers, supports a context window of 128,000 tokens, and utilizes a 128-expert system, with 8 experts activated for processing each token.The key breakthrough of GLM-4.6V is the introduction of Native Multimodal Function Calling. Unlike traditional LLMs, which pass only textual descriptions of images to tools, GLM-4.6V can directly use images, screenshots, or PDF pages as input parameters for tools. This closes the "perception → understanding → action" loop: for example, the model can see a chart, automatically call a tool to analyze it, visually read the result, and integrate it into the final answer.
The model demonstrates excellent results on key multimodal benchmarks, holding leading positions among all open-source models of comparable scale.
The use cases for GLM-4.6V cover a broad spectrum of practical applications. The model is ideal for document workflow automation—it can analyze PDFs, tables, and scanned pages as unified visual objects, extracting structured information without preprocessing. This capability is particularly well-suited for business intelligence: GLM-4.6V can analyze reports, graphs, and charts, automatically generating textual interpretations and conclusions, significantly speeding up decision-making based on visual data.A standout application is front-end development and interface replication: from a screenshot of a layout, the model generates highly accurate HTML/CSS/JS code. A user can circle an area on the screenshot and give a textual command ("make this button blue"), and the model will automatically find and edit the corresponding piece of code.
| Model Name | Context | Type | GPU | TPS | Status | Link |
|---|
There are no public endpoints for this model yet.
Rent your own physically dedicated instance with hourly or long-term monthly billing.
We recommend deploying private instances in the following scenarios:
| Name | vCPU | RAM, MB | Disk, GB | GPU | |||
|---|---|---|---|---|---|---|---|
131,072.0 tensor |
24 | 196608 | 160 | 6 | $3.50 | Launch | |
131,072.0 tensor |
32 | 98304 | 160 | 4 | $4.35 | Launch | |
131,072.0 tensor |
24 | 98304 | 160 | 2 | $4.61 | Launch | |
131,072.0 |
16 | 131072 | 160 | 1 | $4.74 | Launch | |
131,072.0 tensor |
16 | 131072 | 160 | 4 | $5.74 | Launch | |
131,072.0 tensor |
44 | 262144 | 160 | 6 | $5.83 | Launch | |
131,072.0 tensor |
24 | 262144 | 160 | 2 | $7.84 | Launch | |
131,072.0 tensor |
24 | 196608 | 240 | 2 | $8.17 | Launch | |
| Name | vCPU | RAM, MB | Disk, GB | GPU | |||
|---|---|---|---|---|---|---|---|
131,072.0 tensor |
24 | 262144 | 240 | 2 | $4.93 | Launch | |
131,072.0 tensor |
44 | 262144 | 240 | 8 | $7.52 | Launch | |
131,072.0 tensor |
24 | 262144 | 240 | 2 | $7.85 | Launch | |
131,072.0 tensor |
24 | 196608 | 240 | 2 | $8.17 | Launch | |
131,072.0 tensor |
44 | 262144 | 240 | 6 | $8.86 | Launch | |
131,072.0 tensor |
24 | 262144 | 240 | 2 | $9.41 | Launch | |
| Name | vCPU | RAM, MB | Disk, GB | GPU | |||
|---|---|---|---|---|---|---|---|
131,072.0 tensor |
16 | 262144 | 480 | 4 | $9.17 | Launch | |
131,072.0 tensor |
24 | 262144 | 320 | 2 | $9.42 | Launch | |
131,072.0 tensor |
24 | 393216 | 480 | 3 | $12.38 | Launch | |
131,072.0 tensor |
16 | 262144 | 480 | 4 | $14.99 | Launch | |
Contact our dedicated neural networks support team at nn@immers.cloud or send your request to the sales department at sale@immers.cloud.