Gemma 3 12B is a well-balanced mid-sized multimodal language model developed by Google DeepMind, designed to tackle narrow, specialized professional tasks. With 12 billion parameters, the model combines high performance with computational efficiency and supports a wide range of capabilities, from text analysis to image processing. A vision encoder converts images into tokens that the language model processes alongside text, enabling deep understanding of images. The Pan&Scan technique handles images of any aspect ratio adaptively: it splits them into crops that are each resized to 896×896, so detail is preserved.
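
The cropping idea behind Pan&Scan can be illustrated with a minimal sketch. This is not Google's implementation: the function name `pan_and_scan_boxes`, the `max_crops` and `min_ratio` parameters, and the rule for choosing the number of crops are all assumptions made for illustration. The sketch only computes crop boxes; resizing each crop to 896×896 would be done by an image library.

```python
from typing import List, Tuple

TILE = 896  # encoder input resolution mentioned in the text

Box = Tuple[int, int, int, int]  # (left, top, right, bottom)


def pan_and_scan_boxes(width: int, height: int,
                       max_crops: int = 4,
                       min_ratio: float = 1.2) -> List[Box]:
    """Split an image into equal, non-overlapping crops along its long side.

    Hypothetical illustration: near-square images are kept whole; elongated
    images get roughly one crop per unit of aspect ratio, capped at max_crops.
    Each returned box would then be resized to TILE x TILE.
    """
    if width <= 0 or height <= 0:
        raise ValueError("width and height must be positive")

    long_side, short_side = max(width, height), min(width, height)
    ratio = long_side / short_side
    if ratio < min_ratio:
        return [(0, 0, width, height)]  # close enough to square: no cropping

    n = min(max_crops, max(2, round(ratio)))
    boxes: List[Box] = []
    for i in range(n):
        start = i * long_side // n
        end = (i + 1) * long_side // n
        if width >= height:
            boxes.append((start, 0, end, height))   # slice a wide image left to right
        else:
            boxes.append((0, start, width, end))    # slice a tall image top to bottom
    return boxes


print(pan_and_scan_boxes(1792, 896))
```

A 1792×896 image yields two square crops under these assumptions, while a 1000×1000 image is passed through as a single box.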
Another key feature is the expanded context window of up to 128K tokens. This enables the model to process lengthy legal documents and scientific articles in a single request without losing context. Multilingual support covers more than 140 languages, including Russian, while the enhanced tokenizer from Gemini 2.0 ensures high-quality translation, text generation, and cross-lingual analysis. Additionally, developer-supported quantization makes it possible to run the model even on consumer-grade GPUs with minimal loss in quality.
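
To see why quantization matters for consumer-grade GPUs, a rough back-of-the-envelope estimate of weight memory helps. The sketch below assumes the nominal figure of 12 billion parameters and counts weights only; the KV cache, activations, and runtime overhead are not included, so real requirements will be higher. The function name `weight_memory_gib` is hypothetical.

```python
def weight_memory_gib(num_params: float, bits_per_param: int) -> float:
    """Approximate memory for model weights alone, in GiB."""
    return num_params * bits_per_param / 8 / 2**30


NUM_PARAMS = 12e9  # nominal parameter count from the text (assumption: exact count differs)

for label, bits in [("16-bit", 16), ("8-bit", 8), ("4-bit", 4)]:
    print(f"{label}: {weight_memory_gib(NUM_PARAMS, bits):.1f} GiB")
```

Going from 16-bit to 4-bit weights cuts the weight footprint by a factor of four, which is what brings a model of this size within reach of a single consumer GPU.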
As a result, Gemma 3 12B is a versatile tool for data analysis, document processing, and information extraction from visual sources—with the ability to run locally and scalable integration into modern AI infrastructures.
There are no public endpoints for this model yet.
Rent your own physically dedicated instance with hourly or long-term monthly billing.
We recommend deploying private instances in the following scenarios:

| Context (tokens) | Parallelism | GPUs | Price | Max Concurrency |
|---|---|---|---|---|
| 131,072 | | 1 | $0.53 | 1.446 |
| 131,072 | tensor | 2 | $0.54 | 1.881 |
| 131,072 | tensor | 2 | $0.57 | 1.881 |
| 131,072 | tensor | 2 | $0.69 | 1.048 |
| 131,072 | | 1 | $0.83 | 1.446 |
| 131,072 | | 1 | $1.02 | 1.446 |
| 131,072 | | 1 | $1.20 | 2.112 |
| 131,072 | tensor | 2 | $1.23 | 3.212 |
| 131,072 | pipeline | 3 | $1.43 | 1.483 |
| 131,072 | | 1 | $1.59 | 2.112 |
| 131,072 | tensor | 4 | $1.82 | 2.084 |
| 131,072 | | 1 | $2.37 | 6.107 |
| 131,072 | | 1 | $3.83 | 6.107 |
| 131,072 | | 1 | $4.11 | 7.273 |
| 131,072 | | 1 | $4.74 | 11.185 |

| Context (tokens) | Parallelism | GPUs | Price | Max Concurrency |
|---|---|---|---|---|
| 131,072 | tensor | 2 | $0.54 | 1.168 |
| 131,072 | tensor | 2 | $0.57 | 1.168 |
| 131,072 | tensor | 2 | $0.93 | 2.499 |
| 131,072 | pipeline | 3 | $0.95 | 1.020 |
| 131,072 | tensor | 4 | $1.12 | 1.704 |
| 131,072 | | 1 | $1.20 | 1.399 |
| 131,072 | tensor | 2 | $1.23 | 2.499 |
| 131,072 | tensor | 2 | $1.56 | 2.499 |
| 131,072 | | 1 | $1.59 | 1.399 |
| 131,072 | tensor | 4 | $1.82 | 1.371 |
| 131,072 | tensor | 2 | $1.92 | 2.499 |
| 131,072 | | 1 | $2.37 | 5.394 |
| 131,072 | | 1 | $3.83 | 5.394 |
| 131,072 | | 1 | $4.11 | 6.560 |
| 131,072 | | 1 | $4.74 | 10.472 |

| Context (tokens) | Parallelism | GPUs | Price | Max Concurrency |
|---|---|---|---|---|
| 131,072 | pipeline | 3 | $0.88 | 1.048 |
| 131,072 | tensor | 2 | $0.93 | 1.279 |
| 131,072 | tensor | 4 | $0.96 | 2.148 |
| 131,072 | pipeline | 3 | $1.06 | 1.048 |
| 131,072 | tensor | 2 | $1.23 | 1.279 |
| 131,072 | tensor | 4 | $1.26 | 2.148 |
| 131,072 | tensor | 2 | $1.56 | 1.279 |
| 131,072 | tensor | 2 | $1.92 | 1.279 |
| 131,072 | tensor | 2 | $2.22 | 2.611 |
| 131,072 | | 1 | $2.37 | 4.174 |
| 131,072 | tensor | 2 | $2.93 | 2.611 |
| 131,072 | | 1 | $3.83 | 4.174 |
| 131,072 | | 1 | $4.11 | 5.339 |
| 131,072 | | 1 | $4.74 | 9.252 |
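
One way to compare the configurations listed above is to divide the price by the concurrency figure. The sketch below uses a few rows copied from the last table and assumes that the final numeric column is the maximum number of simultaneous full-context (131,072-token) requests; that reading, along with the names `Config`, `cost_per_request_slot`, and `rank_configs`, is an assumption rather than something the page states.

```python
from typing import List, NamedTuple


class Config(NamedTuple):
    gpus: int
    price: float            # listed price in dollars
    max_concurrency: float  # assumed: concurrent full-context requests


def cost_per_request_slot(cfg: Config) -> float:
    """Price divided by the number of concurrent full-context requests."""
    return cfg.price / cfg.max_concurrency


def rank_configs(configs: List[Config]) -> List[Config]:
    """Sort configurations from cheapest to most expensive per request slot."""
    return sorted(configs, key=cost_per_request_slot)


# A few rows taken from the last table above
SAMPLE = [
    Config(gpus=3, price=0.88, max_concurrency=1.048),
    Config(gpus=4, price=0.96, max_concurrency=2.148),
    Config(gpus=1, price=2.37, max_concurrency=4.174),
    Config(gpus=1, price=4.74, max_concurrency=9.252),
]

for cfg in rank_configs(SAMPLE):
    print(f"{cfg.gpus} GPU(s) at ${cfg.price:.2f}: "
          f"${cost_per_request_slot(cfg):.3f} per concurrent request")
```

The cheapest listing is not necessarily the most economical once concurrency is taken into account, so a ranking like this is only a starting point; throughput and latency requirements would also matter.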
Contact our dedicated neural networks support team at nn@immers.cloud or send your request to the sales department at sale@immers.cloud.