Gemma 3 1B is a compact language model developed by Google DeepMind that balances size and capability. With 1 billion parameters and a context window of 32,768 tokens, it is efficient enough to run on devices with limited resources. Its architecture keeps memory usage low, which makes it a good fit for embedded systems and mobile applications.
The 1B version is a text-only model and does not support image processing, unlike the larger variants in the Gemma 3 series (4B, 12B, and 27B), which offer multimodal capabilities.
Additionally, the model has limited language support and is primarily optimized for tasks in English.
Gemma 3 1B is available with open weights, making it easy to fine-tune and adapt to specific use cases. It comes in multiple quantization levels, ranging from 32-bit down to 4-bit, providing added flexibility when deploying on different hardware platforms.
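
To give a rough sense of what the quantization levels mean for deployment, the sketch below estimates the memory taken by the weights alone at several bit widths. It assumes a nominal 1 billion parameters and ignores the KV cache, activations, and any per-block overhead a real quantization format adds. The intermediate 16-bit and 8-bit levels are illustrative assumptions; the text above only states the 32-bit and 4-bit endpoints.

```python
def weight_memory_gb(num_params: float, bits_per_weight: int) -> float:
    """Approximate memory for the weights alone, in gigabytes (1 GB = 1e9 bytes)."""
    return num_params * bits_per_weight / 8 / 1e9


# Nominal "1B" figure; the exact parameter count is an assumption here.
NUM_PARAMS = 1e9

# 32 and 4 come from the description; 16 and 8 are assumed intermediate levels.
footprints = {bits: weight_memory_gb(NUM_PARAMS, bits) for bits in (32, 16, 8, 4)}

for bits, gb in footprints.items():
    print(f"{bits:>2}-bit weights: ~{gb:.1f} GB")
```

Actual memory use at inference time will be higher than these figures, since the context window and batch size also consume memory.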
There are no public endpoints for this model yet.
Rent your own physically dedicated instance with hourly or long-term monthly billing.
We recommend deploying private instances in the following scenarios:

| Context (tokens) | Parallelism | GPUs | Price | TPS | Max Concurrency |
|---|---|---|---|---|---|
| 32,768 | | 1 | $0.33 | | 35.436 |
| 32,768 | | 1 | $0.38 | | 20.806 |
| 32,768 | | 1 | $0.38 | 52.190 | 35.436 |
| 32,768 | | 1 | $0.53 | 70.130 | 58.843 |
| 32,768 | | 1 | $0.57 | | 17.881 |
| 32,768 | | 1 | $0.83 | 73.510 | 58.843 |
| 32,768 | | 1 | $1.02 | 95.840 | 58.843 |
| 32,768 | | 1 | $1.20 | | 82.251 |
| 32,768 | tensor | 2 | $1.23 | | 120.938 |
| 32,768 | | 1 | $1.59 | 94.310 | 82.251 |
| 32,768 | | 1 | $2.37 | 99.990 | 222.695 |
| 32,768 | | 1 | $3.83 | 103.880 | 222.695 |
| 32,768 | | 1 | $4.11 | 205.740 | 263.657 |
| 32,768 | | 1 | $4.74 | | 401.175 |
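
One way to compare the configurations above is throughput per unit of price. The sketch below is a rough illustration only: it assumes that, in rows with three numbers, the dollar amount is the listed price and the next value is TPS, and it skips rows that list no TPS. The function name `tps_per_dollar` is hypothetical, and the billing period behind each price is not stated on this page.

```python
# (price, tps) pairs copied from the rows above that list a TPS value.
# Assumption: the first number after the GPU count is the price, the second is TPS.
rows = [
    (0.38, 52.190),
    (0.53, 70.130),
    (0.83, 73.510),
    (1.02, 95.840),
    (1.59, 94.310),
    (2.37, 99.990),
    (3.83, 103.880),
    (4.11, 205.740),
]


def tps_per_dollar(price: float, tps: float) -> float:
    """Throughput divided by listed price (hypothetical helper)."""
    return tps / price


# Sort from most to least throughput per dollar of listed price.
ranked = sorted(rows, key=lambda r: tps_per_dollar(*r), reverse=True)

for price, tps in ranked:
    print(f"${price:.2f}: {tps:.3f} TPS, {tps_per_dollar(price, tps):.1f} TPS per dollar")
```

This ratio ignores maximum concurrency, so a configuration that ranks low here may still be the better choice when many requests must be served in parallel.
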

| Context (tokens) | Parallelism | GPUs | Price | TPS | Max Concurrency |
|---|---|---|---|---|---|
| 32,768 | | 1 | $0.33 | | 35.659 |
| 32,768 | | 1 | $0.38 | | 21.030 |
| 32,768 | | 1 | $0.38 | | 35.659 |
| 32,768 | | 1 | $0.53 | | 59.067 |
| 32,768 | | 1 | $0.57 | | 18.104 |
| 32,768 | | 1 | $0.83 | | 59.067 |
| 32,768 | | 1 | $1.02 | | 59.067 |
| 32,768 | | 1 | $1.20 | | 82.474 |
| 32,768 | tensor | 2 | $1.23 | | 121.161 |
| 32,768 | | 1 | $1.59 | | 82.474 |
| 32,768 | | 1 | $2.37 | | 222.918 |
| 32,768 | | 1 | $3.83 | | 222.918 |
| 32,768 | | 1 | $4.11 | | 263.881 |
| 32,768 | | 1 | $4.74 | | 401.399 |

| Context (tokens) | Parallelism | GPUs | Price | TPS | Max Concurrency |
|---|---|---|---|---|---|
| 32,768 | | 1 | $0.33 | | 35.436 |
| 32,768 | | 1 | $0.38 | | 20.806 |
| 32,768 | | 1 | $0.38 | 51.260 | 35.436 |
| 32,768 | | 1 | $0.53 | 62.650 | 58.843 |
| 32,768 | | 1 | $0.57 | | 17.881 |
| 32,768 | | 1 | $0.83 | 78.350 | 58.843 |
| 32,768 | | 1 | $1.02 | 87.180 | 58.843 |
| 32,768 | | 1 | $1.20 | | 82.251 |
| 32,768 | tensor | 2 | $1.23 | | 120.938 |
| 32,768 | | 1 | $1.59 | 83.910 | 82.251 |
| 32,768 | | 1 | $2.37 | 110.850 | 222.695 |
| 32,768 | | 1 | $3.83 | 116.270 | 222.695 |
| 32,768 | | 1 | $4.11 | 260.990 | 263.657 |
| 32,768 | | 1 | $4.74 | | 401.175 |

Contact our dedicated neural networks support team at nn@immers.cloud or send your request to the sales department at sale@immers.cloud.