Gemma 3 1B is a compact language model developed by Google DeepMind that balances small size against capability. With just 1 billion parameters and a 32K-token (32,768) context window, it is efficient enough to run on devices with limited resources. Its architecture keeps memory usage low, which makes it well suited to embedded systems and mobile applications.
It's important to note that the 1B version is a text-only model and does not support image processing, unlike the larger variants in the Gemma 3 series (4B, 12B, and 27B), which offer multimodal capabilities.
Additionally, the model has limited language support and is primarily optimized for tasks in English.
Gemma 3 1B is available with open weights, making it easy to fine-tune and adapt to specific use cases. It comes in multiple quantization levels, ranging from 32-bit down to 4-bit, providing added flexibility when deploying on different hardware platforms.
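To give a feel for what the quantization levels mean in practice, the sketch below estimates the size of the raw weights at several bit widths. It is a rough back-of-the-envelope calculation, not a measurement: it assumes a nominal count of exactly 1 billion parameters, counts weights only (no KV cache, activations, or runtime overhead), and the intermediate 16-bit and 8-bit levels are assumed for illustration. The helper name `weight_memory_gb` is hypothetical.

```python
def weight_memory_gb(num_params: int, bits_per_weight: int) -> float:
    """Approximate size of the raw weights in gigabytes (1 GB = 10**9 bytes)."""
    return num_params * bits_per_weight / 8 / 1e9


# Assumption: the nominal "1B" figure, not the exact parameter count.
NOMINAL_PARAMS = 1_000_000_000

# Assumption: 16-bit and 8-bit are illustrative intermediate levels.
for bits in (32, 16, 8, 4):
    size = weight_memory_gb(NOMINAL_PARAMS, bits)
    print(f"{bits:>2}-bit: ~{size:.2f} GB")
```

Under these assumptions the weights shrink from roughly 4 GB at 32-bit to roughly 0.5 GB at 4-bit, which is why the lower-precision variants fit on much smaller hardware.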
There are no public endpoints for this model yet.
Rent your own physically dedicated instance with hourly or long-term monthly billing.
We recommend deploying private instances in the following scenarios:

| Context | Parallelism | GPU | Price | TPS | Max Concurrency |
|---|---|---|---|---|---|
| 32,768 | | 1 | $0.33 | | 29.792 |
| 32,768 | | 1 | $0.38 | | 17.828 |
| 32,768 | | 1 | $0.38 | 52.190 | 29.911 |
| 32,768 | | 1 | $0.53 | 70.130 | 53.540 |
| 32,768 | | 1 | $0.57 | | 14.987 |
| 32,768 | | 1 | $0.83 | | 56.710 |
| 32,768 | | 1 | $1.02 | | 56.591 |
| 32,768 | tensor | 2 | $1.23 | | 55.054 |
| 32,768 | | 1 | $1.59 | | 80.010 |
| 32,768 | | 1 | $2.37 | 99.990 | 223.275 |
| 32,768 | | 1 | $3.83 | 103.880 | 223.066 |
| 32,768 | | 1 | $4.11 | 205.740 | 264.670 |
| 32,768 | tensor | 2 | $4.61 | | 224.789 |
| 32,768 | | 1 | $4.74 | | 404.376 |
| 32,768 | tensor | 2 | $9.40 | | 405.890 |


| Context | Parallelism | GPU | Price | TPS | Max Concurrency |
|---|---|---|---|---|---|
| 32,768 | | 1 | $0.33 | | 29.792 |
| 32,768 | | 1 | $0.38 | | 17.828 |
| 32,768 | | 1 | $0.38 | | 29.912 |
| 32,768 | | 1 | $0.53 | | 53.540 |
| 32,768 | | 1 | $0.57 | | 14.987 |
| 32,768 | | 1 | $0.83 | | 56.710 |
| 32,768 | | 1 | $1.02 | | 56.591 |
| 32,768 | tensor | 2 | $1.23 | | 55.054 |
| 32,768 | | 1 | $1.59 | | 80.010 |
| 32,768 | | 1 | $2.37 | | 223.275 |
| 32,768 | | 1 | $3.83 | | 223.066 |
| 32,768 | | 1 | $4.11 | | 264.670 |
| 32,768 | tensor | 2 | $4.61 | | 224.789 |
| 32,768 | | 1 | $4.74 | | 404.376 |
| 32,768 | tensor | 2 | $9.40 | | 405.890 |


| Context | Parallelism | GPU | Price | TPS | Max Concurrency |
|---|---|---|---|---|---|
| 32,768 | | 1 | $0.33 | | 26.765 |
| 32,768 | | 1 | $0.38 | | 14.801 |
| 32,768 | | 1 | $0.38 | 51.260 | 26.884 |
| 32,768 | | 1 | $0.53 | 62.650 | 50.513 |
| 32,768 | | 1 | $0.57 | | 11.960 |
| 32,768 | | 1 | $0.83 | | 53.683 |
| 32,768 | | 1 | $1.02 | | 53.564 |
| 32,768 | tensor | 2 | $1.23 | | 53.540 |
| 32,768 | | 1 | $1.59 | | 76.983 |
| 32,768 | | 1 | $2.37 | 110.850 | 220.248 |
| 32,768 | | 1 | $3.83 | 116.270 | 220.039 |
| 32,768 | | 1 | $4.11 | 260.990 | 261.643 |
| 32,768 | tensor | 2 | $4.61 | | 223.276 |
| 32,768 | | 1 | $4.74 | | 401.349 |
| 32,768 | tensor | 2 | $9.40 | | 404.377 |

Contact our dedicated neural networks support team at nn@immers.cloud or send your request to the sales department at sale@immers.cloud.