Gemma‑4‑E2B‑it is the most compact and energy‑efficient model in the lineup, designed to run under very tight resource constraints. Like the E4B version, it uses the Per‑Layer Embeddings (PLE) technique, which keeps memory consumption low while preserving performance. The model has 5.1 billion parameters in total, of which only the effective 2.3 billion are active during inference. It has 35 layers, supports a context window of 128 thousand tokens, and uses hybrid attention with a sliding window of 512 tokens.
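To make the sliding-window part of that description concrete, here is a minimal sketch of a causal sliding-window attention mask in plain Python. It illustrates the general technique only, not the model's actual implementation. The function names are hypothetical, and the convention that the window includes the current token is an assumption.

```python
def sliding_window_mask(seq_len: int, window: int) -> list[list[bool]]:
    """Build a causal sliding-window mask.

    mask[q][k] is True when the query at position q may attend to the key
    at position k, i.e. k is not in the future and lies within the last
    `window` positions (the current token included, by assumption).
    """
    return [
        [0 <= q - k < window for k in range(seq_len)]
        for q in range(seq_len)
    ]


def attended_keys(position: int, window: int) -> int:
    """Number of keys visible to a query at a 0-based position."""
    return min(position + 1, window)


if __name__ == "__main__":
    # Small toy example so the band structure is visible.
    for row in sliding_window_mask(seq_len=6, window=3):
        print("".join("x" if allowed else "." for allowed in row))

    # With the 512-token window quoted above, a local layer never looks
    # at more than 512 keys, however long the context is.
    print(attended_keys(position=100_000, window=512))
```

In a local layer the per-token attention cost is bounded by the window size rather than by the full context length, which is the usual motivation for mixing such layers with full-attention ones.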
E2B is fully multimodal: in addition to text and images it can process audio, using an audio encoder of roughly 300M parameters. This feature set, combined with very low memory requirements, makes the model unique in its class. The developers emphasise that E2B is specifically designed for efficient local use on laptops and mobile devices. According to community estimates, the model can run on devices with less than 1.5 GB of RAM, including smartphones.
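One way to sanity-check the community memory figure is a back-of-the-envelope estimate of weight storage at different precisions. The sketch below assumes that weights dominate the footprint and ignores the KV cache, activations, and runtime overhead. The bit widths are illustrative assumptions, not values stated by the developers.

```python
def weight_memory_gb(num_params: float, bits_per_weight: float) -> float:
    """Approximate weight storage in decimal gigabytes (1 GB = 1e9 bytes)."""
    return num_params * bits_per_weight / 8 / 1e9


EFFECTIVE_PARAMS = 2.3e9  # effective parameter count quoted in the text
TOTAL_PARAMS = 5.1e9      # total parameter count quoted in the text

if __name__ == "__main__":
    # Bit widths are assumptions chosen for illustration only.
    for bits in (16, 8, 4):
        effective = weight_memory_gb(EFFECTIVE_PARAMS, bits)
        total = weight_memory_gb(TOTAL_PARAMS, bits)
        print(f"{bits:>2}-bit: effective {effective:.2f} GB, total {total:.2f} GB")
```

Under these assumptions, only the effective parameters at about 4 bits per weight come in below 1.5 GB, so the community estimate presumably refers to a heavily quantised build.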
Despite its modest size, E2B delivers impressive results. Independent community evaluations report that it surpasses Gemma‑3 27B on some tasks, even though its effective size is roughly 12 times smaller (2.3 billion versus 27 billion parameters). The developers particularly recommend E2B for routine agentic workflows, optical character recognition (OCR) tasks, and scenarios where low latency and on‑device inference are critical. In addition, the Apache 2.0 licence allows the model to be integrated into a wide variety of commercial applications.
For the developers’ usage recommendations, see the model card: https://ai.google.dev/gemma/docs/core/model_card_4?hl=en
There are no public endpoints for this model yet.
Rent your own physically dedicated instance with hourly or long-term monthly billing.
We recommend deploying private instances in the following scenarios:

| Context | Type | GPU | Price | Max Concurrency |
|---|---|---|---|---|
| 131,072 | | 1 | $0.33 | 4.605 |
| 131,072 | | 1 | $0.38 | 4.605 |
| 131,072 | | 1 | $0.53 | 11.107 |
| 131,072 | tensor | 2 | $0.69 | 7.224 |
| 131,072 | | 1 | $0.83 | 11.107 |
| 131,072 | tensor | 2 | $0.97 | 5.599 |
| 131,072 | | 1 | $1.02 | 11.107 |
| 131,072 | | 1 | $1.20 | 17.609 |
| 131,072 | tensor | 2 | $1.23 | 28.355 |
| 131,072 | | 1 | $1.59 | 17.609 |
| 131,072 | | 1 | $2.37 | 56.619 |
| 131,072 | | 1 | $3.83 | 56.619 |
| 131,072 | | 1 | $4.11 | 67.997 |
| 131,072 | | 1 | $4.74 | 106.195 |

| Context | Type | GPU | Price | Max Concurrency |
|---|---|---|---|---|
| 131,072 | | 1 | $0.33 | 3.044 |
| 131,072 | | 1 | $0.38 | 3.044 |
| 131,072 | | 1 | $0.53 | 9.546 |
| 131,072 | tensor | 2 | $0.69 | 5.663 |
| 131,072 | | 1 | $0.83 | 9.546 |
| 131,072 | tensor | 2 | $0.97 | 4.037 |
| 131,072 | | 1 | $1.02 | 9.546 |
| 131,072 | | 1 | $1.20 | 16.047 |
| 131,072 | tensor | 2 | $1.23 | 26.793 |
| 131,072 | | 1 | $1.59 | 16.047 |
| 131,072 | | 1 | $2.37 | 55.058 |
| 131,072 | | 1 | $3.83 | 55.058 |
| 131,072 | | 1 | $4.11 | 66.436 |
| 131,072 | | 1 | $4.74 | 104.634 |

| Context | Type | GPU | Price | Max Concurrency |
|---|---|---|---|---|
| 131,072 | | 1 | $0.33 | 2.128 |
| 131,072 | | 1 | $0.38 | 2.128 |
| 131,072 | | 1 | $0.53 | 8.630 |
| 131,072 | tensor | 2 | $0.69 | 4.747 |
| 131,072 | | 1 | $0.83 | 8.630 |
| 131,072 | tensor | 2 | $0.97 | 3.122 |
| 131,072 | | 1 | $1.02 | 8.630 |
| 131,072 | | 1 | $1.20 | 15.132 |
| 131,072 | tensor | 2 | $1.23 | 25.878 |
| 131,072 | | 1 | $1.59 | 15.132 |
| 131,072 | | 1 | $2.37 | 54.142 |
| 131,072 | | 1 | $3.83 | 54.142 |
| 131,072 | | 1 | $4.11 | 65.521 |
| 131,072 | | 1 | $4.74 | 103.718 |

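
When comparing configurations, it can help to look at price relative to concurrency rather than price alone. The sketch below uses a few rows copied from the first table above. It assumes that the dollar figure is an hourly price and that the last numeric value is the maximum concurrency at the full 131,072-token context; neither is confirmed by the page. The `Offer` class and helper names are hypothetical.

```python
from dataclasses import dataclass
from typing import Optional, Sequence


@dataclass(frozen=True)
class Offer:
    gpus: int
    price_usd: float        # assumed to be the hourly price
    max_concurrency: float  # assumed meaning of the last numeric column


# A few rows copied from the first table above.
OFFERS = [
    Offer(gpus=1, price_usd=0.33, max_concurrency=4.605),
    Offer(gpus=1, price_usd=0.53, max_concurrency=11.107),
    Offer(gpus=2, price_usd=1.23, max_concurrency=28.355),
    Offer(gpus=1, price_usd=4.74, max_concurrency=106.195),
]


def price_per_slot(offer: Offer) -> float:
    """Price divided by the number of concurrent full-context requests."""
    return offer.price_usd / offer.max_concurrency


def cheapest_for(offers: Sequence[Offer], required: float) -> Optional[Offer]:
    """Cheapest offer whose max concurrency covers the requirement, if any."""
    candidates = [o for o in offers if o.max_concurrency >= required]
    return min(candidates, key=lambda o: o.price_usd) if candidates else None


if __name__ == "__main__":
    for offer in OFFERS:
        print(f"${offer.price_usd:.2f}: {price_per_slot(offer):.4f} USD per slot")
    print(cheapest_for(OFFERS, required=10))
```

The same helper can be pointed at the rows of the other two tables to see how the ranking changes between them.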
Contact our dedicated neural networks support team at nn@immers.cloud or send your request to the sales department at sale@immers.cloud.