
gemma-4-12B-it

reasoning

multimodal

Gemma 4 12B is a dense multimodal model with a unified, encoder-free architecture. It sits between the compact E4B, designed for mobile devices, and the more powerful 26B A4B MoE, filling a mid-range niche optimized for consumer laptops with 16 GB of video memory.
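To see why the 16 GB target matters, here is a rough sketch of weight memory at common precisions. It assumes exactly 12e9 parameters (the rounded "12B" figure) and counts weights only, ignoring the KV cache, activations, and runtime overhead, so real requirements are higher.

```python
# Rough weight-memory estimate for a dense model at different precisions.
# ASSUMPTION: 12e9 parameters (the rounded "12B" figure); the KV cache,
# activations and runtime overhead are NOT included.

def weight_memory_gb(num_params: float, bits_per_param: int) -> float:
    """Return the memory needed to store the weights, in decimal gigabytes."""
    bytes_total = num_params * bits_per_param / 8
    return bytes_total / 1e9


if __name__ == "__main__":
    params = 12e9
    for label, bits in [("BF16", 16), ("8-bit", 8), ("4-bit", 4)]:
        gb = weight_memory_gb(params, bits)
        fits = "fits" if gb < 16 else "does not fit"
        print(f"{label:>6}: {gb:5.1f} GB of weights ({fits} in 16 GB before KV cache)")
```

By this estimate, BF16 weights alone (about 24 GB) would not fit in 16 GB, which suggests that laptop deployments rely on 8-bit or 4-bit quantized weights.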

What sets Gemma 4 12B apart from the rest of the family is its fully encoder-free architecture: instead of separate vision and audio encoders, the model uses linear projections to feed raw image patches and audio waveforms directly into a single decoder. It is the first mid-size model in the Gemma family with native audio input, which makes it a distinctive option for local multimodal AI. All modalities flow through a single decoder-only transformer, which reduces latency and allows the entire model to be fine-tuned in a single pass; there is no need to align separate frozen encoders.
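The encoder-free idea can be illustrated with a toy sketch: an image is cut into fixed-size patches, each flattened patch goes through one linear projection into the decoder's embedding size, and the resulting vectors are placed in the same sequence as the text embeddings. The patch size, hidden size, random weights, and helper names (`patchify`, `linear`, `embed_image`) are hypothetical; the real model's projection and token layout are not described on this page.

```python
# Toy sketch of "encoder-free" image input: raw pixel patches are flattened
# and mapped into the decoder's embedding space by ONE linear projection,
# then concatenated with text-token embeddings.
# All sizes and weights are hypothetical toy values, not Gemma's.
import random


def patchify(image, patch):
    """Split an H x W grayscale image (list of rows) into flattened patches.

    H and W are assumed to be multiples of `patch`.
    """
    h, w = len(image), len(image[0])
    patches = []
    for top in range(0, h, patch):
        for left in range(0, w, patch):
            patches.append([image[r][c]
                            for r in range(top, top + patch)
                            for c in range(left, left + patch)])
    return patches


def linear(x, weight, bias):
    """y = W x + b, with `weight` stored as d_out rows of length d_in."""
    return [sum(wi * xi for wi, xi in zip(row, x)) + b
            for row, b in zip(weight, bias)]


def embed_image(image, patch, weight, bias):
    """Project every flattened patch straight into the decoder's hidden size."""
    return [linear(p, weight, bias) for p in patchify(image, patch)]


if __name__ == "__main__":
    random.seed(0)
    d_model, patch = 8, 2  # toy hidden size and patch size
    image = [[random.random() for _ in range(4)] for _ in range(4)]  # 4x4 "image"
    weight = [[random.gauss(0, 0.02) for _ in range(patch * patch)]
              for _ in range(d_model)]
    bias = [0.0] * d_model
    text_embeddings = [[0.0] * d_model for _ in range(3)]  # 3 placeholder text tokens
    image_embeddings = embed_image(image, patch, weight, bias)
    sequence = image_embeddings + text_embeddings  # one stream for the single decoder
    print(len(image_embeddings), len(sequence), len(sequence[0]))
```

Because the projection is an ordinary trainable layer inside the same network, it is updated together with the decoder during fine-tuning, with no separate encoder to freeze or align.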

As in other models of the family, the Gemma 4 12B decoder uses a hybrid attention mechanism that alternates local sliding-window attention layers (1024-token window) with full global attention layers. These layers use so-called heteromorphic heads, whose sizes vary within a single model. The local layers are fast and memory-efficient because each token attends only to its neighbors within the window, while the global layers span the entire context and capture long-range dependencies.
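A minimal sketch of the masks behind this hybrid scheme: both layer types are causal, but local layers only see the last 1024 positions (counting the current token). The evenly interleaved layout in `layer_schedule` (one global layer after every five local layers, which yields 8 global layers out of 48) is an assumption; the page does not say where the global layers sit.

```python
# Sketch of the two causal mask types in a hybrid-attention decoder.
# WINDOW = 1024 comes from the model description; the evenly interleaved
# layer schedule below is an ASSUMPTION for illustration only.

WINDOW = 1024


def can_attend(query_pos: int, key_pos: int, is_global: bool,
               window: int = WINDOW) -> bool:
    """True if the token at query_pos may attend to the token at key_pos."""
    if key_pos > query_pos:      # causal: never look at future tokens
        return False
    if is_global:                # global layer: the whole prefix is visible
        return True
    # local layer: only the last `window` positions, including the token itself
    return query_pos - key_pos < window


def layer_schedule(num_layers: int, num_global: int) -> list:
    """Hypothetical evenly spaced schedule; True marks a global layer."""
    period = num_layers // num_global
    return [(i + 1) % period == 0 for i in range(num_layers)]


if __name__ == "__main__":
    schedule = layer_schedule(48, 8)
    print(sum(schedule), [i for i, is_global in enumerate(schedule) if is_global])
    print(can_attend(5000, 0, is_global=False), can_attend(5000, 0, is_global=True))
```

In a local layer, a token at position 5000 cannot see position 0, while a global layer can; this is what lets the local layers keep only a window-sized slice of keys and values.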

The model accepts text, image, video, and audio input. A built-in thinking mode lets the model reason step by step before answering, which is important for complex tasks. The model also supports function calling for agentic scenarios, variable image resolution, and multilingual use (140+ languages in pre-training, 35+ languages supported out of the box). Multi-Token Prediction (MTP) is supported to accelerate inference, substantially reducing generation latency without loss of quality. The vocabulary contains 262K tokens, and the context window is 256K tokens.
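MTP speedups typically come from a draft-and-verify loop: extra prediction heads propose several tokens, and the main model checks them all in one forward pass. The sketch below shows a generic greedy verification step, not Gemma's actual implementation; token IDs are plain integers. Because every emitted token equals the main model's own greedy choice, the output is identical to ordinary greedy decoding, which is why this kind of acceleration need not cost quality.

```python
# Generic greedy verification step for draft-and-verify decoding
# (e.g. MTP-style speculative decoding). Illustrative sketch only.

def accept_draft(draft: list, target: list) -> list:
    """Return the tokens emitted in one decoding step.

    draft:  k tokens proposed cheaply (e.g. by extra prediction heads).
    target: k + 1 greedy predictions from the main model, computed in ONE
            forward pass over the prompt plus the draft; target[i] is the
            main model's choice at the position of draft[i].
    """
    accepted = []
    for proposed, verified in zip(draft, target):
        if proposed != verified:
            accepted.append(verified)  # first mismatch: keep the model's token, stop
            return accepted
        accepted.append(proposed)
    accepted.append(target[len(draft)])  # every draft matched: one bonus token
    return accepted


if __name__ == "__main__":
    print(accept_draft([1, 2, 3], [1, 2, 3, 4]))
    print(accept_draft([1, 2, 3], [1, 9, 3, 4]))
```

For example, a draft of `[1, 2, 3]` checked against target predictions `[1, 9, 3, 4]` emits `[1, 9]`: one accepted draft token plus the corrected one. When all drafts match, a single forward pass yields k + 1 tokens.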

On key benchmarks, Gemma 4 12B comes close to the substantially larger 26B A4B MoE. On AIME 2026 (advanced mathematical reasoning) it scores 77.5%, nearly four times (about 3.7x) the 20.8% of Gemma 3 27B. On GPQA Diamond (PhD-level expert questions in physics, chemistry, and biology) it reaches 78.8%, a strong result for a 12B model that surpasses many larger models. Programming: LiveCodeBench v6 (real-world code generation) 72.0% and a Codeforces Elo of 1659. Multimodal tests: MMMU Pro (multidisciplinary image understanding) 69.1% and MATH-Vision (mathematics on images) 79.7%. Multilingual knowledge: MMMLU 83.4%. On CoVoST (audio translation) the model achieves the best result among all Gemma models, 38.5%.

The model's use cases follow from three key strengths: compactness, multimodality with native audio, and agentic capabilities. Gemma 4 12B is well suited to local agentic systems, from autonomous coding assistants to multimodal assistants with voice input. It is also effective for speech recognition and translation, video clip analysis, intelligent document processing, and embedded AI solutions on desktops. For more details on use cases, see the developer guide: https://developers.googleblog.com/gemma-4-12b-the-developer-guide/

Announce Date: 03.06.2026
Parameters: 12B
Context: 262,144 tokens (256K)
Layers: 48, of which 8 use full attention
Attention Type: Sliding Window Attention (hybrid, interleaved with full-attention layers)
Developer: Google DeepMind
Transformers Version: 5.10.0.dev0
License: Apache 2.0
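The layer counts above, together with the 262,144-token context listed in the price tables, allow a rough estimate of how much the sliding-window layers shrink the KV cache compared with a model in which all 48 layers are global. The sketch assumes every layer stores the same amount of KV data per token, which heteromorphic heads make only approximately true, and ignores any other cache optimizations.

```python
# Back-of-the-envelope KV-cache comparison: 48 layers, 8 of them global,
# a 1024-token sliding window, and a 262,144-token context.
# ASSUMPTION: every layer stores the same amount of KV data per token.

def cached_tokens(num_layers: int, num_global: int, window: int, context: int) -> int:
    """Total token entries held in the KV cache across all layers."""
    num_local = num_layers - num_global
    return num_local * min(window, context) + num_global * context


if __name__ == "__main__":
    hybrid = cached_tokens(48, 8, 1024, 262_144)
    all_global = cached_tokens(48, 48, 1024, 262_144)
    print(f"hybrid / all-global KV cache: {hybrid / all_global:.1%}")
```

Under these assumptions the hybrid layout keeps roughly 17% of the token entries an all-global model would need at full context, and about 98% of what remains belongs to the 8 global layers.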

Public endpoint

Use our pre-built public endpoints for free to test inference and explore gemma-4-12B-it capabilities. You can obtain an API access token on the token management page after registration and verification.

There are no public endpoints for this model yet.

Private server

Rent your own physically dedicated instance with hourly or long-term monthly billing.

We recommend deploying private instances in the following scenarios:

maximize endpoint performance,
enable full context for long sequences,
ensure top-tier security for data processing in an isolated, dedicated environment,
use custom weights, such as fine-tuned models or LoRA adapters.

Recommended server configurations for hosting gemma-4-12B-it

Prices:

Name	Context, tokens	Parallelism	GPUs	Price, hour	Price, month	Max Concurrency
teslaa10-1.16.32.160	262,144	-	1	$0.53	$378.38	1.490
teslaa2-2.16.32.160	262,144	tensor	2	$0.57	$413.85	1.911
rtx3090-1.16.24.160	262,144	-	1	$0.83	$594.33	1.633
rtx4090-1.16.32.160	262,144	-	1	$1.02	$734.67	1.628
rtxa5000-2.16.64.160.nvlink	262,144	tensor	2	$1.23	$884.85	4.044
rtx3080-3.16.64.160	262,144	pipeline	3	$1.43	$1,026.72	1.376
rtx5090-1.16.64.160	262,144	-	1	$1.59	$1,142.79	2.685
rtx3080-4.16.64.160	262,144	tensor	4	$1.82	$1,310.46	2.190
teslaa100-1.16.64.160	262,144	-	1	$2.37	$1,707.06	9.154
h100-1.16.64.160	262,144	-	1	$3.83	$2,754.98	9.145
h100nvl-1.16.96.160	262,144	-	1	$4.11	$2,961.66	11.024
teslaa100-2.24.96.160.nvlink	262,144	tensor	2	$4.61	$3,319.56	19.373
h200-1.16.128.160	262,144	-	1	$4.74	$3,410.09	17.332
h200-2.24.256.160.nvlink	262,144	tensor	2	$9.40	$6,770.92	35.728

Prices:

Name	Context, tokens	Parallelism	GPUs	Price, hour	Price, month	Max Concurrency
teslaa2-2.16.32.160	262,144	tensor	2	$0.57	$413.85	1.194
teslaa10-2.16.64.160	262,144	tensor	2	$0.93	$672.04	3.328
rtxa5000-2.16.64.160.nvlink	262,144	tensor	2	$1.23	$884.85	3.328
rtx3090-2.16.64.160	262,144	tensor	2	$1.56	$1,126.67	3.614
rtx5090-1.16.64.160	262,144	-	1	$1.59	$1,142.79	1.969
rtx3080-4.16.64.160	262,144	tensor	4	$1.82	$1,310.46	1.473
rtx4090-2.16.64.160	262,144	tensor	2	$1.92	$1,384.62	3.604
teslaa100-1.16.64.160	262,144	-	1	$2.37	$1,707.06	8.438
h100-1.16.64.160	262,144	-	1	$3.83	$2,754.98	8.429
h100nvl-1.16.96.160	262,144	-	1	$4.11	$2,961.66	10.307
teslaa100-2.24.96.160.nvlink	262,144	tensor	2	$4.61	$3,319.56	18.657
h200-1.16.128.160	262,144	-	1	$4.74	$3,410.09	16.615
h200-2.24.256.160.nvlink	262,144	tensor	2	$9.40	$6,770.92	35.011

Prices:

Name	Context, tokens	Parallelism	GPUs	Price, hour	Price, month	Max Concurrency
teslaa10-2.16.64.160	262,144	tensor	2	$0.93	$672.04	1.838
teslaa2-3.32.128.160	262,144	pipeline	3	$1.06	$762.88	1.192
rtxa5000-2.16.64.160.nvlink	262,144	tensor	2	$1.23	$884.85	1.838
teslaa2-4.32.128.160	262,144	tensor	4	$1.26	$904.76	2.679
rtx3090-2.16.64.160	262,144	tensor	2	$1.56	$1,126.67	2.125
rtx4090-2.16.64.160	262,144	tensor	2	$1.92	$1,384.62	2.114
teslaa100-1.16.64.160	262,144	-	1	$2.37	$1,707.06	6.948
rtx5090-2.16.64.160	262,144	tensor	2	$2.93	$2,110.10	4.229
h100-1.16.64.160	262,144	-	1	$3.83	$2,754.98	6.939
h100nvl-1.16.96.160	262,144	-	1	$4.11	$2,961.66	8.817
teslaa100-2.24.96.160.nvlink	262,144	tensor	2	$4.61	$3,319.56	17.167
h200-1.16.128.160	262,144	-	1	$4.74	$3,410.09	15.126
h200-2.24.256.160.nvlink	262,144	tensor	2	$9.40	$6,770.92	33.522
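When choosing among the configurations above, it can help to normalize the hourly price by Max Concurrency. The sketch below uses a subset of rows from the first price table. It assumes, without confirmation from this page, that Max Concurrency is the number of full-context requests a configuration can serve at once, so price divided by concurrency is a cost per request slot. The helper names are hypothetical.

```python
# Rank server configurations by hourly price per unit of "Max Concurrency".
# Rows are a subset of the first price table. ASSUMPTION: Max Concurrency is
# the number of full-context requests the configuration can serve at once.

ROWS = [
    # (name, price per hour in USD, max concurrency)
    ("teslaa10-1.16.32.160", 0.53, 1.490),
    ("teslaa2-2.16.32.160", 0.57, 1.911),
    ("rtxa5000-2.16.64.160.nvlink", 1.23, 4.044),
    ("teslaa100-1.16.64.160", 2.37, 9.154),
    ("teslaa100-2.24.96.160.nvlink", 4.61, 19.373),
    ("h200-1.16.128.160", 4.74, 17.332),
]


def cost_per_slot(rows):
    """Return (name, USD per hour per concurrent slot), cheapest first."""
    return sorted(
        ((name, price / concurrency) for name, price, concurrency in rows),
        key=lambda item: item[1],
    )


def cheapest_meeting(rows, min_concurrency):
    """Name of the cheapest configuration with enough concurrency, or None."""
    candidates = [row for row in rows if row[2] >= min_concurrency]
    if not candidates:
        return None
    return min(candidates, key=lambda row: row[1])[0]


if __name__ == "__main__":
    for name, usd in cost_per_slot(ROWS):
        print(f"{name:<30} ${usd:.3f}/hour per slot")
    print("cheapest with >= 2 slots:", cheapest_meeting(ROWS, 2))
```

With these rows, `teslaa100-2.24.96.160.nvlink` has the lowest price per slot (about $0.24 per hour), while `rtxa5000-2.16.64.160.nvlink` is the cheapest option offering at least two slots.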

Related models

gemma-4-31B-it

gemma-4-26B-A4B-it

gemma-4-E4B-it

gemma-4-E2B-it

Need help?

Contact our dedicated neural networks support team at nn@immers.cloud or send your request to the sales department at sale@immers.cloud.
