Immers Foundation Models

Our catalog features the most popular open-source AI models from developers worldwide, including large language models (LLMs), multimodal models, and diffusion models, all available to try in one place.
To explore and test a model, query it through our public endpoint. For self-hosted production, fine-tuning, or custom weights, we recommend renting a virtual or dedicated GPU server.

Kimi-K3

The world's first open-weight 3T-class model (a 2.8 trillion-parameter MoE with 104 billion parameters activated per token), featuring native multimodality and a context window of up to 1 million tokens. Built on a hybrid Kimi Delta Attention (KDA) architecture with the Attention Residuals (AttnRes) mechanism and Stable LatentMoE technique, it delivers frontier-level reasoning, programming, and agentic capabilities, with fully open weights for research and deployment.

reasoning

multimodal

coding

27.06.2026

GLM-5.2

An open MoE model from Z.ai with 753B parameters, only 39 billion of which are activated per token, optimized for long‑horizon agentic tasks with a context length of one million tokens. Through the innovative IndexShare technique and strong reasoning and coding capabilities, it reaches the level of the best closed models, setting a new standard in the open‑source class.

reasoning

coding

16.06.2026

Kimi-K2.7-Code

An open agentic model from Moonshot AI based on a MoE architecture (1T parameters, 32B activated) with MLA attention, native INT4 quantization, and multimodality (text, images, video). The model is optimized for long-horizon tasks, reduces thinking token consumption by 30% compared to K2.6, and competes with leading proprietary solutions.

reasoning

multimodal

coding

11.06.2026

NVIDIA-Nemotron-3-Ultra-550B-A55B

A flagship language model with 550 billion total parameters (55 billion active per inference pass), built on a hybrid LatentMoE architecture (Mamba-2 + Mixture of Experts + Attention), supporting up to 1 million tokens of context and a switchable reasoning mode. The model achieves record-breaking inference throughput — up to ~6× higher than comparable open LLMs — while matching the accuracy of the best global counterparts. This makes it the ideal choice for complex agentic tasks, long-context analysis, and high-load enterprise-grade scenarios.

reasoning

coding

03.06.2026

gemma-4-12B-it

A distinctive model in the Gemma 4 lineup, with a unified, encoder-free architecture: visual and audio data are fed directly into a decoder-only transformer through linear projections (plus a lightweight embedder for images), which largely eliminates encoding latency while handling all modalities with high quality. With only 12B parameters, the model is suitable for running on laptops with 16 GB of VRAM and delivers results on reasoning, coding, and multimodal understanding benchmarks comparable to a 26B MoE model.

reasoning

multimodal

03.06.2026
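As a rough check on the 16 GB VRAM figure quoted for gemma-4-12B-it, the weight footprint can be estimated from the parameter count and the bytes stored per parameter. The sketch below is a back-of-the-envelope estimate only: the helper name `weight_memory_gb` and the precision table are illustrative assumptions, and the estimate ignores the KV cache, activations, and runtime overhead.

```python
# Approximate bytes per parameter for common weight formats.
# These are assumed round values; real quantized formats add a small
# overhead for scales and zero points.
BYTES_PER_PARAM = {"fp16/bf16": 2.0, "int8": 1.0, "int4": 0.5}


def weight_memory_gb(params_billions: float, bytes_per_param: float) -> float:
    """Return the approximate weight size in GB (1 GB = 1e9 bytes)."""
    return params_billions * 1e9 * bytes_per_param / 1e9


if __name__ == "__main__":
    # 12 is the parameter count (in billions) quoted in the catalog entry.
    for fmt, size in BYTES_PER_PARAM.items():
        print(f"12B @ {fmt}: ~{weight_memory_gb(12, size):.0f} GB")
```

Under these assumptions, 16-bit weights alone come to about 24 GB, so fitting a 12B model into 16 GB would call for 8-bit or 4-bit weights, with the remaining memory left for the KV cache and activations.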

ideogram-4-nf4-diffusers

Ideogram 4 is Ideogram's first open-weight text-to-image model. It is a state-of-the-art foundation model trained from scratch, not a fine-tune of any existing model.

03.06.2026

MiniMax-M3

A cutting-edge open multimodal model with 428 billion parameters (23B active) and an innovative MiniMax Sparse Attention mechanism that enables efficient processing of context up to 1 million tokens. The model is unique in combining native multimodality, outstanding programming skills, and agentic capabilities, allowing it to compete with leading closed-source solutions.

reasoning

multimodal

coding

02.06.2026

PaddleOCR-VL-1.6

A compact multimodal model (0.9B parameters) for intelligent document and image parsing, featuring highly accurate recognition of text, tables, formulas, charts, and stamps. The model ranks first on OmniDocBench v1.6 with a score of 96.33%, surpassing Gemini 3 Pro, GPT-5.2, Qwen3-VL-235B, and other solutions, while remaining lightweight enough for local deployment.

multimodal

27.05.2026

Lance

A research project developed by ByteDance. It is a unified multimodal model for studying image and video understanding, generation, and editing within a relatively small model and a limited compute budget.

15.05.2026

DeepSeek-V4-Pro

A cutting-edge MoE model with 1.6 trillion total parameters (49 billion active), capable of highly efficient processing of up to 1 million tokens of context thanks to its hybrid CSA + HCA attention architecture. The model leads in mathematics, programming, and agentic tasks, supports three configurable reasoning modes (“non-think”, “think high”, “think max”), and uses nearly 10 times less KV-cache memory than previous DeepSeek flagships.

reasoning

coding

22.04.2026

DeepSeek-V4-Flash

An open MoE model from the DeepSeek V4 family, with 284 billion total parameters and 13 billion active per token, supporting a context of up to 1 million tokens. Its hybrid attention (CSA + HCA) makes it highly efficient on ultra-long sequences. The model delivers results close to the Pro version in reasoning, programming, and agentic tasks, while being far less demanding on infrastructure.

reasoning

coding

22.04.2026

Qwen3.6-27B

Qwen/Qwen3.6-27B is an open dense multimodal 27B-parameter model with a strong focus on agentic programming, large-repository work, and reasoning tasks. It supports text, images, and video, features a native context of 262K tokens and thinking/non-thinking modes, and outperforms not only Qwen3.5-27B but also the larger MoE model Qwen3.5-397B-A17B on a range of key benchmarks.

reasoning

multimodal

coding

21.04.2026

Qwen3.6-35B-A3B

Qwen/Qwen3.6-35B-A3B is an open multimodal Mixture-of-Experts model with 35B parameters, of which only about 3B are activated per token, reducing computational overhead. Its architecture, built on Gated DeltaNet and Gated Attention, delivers high efficiency and memory savings. The model handles text, images, and video, supports thinking and non-thinking modes, offers a 262K-token context window (expandable to 1M), and is especially well-suited for agentic programming, repository-level work, and visual-textual tasks.

reasoning

multimodal

coding

15.04.2026
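Several entries in this catalog quote both total and active parameter counts for MoE models. As a rough rule, per-token compute scales with the active parameters, while the memory needed to hold the weights scales with the total. The sketch below computes the active fraction for a few entries; the figures are taken from the descriptions on this page, and the helper name `active_fraction` is an illustrative assumption.

```python
# (total, active) parameters in billions, as quoted in the catalog entries.
MODELS = {
    "Kimi-K3": (2800, 104),
    "GLM-5.2": (753, 39),
    "DeepSeek-V4-Pro": (1600, 49),
    "DeepSeek-V4-Flash": (284, 13),
    "Qwen3.6-35B-A3B": (35, 3),
}


def active_fraction(total_b: float, active_b: float) -> float:
    """Return the share of parameters activated per token (0..1)."""
    return active_b / total_b


if __name__ == "__main__":
    for name, (total_b, active_b) in MODELS.items():
        share = active_fraction(total_b, active_b)
        print(f"{name}: {active_b}B of {total_b}B active ({share:.1%})")
```

For every model listed here, fewer than one in ten parameters is active for any given token, which is why a large MoE can be cheaper to run per token than its total size suggests, even though all of its weights still have to fit in memory.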

Kimi-K2.6

An open multimodal model from Moonshot AI built with an agent‑centric philosophy. It uses a Mixture‑of‑Experts architecture with 1 trillion total parameters (32 billion active per token), a 256K‑token context window, and native INT4 quantization. The model is optimized for long‑horizon software problem solving, autonomous operation, and “agent swarm” orchestration, and competes with the best closed models in these areas. K2.6 can carry out complex engineering tasks for hours, turn visual mock‑ups into production‑ready web applications, and decompose and coordinate up to 300 parallel sub‑agents within a single session. This makes it one of the strongest open solutions for research tasks and an effective intelligent core for a wide range of high‑tech products.

reasoning

multimodal

coding

14.04.2026

ERNIE-Image-Turbo

An open text-to-image generation model developed by the ERNIE-Image team at Baidu. It is based on the Diffusion Transformer (DiT) architecture and incorporates additional components to enhance text processing and the handling of structured tasks.

10.04.2026

MiniMax-M2.7

MiniMax-M2.7 is the first model to have participated in its own evolution: during the development process, it built its own skills and optimized its own training. The architecture, based on a 230B MoE (10B active parameters) with full attention, ensures consistently high quality in complex agentic and office tasks. On benchmarks, the model demonstrates results on par with the best closed-source solutions. It is ideally suited for developing autonomous agents, working with office documents, and comprehensive automation of complex professional tasks, acting as an "omniscient and empathetic AI colleague."

reasoning

coding

09.04.2026

ERNIE-Image

07.04.2026

GLM-5.1

GLM‑5.1 is a flagship MoE model (744B total / 40B active parameters) featuring DSA sparse attention, built for sustained autonomous operation. At the time of its release, it held the top position on SWE‑Bench Pro and CyberGym, outperforming all existing models (including closed-source ones), and it consistently ranks among the leaders on other significant benchmarks. Crucially, it keeps making progress across hundreds of iterations and thousands of tool calls: where many models lose effectiveness and try to give a quick answer, GLM‑5.1 continues to search for the optimal solution.

reasoning

coding

03.04.2026

GigaChat3.1-702B-A36B

The flagship instruct model of the GigaChat family, built on a Mixture‑of‑Experts (MoE) architecture with 702 billion total and 36 billion active parameters. Combining Multi‑head Latent Attention (MLA), Multi‑Token Prediction (MTP) and native FP8 training delivers record‑breaking performance on long contexts while drastically reducing memory consumption. The model outperforms open‑source peers such as DeepSeek‑V3‑0324 and Qwen3‑235B‑A22B on a number of benchmarks and is released under the MIT license, making it suitable for commercial use.

21.03.2026

GigaChat3.1-10B-A1.8B

GigaChat 3.1 Lightning is a compact Mixture-of-Experts model with 1.8 billion active parameters out of 10 billion total, built on MLA attention and supporting MTP, which combined with native FP8 training delivers excellent speed and quality. The model holds leading positions in its class and is one of the best solutions for fast conversational AI assistants, as well as for running simple yet reliable agent systems with tool calling and other functionalities.

21.03.2026