Products

Cloud servers

Cloud servers with per-second billing. Isolated resources will give maximum performance for your project.

GPU servers

Cloud servers with modern RTX and Tesla graphics accelerators for games, rendering, streaming, working with 3D graphics, and artificial intelligence.

H200

H100 NVL

H100

RTX 5090

RTX 4090

RTX 3090

RTX 3080

A100

RTX A5000

A10

RTX 2080 Ti

A2

Tesla T4

Tesla V100

All GPU servers

CPU servers

The cloud servers with high-performance Intel Xeon Gold 2nd, 3rd and 5th generation CPU are available for 100% of the processor time.
SSD servers NVMe servers
All CPU servers

Immers Foundation Models

Automated catalog of verified open-source models with ready-made configurations for quick deployment. Run neural network models without paying per token.

Select a model

Dedicated servers

Rent a physically dedicated server for a long term with a monthly payment. Configure it using modern components: Intel Xeon Gold 2nd, 3rd and 5th generation processors, up to 10 of the latest RTX and Tesla video accelerators, and up to 8192 GB of RAM per server, SSD and NVMe disks for data centers.

Select a dedicated server

Marketplace

Use popular and modern applications as effective tools for organizing your project. Save time with pre-configured images that already have all the necessary components installed.

Forget about manually downloading and installing the software — just deploy a virtual server with a ready-made image.
Neural networks 3D CUDA Docker / NGC For games Windows images Linux images
All pre-installed images
Features
Prices
FAQ
Contact
Login

Immers Foundation Models

Our catalog features the most popular open-source AI models from developers worldwide, including large language models (LLMs), multimodal, and diffusion models. Try any model in one place — we’ve made it easy for you.
To explore and test a model, you can query it through our public endpoint. For production use, fine-tuning, or custom weights, we recommend renting a virtual or a dedicated GPU server.

NVIDIA-Nemotron-3-Super-120B-A12B

The NVIDIA Nemotron 3 Super 120B (12B active) is a hybrid model based on a sparse Latent Mixture-of-Experts (MoE) and Mamba-2 architecture, optimized for building complex agentic systems and handling contexts of up to 1 million tokens. Thanks to its innovative architecture, which activates only 12 billion parameters per token, and its Multi-Token Prediction (MTP) mechanism, the model delivers high inference efficiency, combining response quality with performance and computational savings when processing long sequences.

reasoning

10.03.2026

LTX-2.3

This is an updated version of the LTX-2 model, developed by Lightricks for synchronized video and audio generation within a single model. It is based on the DiT architecture and integrates key components of modern video generation systems. The model delivers improved audio and visual quality, as well as increased text prompt accuracy.

05.03.2026

gemma-4-E2B-it

The most compact model in the gemma-4 lineup, with an effective size of 2.3B parameters, full support for text, images, and audio. An ideal solution for agentic workflows on local and edge devices.

reasoning

multimodal

02.03.2026

gemma-4-E4B-it

A model with the innovative Per‑Layer Embeddings technique that, with an effective size of just 4.5B parameters, performs better than models two to three times larger. At the same time, the model retains reasoning capabilities and supports full multimodality (text, images, audio) — an ideal choice for complex tasks on local devices.

reasoning

multimodal

02.03.2026

FireRed-Image-Edit-1.1

An upgraded open-source general-purpose image editing foundation model, built upon the capabilities outlined in the FireRed-Image-Edit-1.0 Technical Report. This version significantly enhances identity consistency, multi-image conditioning, and domain-specialized editing performance, aligning closer to real-world creative production needs.

02.03.2026

Qwen3.5-0.8B

An ultra-compact multimodal model with 0.8 billion parameters, featuring a hybrid architecture of Gated DeltaNet and Gated Attention. It boasts a record-breaking context length of 262,144 tokens for its size, supports 201 languages, and offers two operational modes—standard and reasoning (thinking)—making it an ideal solution for prototyping, research, and fine-tuning for specific tasks.

reasoning

multimodal

28.02.2026

Qwen3.5-2B

A miniature 2B parameter model designed for prototyping, research tasks, and experiments. Despite its minimal size (2 billion parameters), it retains the key features of the series — the thinking mode, multimodality, a 262K token context, and a hybrid architecture, making it an excellent sandbox for studying the behavior of modern LLMs..

reasoning

multimodal

28.02.2026

Qwen3.5-4B

Qwen3.5-4B — небольшая модель с 4 миллиардами параметров, оптимизированная для развёртывания на edge-устройствах и мобильных платформах. Гибридная архитектура включает 32 слоя их которых 8 слоев с полным вниманием обеспечивает эффективную обработку последовательностей с минимальными вычислительными затратами. Несмотря на компактный размер, модель сохраняет все технические инновации серии Qwen3.5 в том числе нативную мультимодальность и контекстное окно 262K токенов позволяя обрабатывать длинные документы даже на устройствах с ограниченной памятью

На бенчмарках модель показывает результаты, превосходящие многие модели вдвое большего размера. В языковых тестах, таких как MMLU-Pro (79.1) и GPQA Diamond (76.2), она опережает Qwen3-Next-80B-A3B-Thinking в ряде сценариев. В агентных задачах TAU2-Bench (79.9) она демонстрирует результаты на уровне моделей в 20 раз больше, подтверждая свою эффективность в планировании и использовании инструментов. Мультимодальные способности также сильны: результат Mathvista(mini) (85.1) лишь немногим уступает модели 9B, а в CountBench (96.3) и MMBench (89.4) она входит в число лучших. Это делает ее идеальной для задач распознавания объектов, сцен и документов на устройствах с ограниченной памятью.

Уникальность модели — в переносе качеств «большого» ИИ на периферию. Это идеальное решение для мобильных приложений, дронов, роботов и умных камер, где требуется локальный и быстрый анализ визуальной и текстовой информации без интернета. Она выгодно отличается от других моделей своего класса редким сочетанием глубоких мультимодальных способностей и агентного «мышления» в таком компактном формате.

reasoning

multimodal

27.02.2026

Qwen3.5-9B

A compact model with 9 billion parameters, a 262K token context, and multimodal capabilities designed for efficiently solving a wide range of tasks under limited resources. It is perfectly suited for deployment on consumer hardware while being capable of delivering performance comparable to models 3–4 times its size.

reasoning

multimodal

27.02.2026

Qwen3.5-122B-A10B

A model with 122 billion parameters and a sparse MoE architecture that activates only 10B parameters per token, plus hybrid attention and native multimodality. It is ideal for tasks requiring reasoning, long-document analysis, and enterprise deployment with optimized resource requirements.

reasoning

multimodal

24.02.2026

Qwen3.5-27B

A dense model with 27 billion parameters and 64 layers of hybrid architecture, delivering memory efficiency, maximum predictability, and stable results in tasks requiring multimodal image analysis, programming, and logical reasoning.

reasoning

multimodal

24.02.2026

Qwen3.5-35B-A3B

A versatile model with 35 billion total parameters (activating 3B), it perfectly balances high performance with resource efficiency. It is ideally suited for production environments on accessible user hardware and excels at tasks requiring speed, multimodal support, reasoning, and long-context processing.

reasoning

multimodal

try online

24.02.2026

Helios-Base

A video generation model capable of creating video from text (T2V), images (I2V), and video (V2V), designed for real-time use and long-duration applications. It can generate video sequences lasting several minutes at a frame rate of 19.5 frames per second (FPS) using a single H100 GPU. The uniqueness of the model lies in its avoidance of traditional anti-drift methods (e.g., self-forcing, error-banks) and standard acceleration techniques (KV-cache, causal masking), all while maintaining video quality and synchronization.

23.02.2026

Qwen3.5-397B-A17B

A hybrid model from the Qwen team that combines advanced multimodal capabilities with exceptional efficiency thanks to the Gated DeltaNet and sparse Mixture-of-Experts (MoE) architecture. With a total of 397 billion parameters, the model activates only 17 billion per token, delivering high performance across a wide range of tasks—from complex mathematical reasoning to multimodal understanding and agent development.

reasoning

multimodal

16.02.2026

FireRed-Image-Edit-1.0

A model for image editing tasks that ensures high accuracy, quality, and consistency across various scenarios.

14.02.2026

MiniMax-M2.5

The flagship model of the series, achieving State-of-the-Art (SOTA) performance in coding, agentic tool use, and real-world practical "office" tasks. Thanks to massive-scale Reinforcement Learning (RL) and the innovative Forge framework, the M2.5 not only solves the most complex tasks but does so with exceptional accuracy and speed.

reasoning

12.02.2026

GLM-5

A foundational open-source model designed for solving complex tasks and long-running agent scenarios. With an MoE architecture of 754B parameters (40B active), sparse attention (DSA), innovative slime RL infrastructure, and a focus on practical utility, GLM-5 pushes AI interaction far beyond simple chat, transforming it into a full-fledged executive assistant.

reasoning

11.02.2026

Qwen3-Coder-Next

An efficient MoE model with 80B parameters (3B active), specifically designed for programming-oriented agents. The model features highly efficient inference, an extended context length (262K tokens), and best-in-class handling of various tool call formats, making it a highly suitable choice for deploying intelligent developer assistants.

coding

try online

30.01.2026

MOVA-720p

A foundation model designed for Image-to-Video-Audio (IT2VA) and Text-to-Video-Audio (T2VA) tasks, enabling simultaneous generation of high-fidelity video and synchronized audio. It addresses limitations of cascaded pipelines and proprietary systems by providing a fully open-source solution.

29.01.2026

MOVA-360p

29.01.2026