Products

Cloud servers

Cloud servers with per-second billing. Isolated resources will give maximum performance for your project.

GPU servers

Cloud servers with modern RTX and Tesla graphics accelerators for games, rendering, streaming, working with 3D graphics, and artificial intelligence.

Tesla H200

Tesla H100

RTX 5090

RTX 4090

RTX 3090

RTX 3080

Tesla A100

RTX A5000

Tesla A10

RTX 2080 Ti

Tesla A2

Tesla T4

Tesla V100

All GPU servers

CPU servers

The cloud servers with high-performance Intel Xeon Gold 2nd and 3rd generation CPU are available for 100% of the processor time.
SSD servers NVMe servers
All CPU servers

Dedicated servers

Rent a physically dedicated server for a long term with a monthly payment. Configure it using modern components: Intel Xeon Gold 2nd and 3rd generation processors, up to 10 of the latest RTX and Tesla video accelerators, and up to 8192 GB of RAM per server, SSD and NVMe disks for data centers.

Select a dedicated server

Marketplace

Use popular and modern applications as effective tools for organizing your project. Save time with pre-configured images that already have all the necessary components installed.

Forget about manually downloading and installing the software — just deploy a virtual server with a ready-made image.
Neural networks 3D CUDA Docker / NGC For games Windows images Linux images
All pre-installed images
Features
Prices
FAQ
Contact
Login

Models

Our catalog features the most popular open-source AI models from developers worldwide, including large language models (LLMs), multimodal, and diffusion models. Try any model in one place — we’ve made it easy for you.
To explore and test a model, you can query it through our public endpoint. For production use, fine-tuning, or custom weights, we recommend renting a virtual or a dedicated GPU server.

ERNIE-4.5-VL-28B-A3B-Thinking

A compact multimodal model from Baidu, built on an innovative heterogeneous Mixture-of-Experts (MoE) architecture that separates parameters for textual and visual experts. During inference, only 3 billion parameters are activated out of a total model size of 28 billion parameters. The model is an upgraded version of the base ERNIE-4.5-VL-28B-A3B, specifically optimized for multimodal reasoning tasks through a "Thinking Mode." It supports images, videos, visual grounding, and tool invocation, with a native maximum context length of 131K tokens, and stands out for its moderate computational requirements.

reasoning

multimodal

07.11.2025

Qwen3-VL-2B-Instruct

The most compact model in the Qwen3-VL multimodal family. With 2 billion parameters and a dense architecture, it is optimized for fast conversational systems and deployment on edge devices. At the same time, the model retains and supports all the advanced capabilities of the series: high-quality comprehension of images, videos, and text, support for OCR in 32 languages, object positioning, timestamp localization, and a native context length of 256K tokens.

multimodal

22.10.2025

Qwen3-VL-2B-Thinking

With only 2 billion parameters, a 256K context window, and capability for edge inference, this is one of the smallest visual reasoning models specialized in multi-step reasoning for visual analysis of images and videos. This means it's almost literally capable of "thinking while looking at images." Unlike the Instruct version, this model generates detailed chains of thought before producing the final answer, which enhances accuracy but impacts processing speed.

reasoning

multimodal

22.10.2025

Qwen3-VL-32B-Instruct

A powerful multimodal model with 32 billion parameters and native support for a 256K context window, delivering state-of-the-art quality in multimodal understanding. The model outperforms the previous-generation 72B parameter version on most benchmarks, as well as similarly-sized solutions from other developers, such as GPT-5 and Claude 4.

multimodal

22.10.2025

Qwen3-VL-32B-Thinking

A reasoning version of the flagship 32-billion-parameter Danse model from the Qwen3-VL family, optimized for multi-step thinking and solving highly complex multimodal tasks that require deep analysis and logical inference based on visual information. It supports a native context of 256K (extendable to 1M) and achieves state-of-the-art among multimodal reasoning models of a similar size.

reasoning

multimodal

22.10.2025

Krea Realtime 14B

The Krea Realtime 14B model is a distilled version of the Wan 2.1 14B model (developed by Wan-AI) for text-to-video generation tasks. It was transformed into an autoregressive model using the Self-Forcing method, achieving an inference speed of 11 frames per second with 4 inference steps on a single NVIDIA B200 GPU.

20.10.2025

DeepSeek-OCR

An innovative VLM model for text recognition and document parsing, developed by DeepSeek.ai as part of research into the capabilities of information representation through the visual modality. The model offers a unique approach: instead of traditional text tokens, it uses visual tokens to encode information from documents, achieving text compression of 10-20 times, while reaching an OCR accuracy of 97%.

multimodal

20.10.2025

Qwen3-VL-4B-Thinking

A reasoning-optimized 4B version of the Qwen3-VL model series with a 256K context window (expandable to 1M). The response generation always employs reasoning chains, enabling it to tackle complex multimodal tasks, while incurring a throughput penalty. It demonstrates performance that is only slightly inferior to Qwen3-8B-VL, despite having significantly lower hardware requirements.

reasoning

multimodal

15.10.2025

Qwen3-VL-4B-Instruct

A compact 4-billion-parameter model that retains the full functionality of the Qwen3-VL series: fast response speed, high-quality multimodal understanding with spatial and temporal stamping. At the same time, it significantly reduces hardware requirements – when using half of the natively supported 256K context, the model runs stably on just a single 24GB GPU.

multimodal

15.10.2025

Qwen3-VL-8B-Instruct

A multimodal dense model with 8 billion parameters, optimized for dialogue and instruction following, capable of understanding images, videos, and text. It natively supports a context length of 256K tokens, features enhanced OCR for 32 languages, and possesses visual agent capabilities. The model demonstrates competitive performance against larger models on key benchmarks.

multimodal

15.10.2025

Qwen3-VL-8B-Thinking

A compact dense model with 8 billion parameters and enhanced step-by-step reasoning capabilities, specializing in complex multimodal tasks requiring in-depth analysis and superior visual content understanding. Natively supports a 256K token context. Outperforms renowned models such as Gemini 2.5 Flash Lite and GPT-5 nano high across nearly all key benchmarks.

reasoning

multimodal

15.10.2025

granite-4.0-h-small

The flagship Mixture-of-Experts (MoE) model from IBM's Granite-4.0 family, featuring 32 billion total parameters (with 9 billion active). It combines Mamba-2 and Transformer architectures to deliver performance on par with large-scale models while reducing memory requirements by 70% and doubling inference speed. It is optimized for enterprise-grade tasks such as RAG and agent workflows.

02.10.2025

granite-4.0-h-tiny

A compact, hybrid model with a Mamba2/Transformer architecture combined with a Mixture of Experts, where only 1 billion out of 7 billion parameters are activated. It is designed for fast task execution, including on edge devices and for local deployment. It requires only 8 GB of memory (in 8-bit format) and delivers high performance in function calling with minimal resource consumption.

02.10.2025

granite-4.0-h-micro

The most compact 3B model in the Granite-4.0 family, it combines a hybrid Mamba-2/Transformer architecture with traditional dense feedforward layers instead of a mixture of experts. Optimized for local devices (can even run on a Raspberry Pi), it simultaneously delivers high results in instruction understanding (84.32% on IFEval) and surpasses much larger models in RAG tasks.

02.10.2025

granite-4.0-micro

A conventional Transformer model with 3 billion parameters, designed as an alternative for platforms where support for the Mamba-2 hybrid architecture is not yet optimized. It provides full support for PEFT methods and optimization in llama.cpp, ensuring compatibility with existing infrastructure while maintaining the enhanced output quality characteristic of the Granite 4.0 generation of models.

02.10.2025

GLM-4.6

The updated version of the flagship Z.ai model features an extended context of 200K, enhanced reasoning capabilities, improved code generation, and tool support. It significantly outperforms GLM-4.5 and confidently competes with recognized leaders such as DeepSeek-V3.2-Exp and Claude Sonnet 4, while demonstrating substantial token usage efficiency. Ideal for agent systems, large-scale text analysis, and development automation.

reasoning

30.09.2025

DeepSeek-V3.2-Exp

DeepSeek-V3.2-Exp is an experimental model based on V3.1-Terminus, representing an intermediate step towards a next-generation architecture. The model incorporates DeepSeek Sparse Attention (DSA), a sparse attention mechanism that enhances the efficiency of training and inference in large-context scenarios. This model is a kind of snapshot of the current research being conducted by Deepseek-AI in its search for more efficient transformer architectures. According to test results, V3.2-Exp demonstrates performance comparable to the base version, with slight positive or negative dynamics on a number of benchmarks.

reasoning

29.09.2025

Qwen3-VL-30B-A3B-Instruct

A medium-sized multimodal model in the Qwen3-VL family with a 30B MoE architecture, a 256K native context, and advanced image, video, and OCR processing capabilities in 32 languages. Architectural innovations such as Interleaved-MRoPE, DeepStack and Text-Timestamp Alignment provide excellent quality solutions for multimodal tasks, surpassing a number of proprietary models on key benchmarks. The model is slightly inferior to the flagship of the Qwen3-VL-235B-A22B-Instruct series, but due to its size and configuration, it is significantly more economical in terms of resources on the inference.

multimodal

try online

26.09.2025

Qwen3-VL-30B-A3B-Thinking

A multimodal MoE model from the Qwen3-VL family of medium size, with 30 billion total parameters, 3 billion active parameters, and a context length of 256K tokens. It combines cutting-edge visual content processing capabilities with profound analytical skills. The model delivers highly accurate spatial understanding and temporal grounding, confidently pairing them with the efficiency and attention to detail characteristic of reasoning models.

reasoning

multimodal

26.09.2025

Qwen3-VL-235B-A22B-Thinking

A next-generation flagship multimodal reasoning LLM supporting 256K tokens (expandable up to 1M tokens). Thanks to unique architectural innovations such as Interleaved-MRoPE, DeepStack, and others, the model excels at OCR (supporting 32 languages, including Russian), video analytics, spatially aware image understanding, and is specially optimized for programming and implementing advanced agentic scenarios requiring sequential reasoning.

reasoning

multimodal

23.09.2025