Models

  • Our catalog features the most popular open-source AI models from developers worldwide, including large language models (LLMs), multimodal models, and diffusion models. Try any model in one place — we’ve made it easy for you.
  • To explore and test a model, you can query it through our public endpoint. For production use, fine-tuning, or custom weights, we recommend renting a virtual or a dedicated GPU server.
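As a minimal sketch of querying a model through a public endpoint: the URL, API key, and model name below are placeholders, and this assumes the endpoint exposes the widely used OpenAI-compatible chat-completions protocol — substitute the actual values shown on the model's page.

```python
import json
import urllib.request

# Placeholder values -- replace with the endpoint URL, key, and model
# name listed in the catalog (assumed, not confirmed by this page).
API_URL = "https://example.com/v1/chat/completions"
API_KEY = "YOUR_API_KEY"

def build_chat_request(model: str, prompt: str) -> urllib.request.Request:
    """Build an OpenAI-compatible chat-completions POST request."""
    body = json.dumps({
        "model": model,
        "messages": [{"role": "user", "content": prompt}],
        "max_tokens": 256,
    }).encode("utf-8")
    return urllib.request.Request(
        API_URL,
        data=body,
        headers={
            "Content-Type": "application/json",
            "Authorization": f"Bearer {API_KEY}",
        },
        method="POST",
    )

if __name__ == "__main__":
    req = build_chat_request("Ministral-3-8B-Instruct-2512", "Hello!")
    with urllib.request.urlopen(req) as resp:
        print(json.load(resp)["choices"][0]["message"]["content"])
```

The same request shape works for any chat model in the catalog; only the `model` field changes.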

Ministral-3-8B-Instruct-2512

Ministral-3-8B-Instruct is a balanced multimodal model with 8 billion parameters. It combines high performance with low system requirements, supports a 256K context, delivers reliable agentic capabilities, and covers more than 10 languages.

multimodal
31.10.2025

Ministral-3-14B-Instruct-2512

Ministral-3-14B-Instruct is the most powerful model in the Ministral 3 family, with 14 billion parameters. Trained using Cascade Distillation, it offers multimodal and agentic capabilities, supports a 256K context, and runs stably on hardware with 24 GB of VRAM. Licensed under Apache 2.0.

multimodal
31.10.2025

Ministral-3-14B-Reasoning-2512

The largest reasoning model in the Ministral 3 family (13.5B language LLM + 0.4B vision encoder), delivering advanced reasoning and multimodal understanding capabilities. Demonstrates performance comparable to the larger 24B Mistral Small 3.2, with significantly lower resource requirements.

reasoning
multimodal
31.10.2025

Ministral-3-8B-Reasoning-2512

A balanced model in the Ministral 3 family (8.4B LLM + 0.4B vision encoder), optimized for efficient complex reasoning on edge devices. Provides an optimal performance-to-resource ratio, well-suited for local deployment.

reasoning
multimodal
31.10.2025

Ministral-3-3B-Reasoning-2512

Ministral-3-3B-Reasoning-2512 is the most compact reasoning-capable multimodal model in the Ministral 3 family, optimized for deployment on edge and embedded devices. It supports a 256K token context window and is released under the Apache 2.0 license.

reasoning
multimodal
31.10.2025

LongCat-Video

LongCat-Video is a 13.6B-parameter foundational video generation model developed to excel in Text-to-Video, Image-to-Video, and Video-Continuation tasks. It supports efficient, high-quality generation of long (minutes-long) videos without color drift or quality degradation, marking an initial step toward world models.

24.10.2025

MiniMax-M2

A large language model that combines powerful reasoning capabilities with robust agent skills, designed to solve complex, multi-step tasks in real-world dynamic environments. Thanks to an innovative training approach using high-quality, diverse data and "interleaved thinking," M2 combines strong performance on academic benchmarks with exceptional robustness and adaptability when working with unfamiliar tools and scenarios.

reasoning
22.10.2025

Qwen3-VL-2B-Thinking

With only 2 billion parameters, a 256K context window, and support for edge inference, this is one of the smallest visual reasoning models specialized in multi-step reasoning for visual analysis of images and videos. This means it's almost literally capable of "thinking while looking at images." Unlike the Instruct version, this model generates detailed chains of thought before producing the final answer, which improves accuracy at the cost of processing speed.

reasoning
multimodal
22.10.2025

Qwen3-VL-2B-Instruct

The most compact model in the Qwen3-VL multimodal family. With 2 billion parameters and a dense architecture, it is optimized for fast conversational systems and deployment on edge devices. At the same time, the model retains and supports all the advanced capabilities of the series: high-quality comprehension of images, videos, and text, support for OCR in 32 languages, object positioning, timestamp localization, and a native context length of 256K tokens.

multimodal
22.10.2025

Qwen3-VL-32B-Instruct

A powerful multimodal model with 32 billion parameters and native support for a 256K context window, delivering state-of-the-art quality in multimodal understanding. The model outperforms the previous-generation 72B-parameter version on most benchmarks, as well as comparable solutions from other developers, such as GPT-5 and Claude 4.

multimodal
22.10.2025

Qwen3-VL-32B-Thinking

A reasoning version of the flagship 32-billion-parameter dense model from the Qwen3-VL family, optimized for multi-step thinking and solving highly complex multimodal tasks that require deep analysis and logical inference based on visual information. It supports a native context of 256K (extendable to 1M) and achieves state-of-the-art results among multimodal reasoning models of a similar size.

reasoning
multimodal
22.10.2025

avision

A Russian-language-adapted multimodal model by Avito, based on Qwen2.5-VL-7B-Instruct with an optimized architecture. The model processes Russian-language queries twice as fast as the original and significantly outperforms it in generating ad descriptions, while retaining its general-purpose image-processing capabilities.

multimodal
21.10.2025

avibe

A Russian-language LLM developed by Avito, based on Qwen3-8B and featuring a unique hybrid tokenizer specifically adapted for Russian tokens. The model demonstrates outstanding performance on Russian-language benchmarks, particularly in mathematics and function calling, while its optimized architecture enables it to process queries 15–25% faster than the original version.

20.10.2025

Krea Realtime 14B

The Krea Realtime 14B model is a distilled version of the Wan 2.1 14B model (developed by Wan-AI) for text-to-video generation tasks. It was transformed into an autoregressive model using the Self-Forcing method, achieving an inference speed of 11 frames per second with 4 inference steps on a single NVIDIA B200 GPU.

20.10.2025

DeepSeek-OCR

An innovative vision-language model (VLM) for text recognition and document parsing, developed by DeepSeek as part of research into representing information through the visual modality. The model offers a unique approach: instead of traditional text tokens, it uses visual tokens to encode information from documents, achieving text compression of 10–20× while reaching an OCR accuracy of 97%.

multimodal
20.10.2025

Qwen3-VL-4B-Thinking

A reasoning-optimized 4B version of the Qwen3-VL model series with a 256K context window (expandable to 1M). Response generation always employs reasoning chains, enabling it to tackle complex multimodal tasks at the cost of some throughput. It demonstrates performance only slightly below Qwen3-VL-8B, despite having significantly lower hardware requirements.

reasoning
multimodal
15.10.2025

Qwen3-VL-8B-Instruct

A multimodal dense model with 8 billion parameters, optimized for dialogue and instruction following, capable of understanding images, videos, and text. It natively supports a context length of 256K tokens, features enhanced OCR for 32 languages, and possesses visual agent capabilities. The model demonstrates competitive performance against larger models on key benchmarks.

multimodal
15.10.2025

Qwen3-VL-8B-Thinking

A compact dense model with 8 billion parameters and enhanced step-by-step reasoning capabilities, specializing in complex multimodal tasks requiring in-depth analysis and superior visual content understanding. Natively supports a 256K token context. Outperforms renowned models such as Gemini 2.5 Flash Lite and GPT-5 nano (high) across nearly all key benchmarks.

reasoning
multimodal
15.10.2025

Qwen3-VL-4B-Instruct

A compact 4-billion-parameter model that retains the full functionality of the Qwen3-VL series: fast response speed and high-quality multimodal understanding with spatial grounding and timestamp localization. At the same time, it significantly reduces hardware requirements: when using half of the natively supported 256K context, the model runs stably on a single 24 GB GPU.

multimodal
15.10.2025

granite-4.0-h-small

The flagship Mixture-of-Experts (MoE) model from IBM's Granite-4.0 family, featuring 32 billion total parameters (with 9 billion active). It combines Mamba-2 and Transformer architectures to deliver performance on par with large-scale models while reducing memory requirements by 70% and doubling inference speed. It is optimized for enterprise-grade tasks such as RAG and agent workflows.

02.10.2025