Models

  • Our catalog features the most popular open-source AI models from developers worldwide, including large language models (LLMs), multimodal, and diffusion models. Try any model in one place — we’ve made it easy for you.
  • To explore and test a model, you can query it through our public endpoint. For production use, fine-tuning, or custom weights, we recommend renting a virtual or dedicated GPU server.

Lance

A research project from ByteDance: a unified multimodal model for studying image and video understanding, generation, and editing within a relatively small model size and a limited compute budget.

15.05.2026

DeepSeek-V4-Pro

An MoE model with 1.6 trillion total parameters (49 billion active) that efficiently processes up to 1 million tokens of context thanks to its hybrid attention architecture, CSA+HCA. The model leads in mathematics, programming, and agentic tasks, supports three configurable reasoning modes (“non-think”, “think high”, “think max”), and uses nearly 10 times less KV-cache memory than previous DeepSeek flagships.

reasoning
coding
22.04.2026

DeepSeek-V4-Flash

An open MoE model from the DeepSeek V4 family, with 284 billion total parameters and 13 billion active per token, supporting a context of up to 1 million tokens. Thanks to its hybrid attention (CSA+HCA), it remains highly efficient on ultra-long sequences. The model delivers results close to the Pro version in reasoning, programming, and agentic tasks, while being far less demanding on infrastructure.

reasoning
coding
22.04.2026

Qwen3.6-27B

Qwen/Qwen3.6-27B is an open dense multimodal 27B-parameter model with a strong focus on agentic programming, large-repository work, and reasoning tasks. It supports text, images, and video, features a native context of 262K tokens, thinking/non-thinking modes, and outperforms not only Qwen3.5-27B but also the larger MoE model Qwen3.5-397B-A17B on a range of key benchmarks.

reasoning
multimodal
coding
21.04.2026

Qwen3.6-35B-A3B

Qwen/Qwen3.6-35B-A3B is an open multimodal Mixture-of-Experts model with 35B parameters, of which only about 3B are activated per token, reducing computational overhead. Its architecture, built on Gated DeltaNet and Gated Attention, delivers high efficiency and memory savings. The model handles text, images, and video, supports thinking and non-thinking modes, offers a 262K-token context window (expandable to 1M), and is especially well-suited for agentic programming, repository-level work, and visual-textual tasks.

reasoning
multimodal
coding
15.04.2026

Kimi-K2.6

An open multimodal model from Moonshot AI built with an agent-centric philosophy. It uses a Mixture-of-Experts architecture with 1 trillion total parameters (32 billion active per token), a 256K-token context window, and native INT4 quantization. The model is optimized for long-horizon software problem solving, autonomous operation, and “agent swarm” orchestration, and competes with the best closed models in these areas. K2.6 can carry out complex engineering tasks for hours, turn visual mock-ups into production-ready web applications, and decompose and coordinate up to 300 parallel sub-agents within a single session, making it one of the strongest open options for research tasks and an effective core for a wide range of high-tech products.

reasoning
multimodal
coding
14.04.2026

ERNIE-Image-Turbo

An open text-to-image generation model developed by the ERNIE-Image team at Baidu. It is based on the Diffusion Transformer (DiT) architecture and incorporates additional components to enhance text processing and the handling of structured tasks.

10.04.2026

MiniMax-M2.7

MiniMax-M2.7 is the first model to have participated in its own evolution: during the development process, it built its own skills and optimized its own training. The architecture, based on a 230B MoE (10B active parameters) with full attention, ensures consistently high quality in complex agentic and office tasks. On benchmarks, the model demonstrates results on par with the best closed-source solutions. It is ideally suited for developing autonomous agents, working with office documents, and comprehensive automation of complex professional tasks, acting as an "omniscient and empathetic AI colleague."

reasoning
coding
09.04.2026

ERNIE-Image

An open text-to-image generation model developed by the ERNIE-Image team at Baidu. It is based on the Diffusion Transformer (DiT) architecture and incorporates additional components to enhance text processing and the handling of structured tasks.

07.04.2026

GLM-5.1

GLM-5.1 is a flagship MoE model (744B total / 40B active parameters) featuring DSA sparse attention, built for sustained autonomous operation. At the time of its release, it held the top position on SWE-Bench Pro and CyberGym, outperforming all existing models (including closed-source ones), and it consistently ranks among the leaders on other significant benchmarks. Crucially, it keeps making progress across hundreds of iterations and thousands of tool calls: where many models lose effectiveness and settle for a quick answer, GLM-5.1 continues to search for the optimal solution.

reasoning
coding
03.04.2026

GigaChat3.1-702B-A36B

The flagship instruct model of the GigaChat family, built on a Mixture‑of‑Experts (MoE) architecture with 702 billion total and 36 billion active parameters. Combining Multi‑head Latent Attention (MLA), Multi‑Token Prediction (MTP) and native FP8 training delivers record‑breaking performance on long contexts while drastically reducing memory consumption. The model outperforms open‑source peers such as DeepSeek‑V3‑0324 and Qwen3‑235B‑A22B on a number of benchmarks and is released under the MIT license, making it suitable for commercial use.

21.03.2026

GigaChat3.1-10B-A1.8B

GigaChat 3.1 Lightning is a compact Mixture-of-Experts model with 1.8 billion active parameters out of 10 billion total, built on MLA attention with MTP support. Combined with native FP8 training, this delivers excellent speed and quality. The model holds leading positions in its class and is one of the best solutions for fast conversational AI assistants, as well as for running simple yet reliable agent systems with tool calling.

21.03.2026

gemma-4-26B-A4B-it

A highly efficient mixture‑of‑experts model that, activating only 3.8B parameters, delivers 97% of the quality of the flagship 31B model. The optimal choice for complex agentic and analytical tasks with moderate computational requirements.

reasoning
multimodal
coding
11.03.2026

gemma-4-31B-it

The flagship dense model of the Gemma-4 family. With 31B parameters, it only slightly trails the largest proprietary and open-source alternatives. It offers native multimodality, multilingual support, a 256K-token context window, and a hybrid sliding-window attention mechanism that reduces memory requirements, making it an ideal choice for tasks demanding high-quality reasoning and in-depth analysis.

reasoning
multimodal
coding
11.03.2026

NVIDIA-Nemotron-3-Super-120B-A12B

The NVIDIA Nemotron 3 Super 120B (12B active) is a hybrid model based on a sparse Latent Mixture-of-Experts (MoE) and Mamba-2 architecture, optimized for building complex agentic systems and handling contexts of up to 1 million tokens. Thanks to its innovative architecture, which activates only 12 billion parameters per token, and its Multi-Token Prediction (MTP) mechanism, the model delivers high inference efficiency, combining response quality with performance and computational savings when processing long sequences.

reasoning
10.03.2026

LTX-2.3

An updated version of the LTX-2 model, developed by Lightricks for synchronized video and audio generation within a single model. It is based on the DiT architecture and integrates key components of modern video generation systems. The model delivers improved audio and visual quality and follows text prompts more accurately.

05.03.2026

gemma-4-E2B-it

The most compact model in the Gemma-4 lineup, with an effective size of 2.3B parameters and full support for text, images, and audio. An ideal solution for agentic workflows on local and edge devices.

reasoning
multimodal
02.03.2026

gemma-4-E4B-it

A model with the innovative Per‑Layer Embeddings technique that, with an effective size of just 4.5B parameters, performs better than models two to three times larger. At the same time, the model retains reasoning capabilities and supports full multimodality (text, images, audio) — an ideal choice for complex tasks on local devices.

reasoning
multimodal
02.03.2026

FireRed-Image-Edit-1.1

An upgraded open-source general-purpose image editing foundation model, built upon the capabilities outlined in the FireRed-Image-Edit-1.0 Technical Report. This version significantly enhances identity consistency, multi-image conditioning, and domain-specialized editing performance, aligning more closely with real-world creative production needs.

02.03.2026

Qwen3.5-0.8B

An ultra-compact multimodal model with 0.8 billion parameters, featuring a hybrid architecture of Gated DeltaNet and Gated Attention. It offers a context length of 262,144 tokens, a record for its size, supports 201 languages, and provides two operational modes, standard and reasoning (thinking), making it an ideal solution for prototyping, research, and fine-tuning for specific tasks.

reasoning
multimodal
28.02.2026
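
Several entries above quote both total and active parameter counts for Mixture-of-Experts models. As a rough illustration of what those figures imply, the sketch below computes the share of parameters activated per token and ranks the models by it. The numbers are copied from the descriptions above; the 26B total for gemma-4-26B-A4B-it is an assumption read off the model name, and the Qwen3.6-35B-A3B figure is the "about 3B" stated in its entry. The names `MOE_MODELS`, `active_share`, and `rank_by_sparsity` are hypothetical helpers for this example, not part of any catalog API.

```python
# Parameter counts in billions: (total, active per token).
# Values come from the catalog descriptions; the gemma total (26B) is
# assumed from the model name, and Qwen's 3B is an approximate figure.
MOE_MODELS = {
    "DeepSeek-V4-Pro": (1600, 49),
    "DeepSeek-V4-Flash": (284, 13),
    "Qwen3.6-35B-A3B": (35, 3),
    "Kimi-K2.6": (1000, 32),
    "MiniMax-M2.7": (230, 10),
    "GLM-5.1": (744, 40),
    "GigaChat3.1-702B-A36B": (702, 36),
    "GigaChat3.1-10B-A1.8B": (10, 1.8),
    "gemma-4-26B-A4B-it": (26, 3.8),
    "NVIDIA-Nemotron-3-Super-120B-A12B": (120, 12),
}


def active_share(total_b, active_b):
    """Return the fraction of parameters activated per token."""
    if total_b <= 0 or not 0 < active_b <= total_b:
        raise ValueError("expected 0 < active_b <= total_b")
    return active_b / total_b


def rank_by_sparsity(models):
    """Return (name, active share) pairs, smallest share first."""
    shares = [(name, active_share(t, a)) for name, (t, a) in models.items()]
    return sorted(shares, key=lambda item: item[1])


if __name__ == "__main__":
    for name, share in rank_by_sparsity(MOE_MODELS):
        print(f"{name:36s} {share:6.1%}")
```

A smaller share means fewer parameters are computed per token relative to the total that must be held in memory. It says nothing about quality, and it is only a first-order indicator of inference cost.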