Models

  • Our catalog features the most popular open-source AI models from developers worldwide, including large language models (LLMs), multimodal models, and diffusion models. Try any model in one place.
  • To explore and test a model, you can query it through our public endpoint (see the sketch below). For production use, fine-tuning, or custom weights, we recommend renting a virtual or dedicated GPU server.
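
A minimal sketch of such a query, assuming an OpenAI-compatible chat-completions endpoint; the URL, API key, and model name are placeholders to replace with the values from your account dashboard:

```python
# Minimal sketch of querying a hosted model, assuming an OpenAI-compatible
# chat-completions endpoint. URL, key, and model name are placeholders.
import requests

API_URL = "https://api.example.com/v1/chat/completions"  # hypothetical URL
API_KEY = "YOUR_API_KEY"

response = requests.post(
    API_URL,
    headers={"Authorization": f"Bearer {API_KEY}"},
    json={
        "model": "DeepSeek-V3.2-Exp",  # any model name from this catalog
        "messages": [{"role": "user", "content": "Hello! What can you do?"}],
        "max_tokens": 256,
    },
    timeout=60,
)
response.raise_for_status()
print(response.json()["choices"][0]["message"]["content"])
```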

DeepSeek-V3.2-Exp

DeepSeek-V3.2-Exp is an experimental model based on V3.1-Terminus, representing an intermediate step toward a next-generation architecture. The model incorporates DeepSeek Sparse Attention (DSA), a sparse attention mechanism that improves the efficiency of training and inference in long-context scenarios (a toy sketch of the general idea follows this entry). The release is a snapshot of DeepSeek-AI's ongoing research into more efficient transformer architectures. According to test results, V3.2-Exp performs on par with the base version, with small gains or regressions on individual benchmarks.

reasoning
29.09.2025
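
As a rough intuition for what sparse attention buys: each query attends only to its top-k highest-scoring keys instead of the full context, so per-query cost scales with k rather than sequence length. The toy below illustrates the general technique, not DeepSeek's actual DSA algorithm (a real kernel would avoid computing the masked scores at all):

```python
# Toy top-k sparse attention: each query keeps only its top_k strongest
# key matches and softmaxes over those, ignoring the rest of the context.
# Illustrative only -- this is NOT DeepSeek's actual DSA mechanism.
import numpy as np

def sparse_attention(q, k, v, top_k=4):
    scores = q @ k.T / np.sqrt(q.shape[-1])  # (n_q, n_k) attention logits
    # Threshold = each query's top_k-th largest score; mask everything below.
    kth = np.partition(scores, -top_k, axis=-1)[:, -top_k:].min(axis=-1, keepdims=True)
    scores = np.where(scores >= kth, scores, -np.inf)
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights /= weights.sum(axis=-1, keepdims=True)  # softmax over surviving keys
    return weights @ v

rng = np.random.default_rng(0)
q, k, v = rng.normal(size=(8, 64)), rng.normal(size=(128, 64)), rng.normal(size=(128, 64))
print(sparse_attention(q, k, v).shape)  # (8, 64); only top_k keys contribute per query
```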

Qwen3-VL-235B-A22B-Instruct

The flagship multimodal model of the Qwen3-VL series. It combines high-quality text processing, strong image understanding with spatial object grounding, video analysis with precise timestamping, and a long context window (256K tokens natively, expandable to 1M). The model is designed for applications that need fast, accurate image and video processing; additional capabilities include OCR in 32 languages, agent-style actions within user interfaces, and code generation from multimodal inputs, e.g., producing frontend code for a website from a hand-drawn sketch (see the example after this entry).

multimodal
23.09.2025
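
A sketch of the sketch-to-frontend-code scenario in the OpenAI-style multimodal message format many hosts expose; the endpoint and exact payload schema are assumptions to verify against your provider's docs:

```python
# Sketch of a multimodal request: send a hand-drawn page sketch and ask for
# frontend code. Endpoint and payload schema follow the common OpenAI-style
# convention and are assumptions, not this provider's confirmed API.
import base64
import requests

with open("sketch.png", "rb") as f:
    image_b64 = base64.b64encode(f.read()).decode()

payload = {
    "model": "Qwen3-VL-235B-A22B-Instruct",
    "messages": [{
        "role": "user",
        "content": [
            {"type": "image_url",
             "image_url": {"url": f"data:image/png;base64,{image_b64}"}},
            {"type": "text",
             "text": "Generate HTML/CSS for the page layout in this sketch."},
        ],
    }],
}
r = requests.post(
    "https://api.example.com/v1/chat/completions",  # hypothetical URL
    headers={"Authorization": "Bearer YOUR_API_KEY"},
    json=payload,
    timeout=120,
)
print(r.json()["choices"][0]["message"]["content"])
```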

Qwen3-VL-235B-A22B-Thinking

A next-generation flagship multimodal reasoning LLM with a 256K-token context window (expandable to 1M tokens). Thanks to architectural innovations such as Interleaved-MRoPE, DeepStack, and others, the model excels at OCR (in 32 languages, including Russian), video analytics, and spatially aware image understanding, and it is specifically optimized for programming and for advanced agentic scenarios requiring sequential reasoning.

reasoning
multimodal
23.09.2025

DeepSeek-V3.1-Terminus

An updated version of the flagship DeepSeek-V3.1 with several notable improvements. Language consistency is better: the model mixes Chinese and English less often and no longer emits random characters. Both the Code Agent and the Search Agent have been substantially enhanced and now deliver higher performance, and the model shows noticeable gains across a range of key benchmarks.

reasoning
22.09.2025

Qwen3-Next-80B-A3B-Thinking

An 80B-parameter MoE model activating only 3B parameters per token, featuring hybrid attention (Gated DeltaNet + Gated Attention) and a native context window of 262K tokens (expandable to ~1M), specifically optimized for complex step-by-step reasoning in "thinking" mode. Thanks to its ultra-sparse MoE architecture (512 experts, 10 active + 1 shared; a toy routing sketch follows this entry), Multi-Token Prediction (MTP), and other enhancements, the model delivers high efficiency on long contexts and strong performance in mathematics, programming, and agent-based tasks.

reasoning
11.09.2025
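
A toy routing sketch of why an 80B MoE activates only ~3B parameters per token: a router scores all experts, the token is processed by the k best plus one always-on shared expert, and the remaining expert weights stay idle for that token. Shapes here are tiny stand-ins, not Qwen3-Next's real dimensions:

```python
# Toy top-k MoE routing with a shared expert. Illustrative dimensions only.
import numpy as np

N_EXPERTS, TOP_K, D = 512, 10, 32
rng = np.random.default_rng(0)
experts = rng.normal(size=(N_EXPERTS, D, D)) * 0.02  # per-expert weights
shared = rng.normal(size=(D, D)) * 0.02              # always-active shared expert
router = rng.normal(size=(D, N_EXPERTS)) * 0.02      # gating projection

def moe_layer(x):  # x: (D,) hidden state of one token
    logits = x @ router
    top = np.argsort(logits)[-TOP_K:]                # the k chosen experts
    gates = np.exp(logits[top] - logits[top].max())
    gates /= gates.sum()                             # softmax over chosen experts
    out = sum(g * (x @ experts[i]) for g, i in zip(gates, top))
    return out + x @ shared                          # only 11 of 513 experts ran

print(moe_layer(rng.normal(size=D)).shape)  # (32,)
```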

Qwen3-Next-80B-A3B-Instruct

A next-generation 80-billion-parameter MoE model with 512 experts. Trained on approximately 15 trillion tokens, the model features a hybrid attention architecture (Gated DeltaNet + Gated Attention), supports a native context length of 256K tokens, and can scale up to ~1M tokens. Despite activating only 3 billion parameters and 10 experts per token during inference, it reaches the performance level of 200B+ class models on several tasks while delivering excellent inference speed, particularly on long prompts. The model operates exclusively in instruct mode (without "thinking") and leverages Multi-Token Prediction, which increases generation speed, improves text coherence, and enables higher-quality generalization.

11.09.2025

Kimi-K2-0905

An update to one of the largest MoE-LLMs with 1T parameters. The developers have extended the context length to 256K, focusing on frontend programming tasks, agent capabilities, and improved tool-calling functionality. As a result, the model shows significant gains in accuracy across several public benchmarks and competes strongly with the best proprietary solutions.

05.09.2025

DeepSeek-V3.1

A major update in DeepSeek-AI's LLM series and a significant step toward AI-agent-oriented solutions. DeepSeek-V3.1 is now a hybrid model supporting two modes, thinking and non-thinking (a sketch of toggling them follows this entry), and leads its class in accuracy and application flexibility. Performance improves across all benchmarks, with the developers placing particular emphasis on more efficient tool usage. As a result, the model is well suited to complex analytical and research tasks as well as enterprise-level agent systems.

reasoning
21.08.2025
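
A sketch of switching between the two modes. Some serving stacks expose the toggle as a chat-template argument; the "chat_template_kwargs" field and the "thinking" key below mirror a common convention (e.g. vLLM-style servers) and are assumptions, so check your provider's documentation:

```python
# Sketch of toggling DeepSeek-V3.1 thinking / non-thinking modes.
# Field names are assumptions; they vary by serving stack.
import requests

def ask(prompt, thinking):
    r = requests.post(
        "https://api.example.com/v1/chat/completions",  # hypothetical URL
        headers={"Authorization": "Bearer YOUR_API_KEY"},
        json={
            "model": "DeepSeek-V3.1",
            "messages": [{"role": "user", "content": prompt}],
            "chat_template_kwargs": {"thinking": thinking},  # assumed field
        },
        timeout=120,
    )
    return r.json()["choices"][0]["message"]["content"]

print(ask("Prove that sqrt(2) is irrational.", thinking=True))  # deliberate
print(ask("What is the capital of France?", thinking=False))    # fast, direct
```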

Seed-OSS-36B

An advanced open language model with 36 billion parameters, optimized for complex instruction following, dialogue, and agent-based scenarios. It offers uniquely flexible control over its "thinking budget" (see the sketch after this entry) and supports a 512K context window. The model is well suited to customer consultation and support chatbots, to processing long documents such as legal files and scientific or technical reports, and, not least, to automating business processes through intelligent assistants.

reasoning
20.08.2025
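
A local sketch of capping the thinking budget with Hugging Face transformers. The hub id and the thinking_budget template kwarg follow the model card, but treat both as assumptions and verify against the actual chat template:

```python
# Local sketch: cap Seed-OSS's reasoning-token allowance via its chat
# template. Hub id and kwarg name are assumptions to verify.
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "ByteDance-Seed/Seed-OSS-36B-Instruct"  # assumed hub id
tok = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id, device_map="auto")

inputs = tok.apply_chat_template(
    [{"role": "user", "content": "Outline a 3-step plan to migrate our billing API."}],
    add_generation_prompt=True,
    thinking_budget=512,  # assumed kwarg: cap on reasoning tokens
    return_tensors="pt",
).to(model.device)

out = model.generate(inputs, max_new_tokens=1024)
print(tok.decode(out[0][inputs.shape[-1]:], skip_special_tokens=True))
```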

Qwen-Image-Edit

Qwen-Image-Edit is an image editing model developed by Qwen, based on the 20B-parameter Qwen-Image architecture (Qwen2.5-VL + VAE Encoder).

17.08.2025

gemma-3-270m

A new, ultra-compact (270 million parameters) and high-performance model from the Gemma 3 family by Google DeepMind. It is designed for rapid local deployment and runs efficiently on-device, including on embedded systems and in web browsers. The model is intended primarily for task-specific fine-tuning, yet it follows instructions and structures text effectively out of the box. Ideal for fast text classification, data extraction, and other tasks where speed, accuracy, energy efficiency, and privacy are paramount.

14.08.2025

GLM-4.5V

A next-generation multimodal model that processes images, video, text, and graphical user interfaces. Its architecture is built upon the flagship MoE-based GLM-4.5 Air and supports Thinking Mode for deep reasoning and No-Thinking Mode for rapid responses. At launch, the model achieves leading performance on 41 out of 42 key benchmarks used to evaluate LLMs capable of processing visual and textual information.

reasoning
multimodal
11.08.2025

Qwen3-4B-Thinking-2507

An upgraded version of the Qwen3-4B model specialized in complex reasoning, featuring an extended context length of 262K tokens and operating exclusively in thinking mode. Despite its 4 billion parameters, the model achieves an impressive score of 81.3 on the AIME25 math olympiad benchmark. It is ideal for local deployment, code debugging, analytical tasks, and any scenario requiring step-by-step, deliberate problem solving.

reasoning
07.08.2025

Qwen3-4B-Instruct-2507

A compact yet high-performance language model with 4B parameters, specialized in rapidly executing instructions without internal reasoning. The model outperforms GPT-4.1-nano across all key metrics and supports a context length of up to 262K tokens. Ideal for classification tasks, knowledge-base-powered response generation, and conversational assistants—essentially any scenario requiring high-speed query processing and precise adherence to instructions.

07.08.2025

gpt-oss-20b

A compact yet powerful reasoning MoE model from OpenAI with 20.9 billion total parameters (3.61 billion activated per token), capable of running in just 16GB of memory (see the rough estimate after this entry), which makes it ideal for local deployment on widely available consumer hardware. Despite its efficiency, it retains full advanced reasoning and tool-use capabilities, outperforming not only existing open-source models but also OpenAI's popular o3-mini on a range of key benchmarks. This makes gpt-oss-20b a strong choice for diverse research and product applications.

reasoning
try online
05.08.2025
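
A back-of-the-envelope check on the 16GB claim, assuming the bulk of the weights ship ~4-bit quantized (gpt-oss uses MXFP4 for its MoE weights); the overhead figure is a rough assumption, not an official measurement:

```python
# Rough arithmetic behind the 16GB claim: with MoE weights at ~4 bits,
# most of the 20.9B parameters cost about half a byte each.
total_params = 20.9e9
bytes_per_param = 0.5        # ~4-bit quantized expert weights
weights_gb = total_params * bytes_per_param / 1e9
overhead_gb = 3.0            # assumed KV cache + activations + runtime
print(f"weights ~{weights_gb:.1f} GB, total ~{weights_gb + overhead_gb:.1f} GB")
# weights ~10.5 GB, total ~13.5 GB -- inside a 16 GB budget
```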

gpt-oss-120b

The flagship open reasoning model from OpenAI, building on the research behind ChatGPT. It features an MoE (Mixture of Experts) architecture with 116.8 billion total parameters, of which only 5.1 billion are activated per token, plus numerous innovations that balance performance and resource consumption, enabling the model to run on a single 80GB GPU. GPT-OSS-120B supports a three-level reasoning system (see the sketch after this entry) and, for the first time in open models, introduces an extended role hierarchy with role-aligned output channels, together allowing users to precisely customize and control the model's behavior.

reasoning
05.08.2025
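
A sketch of selecting the three reasoning levels. Per the model card, the effort level is set in the system prompt ("Reasoning: low|medium|high"); the endpoint below is a placeholder:

```python
# Sketch of gpt-oss reasoning-effort levels set via the system prompt.
import requests

for effort in ("low", "medium", "high"):
    r = requests.post(
        "https://api.example.com/v1/chat/completions",  # hypothetical URL
        headers={"Authorization": "Bearer YOUR_API_KEY"},
        json={
            "model": "gpt-oss-120b",
            "messages": [
                {"role": "system", "content": f"Reasoning: {effort}"},
                {"role": "user", "content": "How many primes are below 100?"},
            ],
        },
        timeout=120,
    )
    print(effort, "->", r.json()["choices"][0]["message"]["content"][:80])
```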

Qwen-Image

A multimodal model for generating and editing images based on text prompts, part of the Qwen series of models. It demonstrates significant improvements in accurately rendering complex text (including Chinese) and performing advanced image-editing operations. The model possesses generalized capabilities in both image creation and editing, with an emphasis on preserving font details, composition, and contextual harmony of the text.

04.08.2025

Qwen3-30B-A3B-Thinking-2507

An upgraded version of Qwen3-30B-A3B fine-tuned exclusively for reasoning tasks. With 30.5B total parameters (3.3B active), 128 experts (8 activated per token), and an extended context length of 262,144 tokens, it stands as the ideal open-source choice among mid-sized models for applications requiring high-quality reasoning, whether for tool use, agent-like capabilities, or generating well-structured, accurate responses to highly complex user queries.

reasoning
29.07.2025

Qwen3-30B-A3B-Instruct-2507

An updated version of Qwen3-30B-A3B with 30.5 billion total parameters (3.3B active) and an extended context length of 262,144 tokens, designed to generate instant, accurate responses without intermediate reasoning steps. An exceptionally efficient dialogue model that handles both technical and creative tasks, ideal for use in chatbots.

29.07.2025

GLM-4.5

A hybrid model with 355B parameters, combining advanced reasoning, programming with artifact generation, and agent capabilities within a unified MoE architecture featuring an increased number of hidden layers. At launch, the model ranks 3rd globally in average score across 12 key benchmarks. Particularly impressive are its abilities in generating complete web applications, interactive presentations, and complex code. Users need only describe to the model how the program should function and what outcome they expect.

reasoning
28.07.2025