Models

  • Our catalog features the most popular open-source AI models from developers worldwide, including large language models (LLMs), multimodal models, and diffusion models. Try any model in one place; we’ve made it easy for you.
  • To explore and test a model, you can query it through our public endpoint, as in the sketch below. For production use, fine-tuning, or custom weights, we recommend renting a virtual or dedicated GPU server.
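
A minimal sketch of such a test query, assuming the public endpoint speaks the common OpenAI-compatible chat completions protocol; the endpoint URL, API key, and model identifier below are placeholders rather than actual values, so check the model's page in the catalog for the real ones:

    import requests

    API_URL = "https://example.com/v1/chat/completions"  # placeholder endpoint URL
    API_KEY = "YOUR_API_KEY"                              # placeholder API key

    response = requests.post(
        API_URL,
        headers={"Authorization": f"Bearer {API_KEY}"},
        json={
            "model": "GLM-4.6",  # any model ID from the catalog
            "messages": [
                {"role": "user",
                 "content": "Summarize the idea of sparse attention in two sentences."}
            ],
            "max_tokens": 256,
        },
        timeout=60,
    )
    response.raise_for_status()
    # Print the generated reply from the first (and only) choice.
    print(response.json()["choices"][0]["message"]["content"])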

DeepSeek-OCR

An innovative vision-language model (VLM) for text recognition and document parsing, developed by DeepSeek-AI as part of its research into representing information through the visual modality. The model takes a distinctive approach: instead of traditional text tokens, it encodes document content as visual tokens, achieving 10-20x text compression while maintaining an OCR accuracy of 97% (see the request sketch after this card).

multimodal
20.10.2025
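
A minimal OCR request sketch for this model, under the same assumption of an OpenAI-compatible chat endpoint that accepts images as base64 data URLs; the URL, key, file name, and prompt are placeholders:

    import base64
    import requests

    API_URL = "https://example.com/v1/chat/completions"  # placeholder endpoint URL
    API_KEY = "YOUR_API_KEY"                              # placeholder API key

    # Encode a scanned page as base64 so it can be embedded in the request body.
    with open("scanned_page.png", "rb") as f:
        image_b64 = base64.b64encode(f.read()).decode()

    response = requests.post(
        API_URL,
        headers={"Authorization": f"Bearer {API_KEY}"},
        json={
            "model": "DeepSeek-OCR",  # model ID as listed in the catalog
            "messages": [{
                "role": "user",
                "content": [
                    {"type": "image_url",
                     "image_url": {"url": f"data:image/png;base64,{image_b64}"}},
                    {"type": "text",
                     "text": "Extract all text from this page as Markdown."},
                ],
            }],
        },
        timeout=120,
    )
    response.raise_for_status()
    # The recognized document text is returned as the assistant message.
    print(response.json()["choices"][0]["message"]["content"])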

Krea Realtime 14B

The Krea Realtime 14B model is a distilled version of the Wan 2.1 14B model (developed by Wan-AI) for text-to-video generation tasks. It was transformed into an autoregressive model using the Self-Forcing method, achieving an inference speed of 11 frames per second with 4 inference steps on a single NVIDIA B200 GPU.

20.10.2025

Qwen3-VL-8B-Thinking

A compact dense model with 8 billion parameters and enhanced step-by-step reasoning capabilities, specializing in complex multimodal tasks that require in-depth analysis and superior visual content understanding. Natively supports a 256K-token context. Outperforms renowned models such as Gemini 2.5 Flash Lite and GPT-5 nano (high) across nearly all key benchmarks.

reasoning
multimodal
15.10.2025

Qwen3-VL-8B-Instruct

A multimodal dense model with 8 billion parameters, optimized for dialogue and instruction following, capable of understanding images, videos, and text. It natively supports a context length of 256K tokens, features enhanced OCR for 32 languages, and possesses visual agent capabilities. The model demonstrates competitive performance against larger models on key benchmarks.

multimodal
15.10.2025

granite-4.0-h-micro

The most compact model in the Granite-4.0 family, with 3 billion parameters, it combines a hybrid Mamba-2/Transformer architecture with traditional dense feedforward layers instead of a mixture of experts. Optimized for local devices (it can even run on a Raspberry Pi), it delivers strong instruction-following results (84.32% on IFEval) while surpassing much larger models in RAG tasks.

02.10.2025

granite-4.0-micro

A conventional Transformer model with 3 billion parameters, designed as an alternative for platforms where support for the Mamba-2 hybrid architecture is not yet optimized. It provides full support for PEFT methods and optimization in llama.cpp, ensuring compatibility with existing infrastructure while maintaining the enhanced output quality characteristic of the Granite 4.0 generation of models.

02.10.2025

granite-4.0-h-tiny

A compact hybrid model combining a Mamba-2/Transformer architecture with a Mixture of Experts, activating only 1 billion of its 7 billion parameters per token. It is designed for fast task execution, including on edge devices and in local deployments. It requires only 8 GB of memory (in 8-bit format) and delivers high performance in function calling with minimal resource consumption (see the tool-calling sketch after this card).

02.10.2025
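
A minimal tool-calling sketch for this model, again assuming an OpenAI-compatible endpoint that supports the common "tools" field; the endpoint, key, and the get_weather function are hypothetical examples:

    import json
    import requests

    API_URL = "https://example.com/v1/chat/completions"  # placeholder endpoint URL
    API_KEY = "YOUR_API_KEY"                              # placeholder API key

    # One hypothetical tool, described in the widely used OpenAI "tools" schema.
    tools = [{
        "type": "function",
        "function": {
            "name": "get_weather",
            "description": "Return the current weather for a city.",
            "parameters": {
                "type": "object",
                "properties": {"city": {"type": "string"}},
                "required": ["city"],
            },
        },
    }]

    response = requests.post(
        API_URL,
        headers={"Authorization": f"Bearer {API_KEY}"},
        json={
            "model": "granite-4.0-h-tiny",  # model ID as listed in the catalog
            "messages": [{"role": "user", "content": "What is the weather in Vienna?"}],
            "tools": tools,
        },
        timeout=60,
    )
    response.raise_for_status()
    message = response.json()["choices"][0]["message"]
    # If the model decides to call a tool, the arguments arrive as a JSON string.
    for call in message.get("tool_calls") or []:
        print(call["function"]["name"], json.loads(call["function"]["arguments"]))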

granite-4.0-h-small

The flagship Mixture-of-Experts (MoE) model from IBM's Granite-4.0 family, featuring 32 billion total parameters (with 9 billion active). It combines Mamba-2 and Transformer architectures to deliver performance on par with large-scale models while reducing memory requirements by 70% and doubling inference speed. It is optimized for enterprise-grade tasks such as RAG and agent workflows.

02.10.2025

GLM-4.6

The updated version of the flagship Z.ai model features an extended 200K-token context, enhanced reasoning capabilities, improved code generation, and tool-use support. It significantly outperforms GLM-4.5 and competes confidently with recognized leaders such as DeepSeek-V3.2-Exp and Claude Sonnet 4, while using tokens substantially more efficiently. Ideal for agent systems, large-scale text analysis, and development automation.

reasoning
30.09.2025

DeepSeek-V3.2-Exp

DeepSeek-V3.2-Exp is an experimental model based on V3.1-Terminus, representing an intermediate step towards a next-generation architecture. The model incorporates DeepSeek Sparse Attention (DSA), a sparse attention mechanism that improves the efficiency of training and inference in long-context scenarios. It is effectively a snapshot of DeepSeek-AI's ongoing research into more efficient transformer architectures. According to test results, V3.2-Exp performs comparably to the base version, with slight gains or regressions on a number of benchmarks.

reasoning
29.09.2025

Qwen3-VL-30B-A3B-Instruct

A medium-sized multimodal model in the Qwen3-VL family with a 30B MoE architecture, a 256K native context, and advanced image, video, and OCR processing in 32 languages. Architectural innovations such as Interleaved-MRoPE, DeepStack, and Text-Timestamp Alignment deliver excellent quality on multimodal tasks, surpassing a number of proprietary models on key benchmarks. The model falls slightly short of the series flagship, Qwen3-VL-235B-A22B-Instruct, but thanks to its size and configuration it is significantly more economical at inference time.

multimodal
26.09.2025

Qwen3-VL-30B-A3B-Thinking

A medium-sized multimodal MoE model from the Qwen3-VL family, with 30 billion total parameters, 3 billion active parameters, and a context length of 256K tokens. It combines cutting-edge visual content processing with deep analytical skills. The model delivers highly accurate spatial understanding and temporal grounding, pairing them with the efficiency and attention to detail characteristic of reasoning models.

reasoning
multimodal
26.09.2025

Qwen3-VL-235B-A22B-Thinking

A next-generation flagship multimodal reasoning LLM supporting a 256K-token context (expandable up to 1M tokens). Thanks to unique architectural innovations such as Interleaved-MRoPE and DeepStack, the model excels at OCR (in 32 languages, including Russian), video analytics, and spatially aware image understanding, and is specially optimized for programming and for advanced agentic scenarios requiring sequential reasoning.

reasoning
multimodal
23.09.2025

Qwen3-VL-235B-A22B-Instruct

The flagship multimodal model of the Qwen3-VL series. It combines high-quality text processing, excellent image understanding with spatial object positioning, video analysis capabilities with precise timing, and a long context window (natively 256K, expandable up to 1M tokens). The model is designed for applications requiring fast and accurate image and video processing, with additional bonuses including OCR support for 32 languages, the ability to perform agent-like actions within user interfaces, and code generation from multimodal inputs (e.g., generating frontend code for a website from a hand-drawn sketch).

multimodal
23.09.2025

DeepSeek-V3.1-Terminus

The updated version of the flagship model, DeepSeek-V3.1, demonstrates significant improvements: developers have achieved greater language consistency — the model now less frequently mixes Chinese and English and completely avoids generating random characters. In addition, the agents have been substantially enhanced — both Code Agent and Search Agent now deliver higher performance. To top it off, the model has shown noticeable gains across a range of key benchmarks.

reasoning
22.09.2025

Qwen3-Next-80B-A3B-Instruct

A next-generation 80-billion-parameter MoE model with 512 experts. Trained on approximately 15 trillion tokens, the model features a hybrid attention architecture (Gated DeltaNet + Gated Attention), supports a native context length of 256K tokens, and can scale up to ~1M tokens. Despite activating only 3 billion parameters and 10 experts per token during inference, it reaches the performance level of 200B+ class models across several tasks, while delivering excellent inference speed—particularly when processing long prompts. The model operates exclusively in instruct-mode (without "thinking") and leverages Multi-Token Prediction technology, which enhances generation speed, improves text coherence, and enables higher-quality generalization.

11.09.2025

Qwen3-Next-80B-A3B-Thinking

An 80B-parameter MoE model, activating only 3B parameters per token, featuring hybrid attention (Gated DeltaNet + Gated Attention) and a native context window of 262K (expandable to ~1M), specifically optimized for complex step-by-step reasoning in "thinking" mode. Thanks to its ultra-sparse MoE architecture (512 experts, 10 active + 1 shared), Multi-Token Prediction (MTP), and other enhancements, the model delivers high efficiency on long contexts and achieves strong performance in mathematics, programming, and agent-based tasks.

reasoning
11.09.2025

Kimi-K2-0905

An update to one of the largest MoE-LLMs with 1T parameters. The developers have extended the context length to 256K, focusing on frontend programming tasks, agent capabilities, and improved tool-calling functionality. As a result, the model shows significant gains in accuracy across several public benchmarks and competes strongly with the best proprietary solutions.

05.09.2025

DeepSeek-V3.1

A major update in DeepSeek-AI's LLM series, marking a significant step toward AI agent-oriented solutions. DeepSeek v3.1 is now a hybrid model supporting two intelligent modes (thinking/non-thinking), leading its class in accuracy and application flexibility. Performance improvements are evident across all benchmarks, with developers placing particular emphasis on enhanced tool usage efficiency. As a result, the model is ideally suited for complex analytical and research tasks, as well as enterprise-level agent systems.

reasoning
21.08.2025

Seed-OSS-36B

An advanced open language model with 36 billion parameters, optimized for complex instruction following, dialogue, and agent-based scenarios, featuring uniquely flexible control over its “thinking budget” and supporting a 512K context window. The model is ideally suited for customer consultation and support chatbots, for processing long documents, legal files, and scientific and technical reports, and, not least, for automating business processes, particularly through intelligent assistants.

reasoning
20.08.2025