Models

  • Our catalog features the most popular open-source AI models from developers worldwide, including large language models (LLMs), multimodal models, and diffusion models. Try any model in one place — we’ve made it easy for you.
  • To explore and test a model, you can query it through our public endpoint. For production use, fine-tuning, or custom weights, we recommend renting a virtual or a dedicated GPU server.
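For quick experimentation against a public endpoint, many providers follow the OpenAI-style chat completions convention. The sketch below assumes such a schema; the URL, API key, and response shape are placeholders, so check the provider's API documentation for the exact details.

```python
import json
from urllib import request

API_URL = "https://example.com/v1/chat/completions"  # placeholder endpoint
API_KEY = "YOUR_API_KEY"                             # placeholder key

def build_chat_request(model, prompt):
    """Build an OpenAI-style chat completion payload (a common
    convention; the actual schema may differ per provider)."""
    return {
        "model": model,
        "messages": [{"role": "user", "content": prompt}],
    }

def query(model, prompt):
    """Send the payload and extract the assistant's reply.
    Requires a live, compatible endpoint."""
    payload = build_chat_request(model, prompt)
    req = request.Request(
        API_URL,
        data=json.dumps(payload).encode(),
        headers={
            "Content-Type": "application/json",
            "Authorization": f"Bearer {API_KEY}",
        },
    )
    with request.urlopen(req) as resp:
        return json.load(resp)["choices"][0]["message"]["content"]
```

Swap in the real endpoint URL and a model name from the catalog before calling `query`.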

LTX-2.3

This is an updated version of the LTX-2 model, developed by Lightricks for synchronized video and audio generation within a single model. It is based on the DiT architecture and integrates key components of modern video generation systems. The model delivers improved audio and visual quality, as well as more accurate prompt following.

05.03.2026

Qwen3.5-397B-A17B

A hybrid model from the Qwen team that combines advanced multimodal capabilities with exceptional efficiency thanks to the Gated DeltaNet and sparse Mixture-of-Experts (MoE) architecture. With a total of 397 billion parameters, the model activates only 17 billion per token, delivering high performance across a wide range of tasks—from complex mathematical reasoning to multimodal understanding and agent development.

reasoning
multimodal
16.02.2026
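The sparse MoE mechanism described above (only 17B of 397B parameters active per token) can be illustrated with a minimal top-k routing sketch. The expert count, k, and gate values here are toy assumptions for illustration, not the model's actual configuration.

```python
import math

def top_k_route(gate_logits, k):
    """Select the k experts with the highest gate scores and
    renormalize their weights with a softmax over the selected set."""
    top = sorted(range(len(gate_logits)),
                 key=lambda i: gate_logits[i], reverse=True)[:k]
    m = max(gate_logits[i] for i in top)
    exp = [math.exp(gate_logits[i] - m) for i in top]
    total = sum(exp)
    return {i: w / total for i, w in zip(top, exp)}

def moe_forward(x, experts, gate_logits, k=2):
    """Run only the selected experts and mix their outputs;
    the remaining experts are never evaluated."""
    weights = top_k_route(gate_logits, k)
    return sum(w * experts[i](x) for i, w in weights.items())

# Toy example: 8 experts, only 2 run per token.
experts = [lambda x, s=s: s * x for s in range(8)]  # stand-in expert networks
gate_logits = [0.1, 2.0, -1.0, 0.5, 3.0, 0.0, -2.0, 1.0]
out = moe_forward(1.0, experts, gate_logits, k=2)
```

Because only k experts are evaluated per token, compute cost scales with the active parameter count rather than the total.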

FireRed-Image-Edit-1.0

A model for image editing tasks that ensures high accuracy, quality, and consistency across various scenarios.

14.02.2026

MiniMax-M2.5

The flagship model of the series, achieving state-of-the-art (SOTA) performance in coding, agentic tool use, and practical real-world office tasks. Thanks to massive-scale Reinforcement Learning (RL) and the innovative Forge framework, M2.5 not only solves highly complex tasks but does so with exceptional accuracy and speed.

reasoning
12.02.2026

GLM-5

A foundational open-source model designed for solving complex tasks and long-running agent scenarios. With an MoE architecture of 754B parameters (40B active), sparse attention (DSA), innovative slime RL infrastructure, and a focus on practical utility, GLM-5 pushes AI interaction far beyond simple chat, transforming it into a full-fledged executive assistant.

reasoning
11.02.2026

Qwen3-Coder-Next

An efficient MoE model with 80B parameters (3B active), designed specifically for programming-oriented agents. The model features highly efficient inference, an extended context length (262K tokens), and best-in-class handling of various tool-call formats, making it well suited to deploying intelligent developer assistants.

coding
try online
30.01.2026

MOVA-360p

A foundation model designed for Image-to-Video-Audio (IT2VA) and Text-to-Video-Audio (T2VA) tasks, enabling simultaneous generation of high-fidelity video and synchronized audio. It addresses limitations of cascaded pipelines and proprietary systems by providing a fully open-source solution.  

29.01.2026

MOVA-720p

The 720p variant of the MOVA foundation model for Image-to-Video-Audio (IT2VA) and Text-to-Video-Audio (T2VA) tasks, enabling simultaneous generation of high-fidelity video and synchronized audio. It addresses the limitations of cascaded pipelines and proprietary systems by providing a fully open-source solution.

29.01.2026

HunyuanImage-3.0-Instruct

It is a native multimodal autoregressive model designed for image generation, supporting both text-to-image and image-to-image (TI2I) tasks. It features a unified architecture for multimodal understanding and generation, achieving performance comparable to leading closed-source models. The model includes two main variants: HunyuanImage-3.0 (text-to-image) and HunyuanImage-3.0-Instruct (enhanced with reasoning capabilities for intelligent prompt improvement and creative editing).

28.01.2026

DeepSeek-OCR-2

An innovative multimodal model for optical character recognition (OCR) that mimics human visual perception. Instead of standard line-by-line image scanning, its new DeepEncoder V2 uses a compact language model to dynamically reorder visual tokens, following the semantic logic of the document. This significantly improves the understanding of complex layouts, tables, and formulas while maintaining the high efficiency of the previous version.

multimodal
27.01.2026

lingbot-world-base-cam

This model is designed for image-to-video generation and belongs to the class of world models. The project is licensed under Apache-2.0, providing open access to the code and model weights.

26.01.2026

Z-Image

This is the base model of the Z-Image family, designed for high-quality image generation, broad style coverage, and precise alignment with text prompts. It is intended for professional use, creative tasks, and research, in contrast to the accelerated version Z-Image-Turbo.

23.01.2026

GLM-4.7-Flash

A 30-billion-parameter MoE model with only ~3.6B parameters activated per token, delivering best-in-class performance with minimal resource requirements (~24 GB of VRAM). The model leads in agentic tasks and programming, supports a 200K-token context, and is optimized for easy local deployment.

reasoning
19.01.2026

FLUX.2-klein-4B

It is a 4 billion parameter rectified flow transformer model designed for fast image generation and editing. It unifies text-to-image generation and multi-reference image editing into a single compact architecture, enabling end-to-end inference in under a second. Optimized for real-time applications without compromising quality, it runs on consumer-grade GPUs such as NVIDIA RTX 3090/4070 with approximately 13GB VRAM.

14.01.2026

FLUX.2-klein-9B

It is a 9 billion parameter rectified flow transformer model designed for high-speed image generation and editing. It unifies text-to-image generation and multi-reference image editing into a single compact architecture, achieving state-of-the-art quality with end-to-end inference in under half a second. The model leverages an 8 billion parameter Qwen3 text embedder and is step-distilled to 4 inference steps, enabling real-time performance while matching or exceeding the quality of models five times its size.

14.01.2026

GLM-Image

It is a text-to-image and image-to-image generation model employing a hybrid architecture combining an autoregressive generator and a diffusion decoder. It excels in generating high-fidelity images with precise text rendering and semantic understanding, particularly in complex, information-dense scenarios.

08.01.2026

LTX-2

An audio-visual base model built on the DiT architecture, developed for synchronized generation of video and audio within a single model. It incorporates key components of modern video generation systems, ships with open weights, and is optimized for local use.

06.01.2026

Kimi-K2.5

An open-source model built on a Mixture-of-Experts architecture with 1 trillion parameters, of which 32 billion are activated per token. The developers have implemented a "visual agentic intelligence" paradigm within it—a combination of visual perception, reasoning, and autonomous agents. The model is multimodal, presented in native INT4 quantization, and includes a unique Agent Swarm mechanism that orchestrates and enables the parallel operation of up to 100 sub-agents. This improves quality and reduces the execution time for complex tasks by an average factor of 4.5.

reasoning
multimodal
01.01.2026
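The native INT4 quantization mentioned above stores weights on a 16-level integer grid. A minimal symmetric per-tensor sketch is shown below; real schemes are more elaborate (e.g. per-group scales), so treat this as an illustration of the idea only.

```python
def quantize_int4(values):
    """Symmetric per-tensor INT4 quantization: map floats onto the
    16-level integer grid [-8, 7] using a single scale factor."""
    scale = max(abs(v) for v in values) / 7.0 or 1.0  # guard against all-zero input
    q = [max(-8, min(7, round(v / scale))) for v in values]
    return q, scale

def dequantize_int4(q, scale):
    """Recover approximate float weights from the integer grid."""
    return [qi * scale for qi in q]

# Toy weight vector; the rounding error is bounded by scale / 2.
weights = [0.91, -0.42, 0.07, -1.30, 0.55]
q, scale = quantize_int4(weights)
restored = dequantize_int4(q, scale)
```

Storing 4-bit integers plus one scale per tensor (or per group) is what cuts memory roughly 4x versus 16-bit weights.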

Qwen-Image-2512

It is the December 2025 update to Qwen-Image, a foundational text-to-image model. It is designed to generate high-quality images from textual prompts, with enhanced realism, detail rendering, and text integration.

30.12.2025

NextStep-1.1

An improved version of the NextStep-1 text-to-image model, developed to enhance image quality and address visualization issues present in earlier versions.

23.12.2025