A flagship language model with 550 billion total parameters (55 billion active per inference pass), built on a hybrid LatentMoE architecture (Mamba-2 + Mixture of Experts + Attention), with support for up to 1 million tokens of context and a switchable reasoning mode. The model achieves record-breaking inference throughput, up to about 6× higher than comparable open LLMs, while matching the accuracy of the leading models worldwide. This makes it well suited to complex agentic tasks, long-context analysis, and high-load enterprise scenarios.
A unique model in the Gemma 4 lineup with a unified, encoder-free architecture: visual and audio data are fed directly into a decoder-only transformer through linear projections (plus a lightweight embedder for images), which largely eliminates encoding latency and enables high-quality processing of all modalities. With only 12B parameters, the model can run on laptops with 16 GB of VRAM and delivers results on reasoning, coding, and multimodal understanding benchmarks comparable to those of a 26B MoE model.
Ideogram 4 is Ideogram's first open-weight text-to-image model. It is a state-of-the-art foundation model trained from scratch, not a fine-tune of any existing model.
A research project from ByteDance: a unified multimodal model for studying image and video understanding, generation, and editing with a relatively small model and a limited compute budget.
A cutting-edge MoE model with 1.6 trillion total parameters (49 billion active) that efficiently processes up to 1 million tokens of context thanks to its hybrid attention architecture, CSA+HCA. The model leads in mathematics, programming, and agentic tasks, supports three configurable reasoning modes (“non-think”, “think high”, “think max”), and uses nearly 10 times less KV-cache memory than previous DeepSeek flagships.
An open MoE model from the DeepSeek V4 family with 284 billion total parameters, 13 billion of them active per token, and support for a context of up to 1 million tokens. Its hybrid attention (CSA + HCA) makes it highly efficient on ultra-long sequences. The model delivers results close to those of the Pro version in reasoning, programming, and agentic tasks while being far less demanding on infrastructure.
Qwen/Qwen3.6-27B is an open dense multimodal model with 27B parameters and a strong focus on agentic programming, work with large repositories, and reasoning tasks. It supports text, images, and video, offers a native context of 262K tokens and thinking/non-thinking modes, and outperforms not only Qwen3.5-27B but also the larger MoE model Qwen3.5-397B-A17B on a range of key benchmarks.
Qwen/Qwen3.6-35B-A3B is an open multimodal Mixture-of-Experts model with 35B parameters, of which only about 3B are activated per token, reducing computational overhead. Its architecture, built on Gated DeltaNet and Gated Attention, delivers high efficiency and memory savings. The model handles text, images, and video, supports thinking and non-thinking modes, offers a 262K-token context window (expandable to 1M), and is especially well-suited for agentic programming, repository-level work, and visual-textual tasks.
An open multimodal model from Moonshot AI built with an agent-centric philosophy. It uses a Mixture-of-Experts architecture with 1 trillion total parameters (32 billion active per token), a 256K-token context window, and native INT4 quantization. The model is optimized for long-horizon software problem solving, autonomous operation, and “agent swarm” orchestration, and competes with the best closed models in these areas. K2.6 can carry out complex engineering tasks for hours, turn visual mock-ups into production-ready web applications, and decompose and coordinate up to 300 parallel sub-agents within a single session. This makes it one of the strongest open solutions for research tasks and an effective intelligent core for a wide range of high-tech products.
An open text-to-image generation model developed by the ERNIE-Image team at Baidu. It is based on the Diffusion Transformer (DiT) architecture and incorporates additional components to enhance text processing and the handling of structured tasks.
MiniMax-M2.7 is the first model to have participated in its own evolution: during the development process, it built its own skills and optimized its own training. The architecture, based on a 230B MoE (10B active parameters) with full attention, ensures consistently high quality in complex agentic and office tasks. On benchmarks, the model demonstrates results on par with the best closed-source solutions. It is ideally suited for developing autonomous agents, working with office documents, and comprehensive automation of complex professional tasks, acting as an "omniscient and empathetic AI colleague."
An open text-to-image generation model developed by the ERNIE-Image team at Baidu. It is based on the Diffusion Transformer (DiT) architecture and incorporates additional components to enhance text processing and the handling of structured tasks.
GLM-5.1 is a flagship MoE model (744B total / 40B active parameters) featuring DSA sparse attention, built for sustained autonomous operation. At the time of its release, it held the top position on SWE-Bench Pro and CyberGym, outperforming all existing models (including closed-source ones), and it consistently ranks among the leaders on other significant benchmarks. Crucially, it keeps making progress across hundreds of iterations and thousands of tool calls: where many models lose effectiveness and settle for a quick answer, GLM-5.1 continues to search for the optimal solution.
The flagship instruct model of the GigaChat family, built on a Mixture‑of‑Experts (MoE) architecture with 702 billion total and 36 billion active parameters. Combining Multi‑head Latent Attention (MLA), Multi‑Token Prediction (MTP) and native FP8 training delivers record‑breaking performance on long contexts while drastically reducing memory consumption. The model outperforms open‑source peers such as DeepSeek‑V3‑0324 and Qwen3‑235B‑A22B on a number of benchmarks and is released under the MIT license, making it suitable for commercial use.
GigaChat 3.1 Lightning is a compact Mixture-of-Experts model with 1.8 billion active parameters out of 10 billion total. It is built on Multi-head Latent Attention (MLA) and supports Multi-Token Prediction (MTP); combined with native FP8 training, this delivers excellent speed and quality. The model holds leading positions in its class and is one of the best solutions for fast conversational AI assistants and for simple yet reliable agent systems with tool calling and other capabilities.
A highly efficient mixture‑of‑experts model that, activating only 3.8B parameters, delivers 97% of the quality of the flagship 31B model. The optimal choice for complex agentic and analytical tasks with moderate computational requirements.
The flagship dense model of the Gemma 4 family. With 31B parameters, it only slightly trails the largest proprietary and open-source alternatives. It offers native multimodality, multilingual support, a 256K-token context window, and a hybrid sliding-window attention mechanism that reduces memory requirements. Overall, it is an ideal choice for tasks that demand high-quality reasoning and in-depth analysis.
NVIDIA Nemotron 3 Super 120B (12B active) is a hybrid model that combines a sparse Latent Mixture-of-Experts (MoE) with a Mamba-2 architecture. It is optimized for building complex agentic systems and handles contexts of up to 1 million tokens. Because it activates only 12 billion parameters per token and uses Multi-Token Prediction (MTP), the model delivers high inference efficiency, pairing response quality with computational savings on long sequences.
An updated version of the LTX-2 model, developed by Lightricks for synchronized video and audio generation within a single model. It is based on the DiT architecture and integrates key components of modern video generation systems. The model delivers improved audio and visual quality and follows text prompts more accurately.
The most compact model in the Gemma 4 lineup, with an effective size of 2.3B parameters and full support for text, images, and audio. An ideal solution for agentic workflows on local and edge devices.
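
Several of the Mixture-of-Experts entries above quote both a total and an active parameter count. As a rough illustration of what those pairs imply, the sketch below computes the share of parameters used per token and ranks the named models by it. The figures are copied from the descriptions above (the Qwen3.6-35B-A3B active count is given there only as "about 3B"); the helper names `active_fraction` and `ranked_by_sparsity` are hypothetical and not part of any vendor tooling. The ratio says nothing about quality or memory footprint, since all expert weights still have to be stored.

```python
from typing import Dict, List, Tuple

# (total, active) parameter counts in billions, as quoted in the entries above.
# The Qwen3.6-35B-A3B active count is approximate ("about 3B" in its description).
MODELS: Dict[str, Tuple[float, float]] = {
    "K2.6": (1000.0, 32.0),
    "GLM-5.1": (744.0, 40.0),
    "MiniMax-M2.7": (230.0, 10.0),
    "NVIDIA Nemotron 3 Super": (120.0, 12.0),
    "Qwen3.6-35B-A3B": (35.0, 3.0),
    "GigaChat 3.1 Lightning": (10.0, 1.8),
}


def active_fraction(total_b: float, active_b: float) -> float:
    """Share of parameters activated per token (hypothetical helper)."""
    if total_b <= 0 or not 0 <= active_b <= total_b:
        raise ValueError("expected 0 <= active <= total and total > 0")
    return active_b / total_b


def ranked_by_sparsity(models: Dict[str, Tuple[float, float]]) -> List[Tuple[str, float]]:
    """Return (name, active fraction) pairs, sparsest model first."""
    pairs = [(name, active_fraction(total, active))
             for name, (total, active) in models.items()]
    return sorted(pairs, key=lambda pair: pair[1])


if __name__ == "__main__":
    for name, fraction in ranked_by_sparsity(MODELS):
        print(f"{name:<26} {fraction:6.1%}")
```

A lower fraction means fewer parameters take part in each token's computation relative to the model's total size, which is the main source of the inference savings these descriptions mention.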