mochi-1-preview

Mochi-1 is a state-of-the-art open-source text-to-video generation model developed by Genmo. It achieves high-fidelity motion and strong prompt adherence in preliminary evaluations, significantly narrowing the gap between closed and open video generation systems.

Core components:

  • Asymmetric Diffusion Transformer (AsymmDiT): a 10-billion-parameter diffusion model built on a novel asymmetric architecture that dedicates most of its capacity to visual tokens.
  • Asymmetric Variational Autoencoder (AsymmVAE): an encoder-decoder model with an asymmetric structure that compresses video into a compact latent space.
  • Text encoding: a single T5-XXL language model encodes prompts, avoiding reliance on multiple pretrained language models.

Key Features:

  • Single-GPU setup requires ~60 GB of VRAM for full operation; at least one NVIDIA H100 GPU is recommended for optimal performance. Multi-GPU operation is supported via a context-parallel implementation. Memory-optimization options include CPU offloading, VAE tiling, and a lower-precision (bfloat16) variant to reduce VRAM usage.
  • Current output resolution is 480p.
  • The model generates videos in photorealistic styles. Minor warping or distortion may occur in videos with extreme motion, and it performs poorly on animated content.
  • Organizations are advised to implement additional safety measures before deploying in commercial applications. 
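The memory optimizations above map onto the Hugging Face Diffusers API. Below is a minimal, hedged sketch of text-to-video inference with those options enabled; it assumes a recent Diffusers release with Mochi support (0.32+) and a large GPU, and the prompt is only an illustration:

```python
# Sketch: mochi-1-preview inference with memory optimizations enabled.
# Assumes diffusers >= 0.32 and a GPU; weights are a multi-GB download.
def generate_video(prompt: str, out_path: str = "mochi.mp4"):
    # Heavy imports are kept inside the function so the module imports cheaply.
    import torch
    from diffusers import MochiPipeline
    from diffusers.utils import export_to_video

    # The bf16 variant roughly halves VRAM versus full precision.
    pipe = MochiPipeline.from_pretrained(
        "genmo/mochi-1-preview", variant="bf16", torch_dtype=torch.bfloat16
    )
    pipe.enable_model_cpu_offload()  # CPU offloading
    pipe.enable_vae_tiling()         # VAE tiling

    frames = pipe(prompt, num_frames=84).frames[0]
    export_to_video(frames, out_path, fps=30)

# Example call (requires a GPU):
# generate_video("A close-up of waves crashing on a rocky shore at sunset")
```

Each optimization trades speed for memory: CPU offloading moves idle submodules off the GPU, and VAE tiling decodes the latent video in chunks rather than all at once.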

The model is one component of a larger video generation pipeline, which consists of:

  • Text encoder (T5-XXL): ~4.8B parameters,
  • Transformer (AsymmDiT): ~10B parameters,
  • VAE (AsymmVAE): ~460M parameters.

Total (transformer + VAE): ~10.5B parameters
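A quick tally of the components (a minimal sketch; the quoted ~10.5B total evidently covers the transformer and VAE, with the T5-XXL text encoder counted separately):

```python
# Back-of-envelope parameter tally, using the figures from the list above.
components = {
    "text_encoder (T5-XXL)": 4.8e9,
    "transformer (AsymmDiT)": 10.0e9,
    "vae (AsymmVAE)": 0.46e9,
}

core = components["transformer (AsymmDiT)"] + components["vae (AsymmVAE)"]
full = sum(components.values())
print(f"Transformer + VAE: ~{core / 1e9:.1f}B")  # matches the quoted ~10.5B
print(f"Full pipeline:     ~{full / 1e9:.1f}B")
```

Counting the text encoder as well, the full pipeline holds roughly 15.3B parameters.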


Announce Date: 22.10.2024
Parameters: 10B
Developer: Genmo
Diffusers Version: 0.32.0.dev0
License: Apache 2.0

Public endpoint

Use our pre-built public endpoints for free to test inference and explore mochi-1-preview capabilities. You can obtain an API access token on the token management page after registration and verification.
There are no public endpoints for this model yet.

Private server

Rent your own physically dedicated instance with hourly or long-term monthly billing.

We recommend deploying private instances in the following scenarios:

  • maximize endpoint performance,
  • enable full context for long sequences,
  • ensure top-tier security for data processing in an isolated, dedicated environment,
  • use custom weights, such as fine-tuned models or LoRA adapters.
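For the custom-weights scenario, a hedged sketch of attaching a fine-tuned LoRA adapter on a self-hosted instance; the adapter path is a placeholder, and it assumes a Diffusers release (0.32+) with Mochi LoRA support:

```python
# Sketch: loading a custom LoRA adapter into a self-hosted Mochi pipeline.
# The lora_path argument is a placeholder for your own fine-tuned adapter.
def load_pipeline_with_lora(lora_path: str):
    # Heavy imports are kept inside the function so the module imports cheaply.
    import torch
    from diffusers import MochiPipeline

    pipe = MochiPipeline.from_pretrained(
        "genmo/mochi-1-preview", torch_dtype=torch.bfloat16
    )
    pipe.load_lora_weights(lora_path)  # standard Diffusers LoRA loader
    pipe.enable_model_cpu_offload()
    return pipe
```

On a dedicated instance the adapter stays in your isolated environment, which is the main point of the custom-weights scenario above.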

Recommended server configurations for hosting mochi-1-preview

Prices:

Name                    GPU   Price, hour
teslat4-1.16.16.160     1     $0.33
rtx2080ti-1.10.16.500   1     $0.38
teslaa2-1.16.32.160     1     $0.38
teslaa10-1.16.32.160    1     $0.53
rtx3080-1.16.32.160     1     $0.57
rtx3090-1.16.24.160     1     $0.83
rtx4090-1.16.32.160     1     $1.02
teslav100-1.12.64.160   1     $1.20
rtx5090-1.16.64.160     1     $1.59
teslaa100-1.16.64.160   1     $2.37
h100-1.16.64.160        1     $3.83
h100nvl-1.16.96.160     1     $4.11
h200-1.16.128.160       1     $4.74
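The hourly rates can be turned into rough monthly estimates; a minimal sketch, assuming a ~730-hour month and ignoring any long-term discount the provider may offer:

```python
# Rough monthly cost estimate from the hourly rates above.
# Assumes ~730 hours per month; ignores any long-term (monthly) discount.
HOURS_PER_MONTH = 730

hourly_rates = {  # two example configurations from the price table
    "rtx4090-1.16.32.160": 1.02,
    "h100-1.16.64.160": 3.83,
}

for name, rate in hourly_rates.items():
    monthly = rate * HOURS_PER_MONTH
    print(f"{name}: ${rate:.2f}/hour ≈ ${monthly:,.2f}/month")
```

This helps decide between hourly billing for short experiments and monthly billing for a continuously running endpoint.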

Related models

Need help?

Contact our dedicated neural networks support team at nn@immers.cloud or send your request to the sales department at sale@immers.cloud.