LTX-2

LTX-2 is an audio-visual base model built on the DiT (Diffusion Transformer) architecture that generates synchronized video and audio within a single model. It incorporates key components of modern video generation systems, ships with open weights, and is optimized for local use.

Key Features:

  • Audiovisual content generation based on text, images, video, or audio.
  • Supported tasks: Image-to-Video, Text-to-Video, Video-to-Video, Audio-to-Video, Text-to-Audio, and others.
  • Built-in audio-video synchronization.
  • Limitations:
    • Not designed to provide factual information.
    • Frame width and height must each be divisible by 32; the number of frames must be a multiple of 8 plus 1 (8n + 1, for example 97 or 121).
    • Audio generated without speech may be of lower quality.
    • Prompt adherence depends on how the prompt is written. For details, see the Prompting guide.
    • The model is distributed with a warning about the potential generation of inappropriate or offensive content.
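The size constraints listed above can be checked before a request is sent. The following is a minimal sketch, not part of any LTX-2 API: the helper names (snap_resolution, snap_num_frames, is_valid_request) are hypothetical, and the choice to round width and height down and the frame count to the nearest valid value is an assumption made for illustration.

```python
def snap_resolution(width: int, height: int, multiple: int = 32) -> tuple[int, int]:
    """Round each side down to a multiple of `multiple` (never below one multiple)."""
    def snap(value: int) -> int:
        return max(multiple, (value // multiple) * multiple)
    return snap(width), snap(height)


def snap_num_frames(num_frames: int, stride: int = 8) -> int:
    """Return the nearest frame count of the form stride * n + 1 (ties round up)."""
    if num_frames < 1:
        raise ValueError("num_frames must be at least 1")
    n = (num_frames - 1 + stride // 2) // stride
    return n * stride + 1


def is_valid_request(width: int, height: int, num_frames: int) -> bool:
    """Check the documented constraints: sides divisible by 32, frames = 8n + 1."""
    return (
        width > 0 and height > 0 and num_frames >= 1
        and width % 32 == 0
        and height % 32 == 0
        and (num_frames - 1) % 8 == 0
    )


# Example: a nominal 1280x720 request with 96 frames is adjusted to valid values.
width, height = snap_resolution(1280, 720)
frames = snap_num_frames(96)
print(width, height, frames, is_valid_request(width, height, frames))
```

Rounding the resolution down rather than up keeps the adjusted request within the originally requested memory budget; a caller who prefers the larger size can round up instead.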

The model is one component of a video generation pipeline, which consists of:

  • Audio VAE: ~153M parameters,
  • Connectors: ~1.4B parameters,
  • Latent upsampler: ~497M parameters,
  • Text encoder: ~12B parameters,
  • Transformer: ~19B parameters,
  • VAE: ~1.2B parameters,
  • Vocoder: ~56M parameters.

Total: ~34B parameters
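The total can be reproduced from the component sizes listed above. This is a simple arithmetic check using the approximate figures from the list; the dictionary name COMPONENT_PARAMS is illustrative only.

```python
# Approximate parameter counts as given in the component list.
COMPONENT_PARAMS = {
    "audio VAE": 153_000_000,
    "connectors": 1_400_000_000,
    "latent upsampler": 497_000_000,
    "text encoder": 12_000_000_000,
    "transformer": 19_000_000_000,
    "VAE": 1_200_000_000,
    "vocoder": 56_000_000,
}

total_params = sum(COMPONENT_PARAMS.values())
print(f"Total: ~{total_params / 1e9:.1f}B parameters")

# Share of the pipeline taken up by each component, largest first.
for name, count in sorted(COMPONENT_PARAMS.items(), key=lambda kv: -kv[1]):
    print(f"{name:<18}{count / total_params:6.1%}")
```

The transformer and the text encoder together account for roughly nine tenths of the pipeline, which is why the headline "19B" figure refers to the transformer alone.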


For running locally, NVIDIA recommends a GPU with 24 GB or more of memory to generate a 4-second video at 720p and 24 fps (with 20 steps).
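To put the 24 GB figure in context, a rough estimate of the memory needed just to hold the weights can be made from the parameter count and the number of bytes stored per parameter. This is a back-of-the-envelope sketch: the function weight_memory_gib is hypothetical, the bit widths are illustrative, and the estimate ignores activations, latents, and framework overhead. The source does not state which precision or offloading strategy the recommendation assumes.

```python
def weight_memory_gib(num_params: int, bytes_per_param: float) -> float:
    """Memory needed to store the weights alone, in GiB (2**30 bytes)."""
    return num_params * bytes_per_param / 2**30


TRANSFORMER_PARAMS = 19_000_000_000  # approximate, from the component list

# Illustrative storage widths; which of these a given setup uses is an assumption.
for label, bytes_per_param in [("16-bit", 2.0), ("8-bit", 1.0), ("4-bit", 0.5)]:
    gib = weight_memory_gib(TRANSFORMER_PARAMS, bytes_per_param)
    print(f"{label}: ~{gib:.1f} GiB for the transformer weights")
```

Under these assumptions, the transformer alone exceeds 24 GB at 16-bit storage, so a 24 GB card would presumably rely on lower-precision weights, offloading, or both.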


Announce Date: 06.01.2026
Parameters: 19B
Developer: Lightricks
Diffusers Version: 0.37.0.dev0
License: LTX-2 Community License Agreement

Public endpoint

Use our pre-built public endpoints for free to test inference and explore LTX-2 capabilities. You can obtain an API access token on the token management page after registration and verification.
There are no public endpoints for this model yet.

Private server

Rent your own physically dedicated instance with hourly or long-term monthly billing.

We recommend deploying a private instance when you need to:

  • maximize endpoint performance,
  • enable full context for long sequences,
  • ensure top-tier security for data processing in an isolated, dedicated environment,
  • use custom weights, such as fine-tuned models or LoRA adapters.

Recommended server configurations for hosting LTX-2

Prices:
Name                     GPU   Price, hour
teslaa10-1.16.32.160     1     $0.53
rtx3090-1.16.32.160      1     $0.84
rtx4090-1.16.32.160      1     $1.02
teslav100-1.12.64.160    1     $1.20
rtx5090-1.16.64.160      1     $1.59
teslaa100-1.16.64.160    1     $2.37
h100-1.16.64.160         1     $3.83
h100nvl-1.16.96.160      1     $4.11
h200-1.16.128.160        1     $4.74

Prices:
Name                     GPU   Price, hour
teslat4-1.16.16.160      1     $0.33
teslaa2-1.16.32.160      1     $0.38
teslaa10-1.16.32.160     1     $0.53
rtx3090-1.16.24.160      1     $0.83
rtx4090-1.16.32.160      1     $1.02
teslav100-1.12.64.160    1     $1.20
rtx5090-1.16.64.160      1     $1.59
teslaa100-1.16.64.160    1     $2.37
h100-1.16.64.160         1     $3.83
h100nvl-1.16.96.160      1     $4.11
h200-1.16.128.160        1     $4.74

Prices:
Name                     GPU   Price, hour
teslat4-1.16.16.160      1     $0.33
rtx2080ti-1.10.16.500    1     $0.38
teslaa2-1.16.32.160      1     $0.38
teslaa10-1.16.32.160     1     $0.53
rtx3080-1.16.32.160      1     $0.57
rtx3090-1.16.24.160      1     $0.83
rtx4090-1.16.32.160      1     $1.02
teslav100-1.12.64.160    1     $1.20
rtx5090-1.16.64.160      1     $1.59
teslaa100-1.16.64.160    1     $2.37
h100-1.16.64.160         1     $3.83
h100nvl-1.16.96.160      1     $4.11
h200-1.16.128.160        1     $4.74
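The listing gives hourly prices but no generation times, so the cost of a single clip has to be worked out from a time you measure yourself. A minimal sketch of that arithmetic follows; the function name cost_per_generation is hypothetical, and the 120-second generation time is a placeholder, not a benchmark result.

```python
def cost_per_generation(price_per_hour: float, generation_seconds: float) -> float:
    """Convert an hourly rental price into the cost of one generation run."""
    if price_per_hour < 0 or generation_seconds < 0:
        raise ValueError("price and time must be non-negative")
    return price_per_hour * generation_seconds / 3600


# Hourly price taken from the listing; the generation time is a made-up placeholder.
RTX4090_PRICE_PER_HOUR = 1.02
measured_seconds = 120.0  # replace with your own measurement

cost = cost_per_generation(RTX4090_PRICE_PER_HOUR, measured_seconds)
print(f"~${cost:.3f} per clip at {measured_seconds:.0f} s per generation")
```

Because a faster GPU shortens the generation time as well as raising the hourly price, comparing configurations by cost per clip can rank them differently than comparing hourly prices alone.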

Need help?

Contact our dedicated neural networks support team at nn@immers.cloud or send your request to the sales department at sale@immers.cloud.