An audio-visual base model built on the DiT (Diffusion Transformer) architecture and designed to generate synchronized video and audio within a single model. It combines features expected of modern video generation systems, including open weights and optimization for local use.
Key Features:
The model is a component of the video generation pipeline, consisting of:
Total: ~34B parameters
For local use, NVIDIA recommends a GPU with at least 24 GB of memory to generate a 4-second video at 720p and 24 fps (with 20 steps).
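The figures above can be turned into rough sizing numbers. The sketch below is only a back-of-the-envelope illustration: the function names and the precision table are hypothetical helpers, not part of any model API, and the estimate covers weight storage only (no activations, caches, or offloading). The model may also round the clip to its own supported frame counts.

```python
# Rough sizing helpers. All names are illustrative assumptions,
# not part of the model's or any library's API.

# Bytes needed to store one parameter at common precisions.
BYTES_PER_PARAM = {"fp32": 4.0, "bf16": 2.0, "fp8": 1.0, "int4": 0.5}


def frame_count(duration_s: float, fps: int) -> int:
    """Nominal number of video frames for a clip of the given length."""
    return round(duration_s * fps)


def weight_size_gb(num_params: float, dtype: str) -> float:
    """Storage for the weights alone, in GB (10**9 bytes)."""
    return num_params * BYTES_PER_PARAM[dtype] / 1e9


if __name__ == "__main__":
    # A 4-second clip at 24 fps.
    print("frames:", frame_count(4, 24))
    # ~34B parameters in total, at several storage precisions.
    for dtype in BYTES_PER_PARAM:
        print(dtype, round(weight_size_gb(34e9, dtype), 1), "GB")
```

Actual memory use depends on which pipeline components are resident on the GPU at once and on the precision of the released checkpoints, so treat these numbers as an upper-level orientation rather than a requirement.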
| Model Name | Context | Type | GPU | Status | Link |
|---|---|---|---|---|---|
There are no public endpoints for this model yet.
Rent your own physically dedicated instance with hourly or long-term monthly billing.
We recommend deploying private instances in the following scenarios:
| GPUs | Price |
|---|---|
| 1 | $0.53 |
| 1 | $0.84 |
| 1 | $1.02 |
| 1 | $1.20 |
| 1 | $1.59 |
| 1 | $2.37 |
| 1 | $3.83 |
| 1 | $4.11 |
| 1 | $4.74 |

| GPUs | Price |
|---|---|
| 1 | $0.33 |
| 1 | $0.38 |
| 1 | $0.53 |
| 1 | $0.83 |
| 1 | $1.02 |
| 1 | $1.20 |
| 1 | $1.59 |
| 1 | $2.37 |
| 1 | $3.83 |
| 1 | $4.11 |
| 1 | $4.74 |

| GPUs | Price |
|---|---|
| 1 | $0.33 |
| 1 | $0.38 |
| 1 | $0.38 |
| 1 | $0.53 |
| 1 | $0.57 |
| 1 | $0.83 |
| 1 | $1.02 |
| 1 | $1.20 |
| 1 | $1.59 |
| 1 | $2.37 |
| 1 | $3.83 |
| 1 | $4.11 |
| 1 | $4.74 |
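
To compare configurations, it can help to convert a price into the cost of a session or of a single clip. The sketch below assumes that the dollar figures in the tables above are hourly rates and that billing is prorated linearly; both are assumptions, as are the function names and the 60-second generation time used in the example.

```python
# Cost estimates for a rented instance.
# Assumptions (not confirmed by the page): prices are per hour and
# usage is billed proportionally to time. Function names are illustrative.


def session_cost(rate_per_hour: float, hours: float) -> float:
    """Cost in dollars of keeping an instance running for `hours`."""
    return rate_per_hour * hours


def cost_per_clip(rate_per_hour: float, generation_time_s: float) -> float:
    """Cost in dollars of one clip that takes `generation_time_s` seconds."""
    return rate_per_hour * generation_time_s / 3600.0


if __name__ == "__main__":
    # An 8-hour session on a $0.53 configuration.
    print(f"session: ${session_cost(0.53, 8):.2f}")
    # One clip on a $1.20 configuration, with a hypothetical
    # generation time of 60 seconds.
    print(f"per clip: ${cost_per_clip(1.20, 60):.4f}")
```

A cheaper configuration is not automatically cheaper per clip: if it generates more slowly, the longer generation time can outweigh the lower rate.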
Contact our dedicated neural networks support team at nn@immers.cloud or send your request to the sales department at sale@immers.cloud.