A foundation model designed for Image-to-Video-Audio (IT2VA) and Text-to-Video-Audio (T2VA) tasks that generates high-fidelity video and synchronized audio simultaneously. It addresses the limitations of cascaded pipelines by generating both modalities jointly, and those of proprietary systems by being fully open source.
Key Features:
The model is a component of the video generation pipeline, consisting of:
Total: ~38.8B parameters
For running locally, the authors recommend a GPU with at least 24 GB of memory to generate an 8-second video at 360p resolution (with offloading).
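A back-of-the-envelope estimate of weight memory shows why offloading is needed at that budget. The sketch below is illustrative only. It assumes every one of the ~38.8B parameters is stored at a single precision; the dtypes listed are examples, not confirmed formats for this model. It also treats the 24 GB card as 24 GiB and ignores activations and other runtime buffers, so real usage is higher.

```python
# Rough weight-only memory estimate for a ~38.8B-parameter model.
# Assumptions (not from the model authors): all weights share one dtype,
# the 24 GB card is treated as 24 GiB, and activations/buffers are ignored.
PARAMS = 38.8e9
BYTES_PER_PARAM = {"fp32": 4, "bf16": 2, "fp8": 1}  # illustrative dtypes
GPU_VRAM_GIB = 24  # recommended minimum GPU memory


def weight_memory_gib(params: float, dtype: str) -> float:
    """Memory needed just to hold the weights, in GiB (2**30 bytes)."""
    return params * BYTES_PER_PARAM[dtype] / 2**30


for dtype in BYTES_PER_PARAM:
    need = weight_memory_gib(PARAMS, dtype)
    verdict = "fits" if need <= GPU_VRAM_GIB else "needs offloading"
    print(f"{dtype}: {need:.1f} GiB -> {verdict}")
```

Even at one byte per parameter the weights alone (about 36 GiB) exceed 24 GiB, which is consistent with the authors pairing the 24 GB recommendation with offloading.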
There are no public endpoints for this model yet.
Rent your own physically dedicated instance with hourly or long-term monthly billing.
We recommend deploying private instances in the following scenarios:

| GPUs | Price per hour |
|---|---|
| 1 | $0.53 |
| 1 | $0.84 |
| 1 | $1.02 |
| 1 | $1.20 |
| 1 | $1.59 |
| 1 | $2.37 |
| 1 | $3.83 |
| 1 | $4.11 |
| 1 | $4.74 |

| GPUs | Price per hour |
|---|---|
| 1 | $0.33 |
| 1 | $0.38 |
| 1 | $0.53 |
| 1 | $0.83 |
| 1 | $1.02 |
| 1 | $1.20 |
| 1 | $1.59 |
| 1 | $2.37 |
| 1 | $3.83 |
| 1 | $4.11 |
| 1 | $4.74 |

| GPUs | Price per hour |
|---|---|
| 1 | $0.33 |
| 1 | $0.38 |
| 1 | $0.38 |
| 1 | $0.53 |
| 1 | $0.57 |
| 1 | $0.83 |
| 1 | $1.02 |
| 1 | $1.20 |
| 1 | $1.59 |
| 1 | $2.37 |
| 1 | $3.83 |
| 1 | $4.11 |
| 1 | $4.74 |

Contact our dedicated neural networks support team at nn@immers.cloud or send your request to the sales department at sale@immers.cloud.