It is a 5-billion-parameter text-and-image-to-video (TI2V) generative model designed for high-definition video generation at 720P resolution (1280×704 or 704×1280) and 24 fps. Built on the Wan2.2-VAE architecture, it achieves a compression ratio of 16×16×4 (16× along each spatial axis and 4× along time), enabling efficient deployment on consumer-grade GPUs such as the RTX 4090. The model supports both text-to-video and image-to-video generation within a unified framework.
Key Features
The model is a component of the video generation pipeline, consisting of:
Total: ~11B parameters
| Model Name | Context | Type | GPU | Status | Link |
|---|---|---|---|---|---|
There are no public endpoints for this model yet.
Rent your own physically dedicated instance with hourly or long-term monthly billing.
We recommend deploying private instances in the following scenarios:
There are no configurations for this model, context and quantization yet.
Contact our dedicated neural networks support team at nn@immers.cloud or send your request to the sales department at sale@immers.cloud.