stable-video-diffusion-img2vid-xt

This is a diffusion model developed by Stability AI that generates short video clips from a single still image (image-to-video). It produces up to about 4 seconds of video (25 frames at 576×1024 resolution), conditioned on the input image as the context frame. The model was trained on large datasets and finetuned from the earlier SVD Image-to-Video [14 frames] model. The widely used f8 decoder was also finetuned to improve the temporal consistency of the frames. Training took approximately 200,000 GPU-hours on A100 80GB GPUs, consumed about 64,000 kWh of energy, and produced roughly 19,000 kg of CO₂ emissions.
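The training figures above can be related to each other by simple division. The sketch below treats the rounded numbers as exact and assumes the energy figure covers only those GPU-hours. It computes the implied average power per GPU and the implied carbon intensity of the electricity used. The function names are illustrative only.

```python
# Back-of-the-envelope ratios derived from the approximate training figures
# quoted above; the inputs are rounded, so the outputs are rough as well.
GPU_HOURS = 200_000   # ~A100 80GB GPU-hours
ENERGY_KWH = 64_000   # ~energy consumed, assumed to cover only these GPU-hours
CO2_KG = 19_000       # ~CO2 emitted


def avg_power_watts(energy_kwh: float, gpu_hours: float) -> float:
    """Average power drawn per GPU: kWh per GPU-hour, converted to watts."""
    return energy_kwh / gpu_hours * 1000


def carbon_intensity(co2_kg: float, energy_kwh: float) -> float:
    """Implied emissions per unit of energy, in kg CO2 per kWh."""
    return co2_kg / energy_kwh


if __name__ == "__main__":
    # prints "Average draw per GPU: 320 W"
    print(f"Average draw per GPU: {avg_power_watts(ENERGY_KWH, GPU_HOURS):.0f} W")
    # prints "Implied carbon intensity: 0.297 kg CO2/kWh"
    print(f"Implied carbon intensity: {carbon_intensity(CO2_KG, ENERGY_KWH):.3f} kg CO2/kWh")
```

Because the source figures are stated with "~", these ratios are only indicative.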

Limitations:

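  • Videos are short (at most 25 frames, about 4 seconds) and not fully photorealistic.
  • The model may produce videos with little or no motion, or with only very slow camera pans.
  • Text is not rendered legibly, and faces and people in general may not be generated properly.
  • The autoencoding part of the model is lossy, so fine details can be smoothed out.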

Usage conditions: Using the model to create content that violates Stability AI's policies (for example, illegal, offensive, or misleading material) is prohibited. The model is not intended to generate factual or historically accurate representations of people or events.


Announce Date: 20.11.2023
Parameters: not specified
Developer: Stability AI
License: Stable Video Diffusion Community License

Public endpoint

Use our pre-built public endpoints to test inference and explore stable-video-diffusion-img2vid-xt capabilities.
There are no public endpoints for this model yet.

Private server

Rent your own physically dedicated instance with hourly or long-term monthly billing.

We recommend deploying private instances in the following scenarios:

  • maximize endpoint performance,
  • enable full context for long sequences,
  • ensure top-tier security for data processing in an isolated, dedicated environment,
  • use custom weights, such as fine-tuned models or LoRA adapters.
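On a private instance, the model can be run with the Hugging Face diffusers library, which provides a StableVideoDiffusionPipeline. The sketch below makes several assumptions. It assumes diffusers, transformers, accelerate, and a CUDA build of torch are installed, and that the weights are downloaded from the Hugging Face Hub repository stabilityai/stable-video-diffusion-img2vid-xt. The helper names `generate` and `clip_seconds`, the seed, the decode chunk size, and the 7 fps export rate are hypothetical, illustrative choices, not settings taken from this page. The heavy imports are placed inside `generate` so the duration helper can be used without a GPU stack.

```python
"""Sketch: image-to-video with SVD-XT via Hugging Face diffusers (assumed installed)."""

MODEL_ID = "stabilityai/stable-video-diffusion-img2vid-xt"
WIDTH, HEIGHT = 1024, 576  # 576x1024 (height x width) resolution from the model description
NUM_FRAMES = 25            # frames generated per clip by this model


def clip_seconds(num_frames: int, fps: int) -> float:
    """Playback length in seconds of a clip with num_frames frames at fps."""
    return num_frames / fps


def generate(image_path: str, out_path: str, fps: int = 7, seed: int = 42) -> float:
    """Hypothetical helper: generate a clip from one image, write it to out_path,
    and return its playback length in seconds. Requires a CUDA GPU."""
    # Imported here so the module can be loaded without torch/diffusers.
    import torch
    from diffusers import StableVideoDiffusionPipeline
    from diffusers.utils import export_to_video, load_image

    pipe = StableVideoDiffusionPipeline.from_pretrained(
        MODEL_ID, torch_dtype=torch.float16, variant="fp16"
    )
    # Keep sub-models on the CPU and move each to the GPU only while it runs,
    # which lowers peak VRAM usage at some cost in speed.
    pipe.enable_model_cpu_offload()

    # Resize the conditioning image to the model's native resolution.
    image = load_image(image_path).resize((WIDTH, HEIGHT))
    generator = torch.manual_seed(seed)  # fixed seed for reproducible output

    # decode_chunk_size sets how many frames the VAE decodes at once;
    # smaller values use less memory but run slower.
    frames = pipe(image, decode_chunk_size=8, generator=generator).frames[0]
    export_to_video(frames, out_path, fps=fps)
    return clip_seconds(len(frames), fps)


if __name__ == "__main__":
    # prints "25 frames at 7 fps = 3.57 s"
    print(f"{NUM_FRAMES} frames at 7 fps = {clip_seconds(NUM_FRAMES, 7):.2f} s")
```

Calling `generate("input.png", "generated.mp4")` (the file names are placeholders) on a GPU instance should write a 25-frame clip. At 7 fps that clip plays for about 3.6 seconds, and at 6 fps for about 4.2 seconds, which is consistent with the roughly 4-second length stated above.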

Recommended configurations for hosting stable-video-diffusion-img2vid-xt

Prices:
Name                   vCPU  RAM, MB  Disk, GB  GPUs  Price per hour
rtx2080ti-1.16.32.160  16    32768    160       1     $0.41
teslat4-1.16.16.160    16    16384    160       1     $0.46
teslaa10-1.16.32.160   16    32768    160       1     $0.53
teslaa2-2.16.32.160    16    32768    160       2     $0.57
rtx3090-1.16.24.160    16    24576    160       1     $0.88
rtx4090-1.16.32.160    16    32768    160       1     $1.15
teslav100-1.12.64.160  12    65536    160       1     $1.20
rtx5090-1.16.64.160    16    65536    160       1     $1.59
teslaa100-1.16.64.160  16    65536    160       1     $2.58
teslah100-1.16.64.160  16    65536    160       1     $5.11
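The hourly rates in the configuration table can be turned into rough budgets for longer jobs. The sketch below is a hypothetical helper, not a billing tool. It assumes cost is simply price multiplied by hours, and it does not model long-term monthly pricing, taxes, or any other charges. It uses `Decimal` so that currency amounts stay exact.

```python
from decimal import Decimal

# Hourly prices in USD, copied from the configuration table above.
# Assumption: cost = price x hours; monthly-term pricing and other charges
# are not modeled.
HOURLY_USD = {
    "rtx2080ti-1.16.32.160": Decimal("0.41"),
    "teslat4-1.16.16.160": Decimal("0.46"),
    "teslaa10-1.16.32.160": Decimal("0.53"),
    "teslaa2-2.16.32.160": Decimal("0.57"),
    "rtx3090-1.16.24.160": Decimal("0.88"),
    "rtx4090-1.16.32.160": Decimal("1.15"),
    "teslav100-1.12.64.160": Decimal("1.20"),
    "rtx5090-1.16.64.160": Decimal("1.59"),
    "teslaa100-1.16.64.160": Decimal("2.58"),
    "teslah100-1.16.64.160": Decimal("5.11"),
}


def estimate_cost(config: str, hours: int) -> Decimal:
    """Cost in USD of running one instance of `config` for a whole number of hours."""
    return HOURLY_USD[config] * hours


def cheapest(configs: list[str]) -> str:
    """Return the lowest-priced configuration among `configs`."""
    return min(configs, key=lambda name: HOURLY_USD[name])


if __name__ == "__main__":
    # 720 hours is an assumed 30-day month of continuous use.
    for name in ("rtx4090-1.16.32.160", "teslaa100-1.16.64.160"):
        print(f"{name}: ${estimate_cost(name, 720)} for 720 h")
    print(cheapest(["rtx3090-1.16.24.160", "rtx4090-1.16.32.160", "teslaa10-1.16.32.160"]))
```

For example, 720 hours on rtx4090-1.16.32.160 comes to $828.00 under this simple model. The actual price for long-term monthly billing may differ.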

Need help?

Contact our dedicated neural networks support team at nn@immers.cloud or send your request to the sales department at sale@immers.cloud.