Why does a private endpoint take so long to create?

Creating an endpoint involves multiple stages:

  • Planning and provisioning the server;
  • Downloading vLLM and launching the container;
  • Downloading model weights [1];
  • Loading weights into GPU memory;
  • Capturing CUDA graphs;
  • Verifying the service routes /health and /v1/models;
  • Configuring the load balancer.

[1] Weight download time depends on the load on third-party services and on network connectivity, and may currently be longer than usual.

This is why creating an endpoint takes longer than provisioning a regular server with no preinstalled software.
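
While waiting, a client can poll the same two routes the platform verifies to see when the endpoint starts answering. The following is a minimal sketch, not the platform's own check: the base URL is a placeholder, the helper names (`http_get`, `is_ready`, `wait_until_ready`) are hypothetical, it assumes the routes can be reached without an authorization header, and it assumes /v1/models returns an OpenAI-style JSON body with a "data" list.

```python
import json
import time
import urllib.error
import urllib.request


def http_get(url, timeout=5.0):
    """Return (status_code, body_text); status 0 means the server was unreachable."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            return resp.status, resp.read().decode("utf-8", errors="replace")
    except urllib.error.HTTPError as err:
        # The server answered, but with an error status (for example 503).
        return err.code, ""
    except (urllib.error.URLError, OSError):
        # Connection refused, DNS failure, timeout: nothing is listening yet.
        return 0, ""


def is_ready(base_url, get=http_get):
    """True once /health answers 200 and /v1/models lists at least one model."""
    base = base_url.rstrip("/")
    status, _ = get(base + "/health")
    if status != 200:
        return False
    status, body = get(base + "/v1/models")
    if status != 200:
        return False
    try:
        models = json.loads(body).get("data", [])
    except (ValueError, AttributeError):
        return False
    return isinstance(models, list) and len(models) > 0


def wait_until_ready(base_url, max_wait=1800.0, interval=15.0,
                     get=http_get, sleep=time.sleep):
    """Poll until the endpoint is ready or max_wait seconds elapse."""
    deadline = time.monotonic() + max_wait
    while True:
        if is_ready(base_url, get):
            return True
        if time.monotonic() + interval > deadline:
            return False
        sleep(interval)
```

Calling `wait_until_ready("https://<your-endpoint-host>")` returns True as soon as both routes respond, or False after the wait limit. The 30-minute limit and 15-second interval are arbitrary illustration values, not documented creation times.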
 

Updated: 18.06.2026