Why does a private endpoint take so long to create?
Creating an endpoint involves multiple stages:
- Planning and provisioning the server;
- Downloading vLLM and launching the container;
- Downloading model weights [1];
- Loading weights into GPU memory;
- Capturing CUDA graphs;
- Checking that the service routes /health and /v1/models respond;
- Configuring the load balancer.
[1] Weight download time depends on network connectivity and on the load on external services, and may currently be longer than usual.
This is why creating an endpoint takes longer than provisioning a regular server with no preinstalled software.
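If you want to know when a new endpoint is ready to accept requests, you can poll the same two routes that are checked during creation. The sketch below is only an illustration under a few assumptions: the function names, the 30-minute deadline, and the 10-second interval are hypothetical, both routes are assumed to return HTTP 200 once the model is loaded, and no authorization header is sent (add one if your endpoint requires it).

```python
import time
import urllib.error
import urllib.request

# Routes named in the creation stages; the base URL is supplied by the caller.
READINESS_ROUTES = ("/health", "/v1/models")


def http_status(url, timeout=5.0):
    """Return the HTTP status code for a GET request, or None if unreachable."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as response:
            return response.status
    except urllib.error.HTTPError as err:
        return err.code  # the server answered, but with an error status
    except OSError:
        return None  # connection refused, DNS failure, timeout, and so on


def wait_until_ready(base_url, deadline_s=1800.0, interval_s=10.0,
                     get_status=http_status, sleep=time.sleep,
                     clock=time.monotonic):
    """Poll both routes until each returns 200 or the deadline passes.

    The default deadline and interval are illustrative assumptions,
    not documented limits of the service.
    """
    root = base_url.rstrip("/")
    start = clock()
    while True:
        statuses = [get_status(root + route) for route in READINESS_ROUTES]
        if all(status == 200 for status in statuses):
            return True
        if clock() - start >= deadline_s:
            return False
        sleep(interval_s)
```

Call it as wait_until_ready("https://<your-endpoint-host>") and send inference requests only after it returns True. The get_status, sleep, and clock parameters exist so the loop can be exercised without a real network or real waiting.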
Updated: 18.06.2026