How to reduce GPU infrastructure costs by 2.5x and enable 24/7 AI development

How Cels cut GPU infrastructure costs by 2.5x by migrating its AI staging environments to immers.cloud: a case study on how on-demand GPU server configurations solved GPU shortages and enabled 24/7 development.

  • Client: LLC “Medical Screening Systems” (brand Cels)
  • Industry: MedTech (AI in radiology, LLMs for healthcare)
  • Product: AI-powered medical image processing service + AI scribe for clinical consultations and analytics on medical databases

Problem: How cloud pricing and GPU shortages were slowing down development

Cels develops a wide range of AI solutions for healthcare — from medical image analysis to automation of clinical documentation and data processing. In this case study, we focus on infrastructure optimization for two active projects that drove the migration:

  • 3D Computer Vision for detection and segmentation of pathologies in chest CT scans;
  • LLM-based projects for transcribing doctor–patient conversations (AI scribe) and analytics on structured medical data.

Prior to migration, the team hosted their staging environments and test instances in a major public cloud. Over time, they encountered a classic ML development challenge: sharp price increases for GPU servers and a lack of configurations with 16–24 GB VRAM — ideal for staging and debugging.

To stay within budget, the team had to shut down machines overnight and on weekends. But as the workload grew and GPU availability at the provider dwindled, even this workaround stopped working. Development of new versions, calibration tests under the Moscow Experiment, and LLM hypothesis validation began to depend not on business priorities but on pricing windows and resource availability.

How moving infrastructure to immers.cloud solved the resource availability issue

Instead of purchasing dedicated hardware or overpaying for oversized cloud instances, Cels’ CTO decided to rent GPU configurations from immers.cloud. The prepaid model and transparent pricing let the team quickly deploy staging environments without lengthy approvals or hidden markups.

Infrastructure setup was handled internally by the DevOps engineer, and communication with our team was limited to operational questions about pricing plans. The migration went smoothly: pipelines remained intact, and integration with the production backend in Yandex Cloud stayed seamless.

Technical implementation: Two projects, one platform

Project 1: 3D CV Model Inference (Chest CT)

  • Stack: PyTorch, Python, Redis
  • Orchestration: Kubernetes
  • Configuration: rtx3090-1.32.64.160 + dedicated CPU node for connectivity with the main cluster in Yandex Cloud
  • Workload: Staging inference (internal tests, external calibration runs)
  • Performance: Stable parallel processing of 1–2 studies (each containing 300–1,000 images), fully meeting pre-production needs (see the worker sketch after this list)
  • Storage: Local instance disks
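
For illustration, here is a minimal sketch of what such a staging worker could look like. Everything in it is an assumption rather than a detail from the Cels deployment: the TorchScript model file, the Redis queue names, and the slab size are placeholders.

```python
# Minimal sketch of a staging inference worker for 3D CT studies.
# Assumptions (not from the case study): the model is a TorchScript export
# of a 3D segmentation network, and studies arrive as JSON jobs on a Redis
# list pushed by the backend. All names here are illustrative.
import json

import redis
import torch

DEVICE = "cuda"  # single RTX 3090 (24 GB VRAM) on the staging node

model = torch.jit.load("models/chest_ct_seg.pt").to(DEVICE).eval()
queue = redis.Redis(host="localhost", port=6379)


@torch.inference_mode()
def segment_study(volume: torch.Tensor, chunk: int = 32) -> torch.Tensor:
    """Run the model over a CT volume (D, H, W) in slabs of `chunk`
    slices so a 300-1,000 slice study fits in 24 GB of VRAM."""
    masks = []
    for start in range(0, volume.shape[0], chunk):
        slab = volume[start:start + chunk].unsqueeze(0).unsqueeze(0)  # (1, 1, d, H, W)
        masks.append(model(slab.to(DEVICE)).squeeze(0).squeeze(0).cpu())
    return torch.cat(masks, dim=0)


while True:
    # Block until the backend pushes the next study job onto the queue.
    _, payload = queue.blpop("ct_jobs")
    job = json.loads(payload)
    volume = torch.load(job["volume_path"])  # preprocessed (D, H, W) tensor
    mask = segment_study(volume)
    torch.save(mask, job["volume_path"] + ".mask.pt")
    queue.rpush("ct_done", json.dumps({"job_id": job["id"], "status": "ok"}))
```

Processing the volume in fixed-size slabs is one common way to keep peak VRAM bounded, which is what lets a single 24 GB card handle even 1,000-slice studies.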

Project 2: LLM Inference (AI Scribe)

  • Stack: vLLM
  • Orchestration: Docker Compose
  • Configuration: rtx3090-1.8.32.160 + A2-based GPU machine for Speech-to-Text model inference
  • Workload: Inference of a custom LLM based on Qwen 3, A/B testing of hypotheses, quality metrics collection (see the sketch after this list)
  • Performance: 3–5 parallel requests, optimal for research tasks and prompt/architecture validation
  • Storage: Local, with scheduled artifact synchronization
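
As a rough sketch of this environment, the snippet below runs a small batch of requests through vLLM's offline API. The checkpoint name, context length, and prompts are placeholders: the team's actual model is a custom Qwen 3 derivative, not this exact configuration.

```python
# Batched LLM inference via vLLM's offline API (illustrative sketch).
from vllm import LLM, SamplingParams

# On a single RTX 3090 (24 GB), an 8B-class model in FP16 is a tight fit;
# a quantized or smaller variant is a safer assumption for staging.
llm = LLM(model="Qwen/Qwen3-8B", max_model_len=8192)

params = SamplingParams(temperature=0.2, max_tokens=1024)

# A batch of 3-5 requests, matching the parallelism cited above.
transcripts = [
    "Summarize this doctor-patient conversation into a SOAP note: ...",
    "Extract medications and dosages from the following dialogue: ...",
    "List the patient's reported symptoms from this transcript: ...",
]

for output in llm.generate(transcripts, params):
    print(output.outputs[0].text)
```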

Both projects run in isolated staging environments but can easily scale when transitioning to training or production deployment.

Results in numbers

Each metric is shown as "before migration → with immers.cloud":

  • Staging infrastructure cost: base rate + surcharges for idle time/scaling → reduced by 2–2.5x
  • Resource availability: limited to night/weekend windows → available 24/7, no restrictions
  • Speed of deploying new hypotheses: dependent on quotas and GPU availability → nodes launched on demand in minutes
  • Administration: manual pricing-plan management and constant limit monitoring → handled by internal DevOps, with immers.cloud support available during business hours

The team gained a predictable budget, the ability to run tests at any time, and the flexibility to spin up additional machines for training or new projects.

Client quote:

“Migrating our AI development environments to immers.cloud significantly reduced our infrastructure costs while making resources available 24/7. This allows our ML teams to confidently run any tests, validate new model versions, and develop new AI projects and hypotheses.”

Conclusion

The Cels case confirms: staging, pipeline debugging, and hypothesis validation often do not require enterprise-grade instances based on data center GPUs—in many scenarios, such configurations offer no meaningful advantage for inference and R&D workloads.

In practice, these tasks are efficiently handled by widely available GPUs like the NVIDIA RTX 3090 and RTX 4090, which feature 24 GB of VRAM and sufficient performance for both CV and LLM inference. Moreover, their high availability in our cloud enables rapid scaling.
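
A quick back-of-envelope check (a sketch, assuming FP16 weights at 2 bytes per parameter) illustrates why: the weights of 7–8B-parameter models fit a 24 GB card with headroom left for KV cache and activations, while larger models call for quantization.

```python
# Back-of-envelope VRAM estimate for a 24 GB card (RTX 3090/4090).
# Assumption: FP16 weights, 2 bytes per parameter; KV cache and
# activations need additional headroom on top of this.
def weight_vram_gb(params_billion: float, bytes_per_param: int = 2) -> float:
    return params_billion * 1e9 * bytes_per_param / 1024**3

for size in (7, 8, 14):
    print(f"{size}B params @ FP16 ≈ {weight_vram_gb(size):.1f} GB of weights")
# 7B ≈ 13.0 GB and 8B ≈ 14.9 GB fit in 24 GB with room for KV cache;
# 14B ≈ 26.1 GB needs quantization (e.g., 8-bit ≈ 13 GB) on a 24 GB card.
```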

If your team works on research tasks, computer vision calibration, or LLM inference, these GPUs allow you to launch experiments quickly and scale workloads without delays.

Updated: 16.04.2026