How Cels cut GPU infrastructure costs 2.5x by migrating AI staging environments to immers.cloud: a case study on how on-demand cloud GPU configurations solved GPU shortages and enabled 24/7 development.
Cels develops a wide range of AI solutions for healthcare — from medical image analysis to automation of clinical documentation and data processing. In this case study, we focus on infrastructure optimization for the two active projects that drove the migration.
Prior to migration, the team hosted their staging environments and test instances in a major public cloud. Over time, they encountered a classic ML development challenge: sharp price increases for GPU servers and a lack of configurations with 16–24 GB VRAM — ideal for staging and debugging.
To stay within budget, the team had to shut down machines overnight and on weekends. But as workloads grew and GPU availability at the provider dwindled, even this workaround stopped working. Development of new versions, calibration tests under the Moscow Experiment, and LLM hypothesis validation began to depend not on business priorities, but on pricing windows and resource availability.
Instead of purchasing dedicated hardware or overpaying for oversized cloud instances, Cels’ CTO decided to rent on-demand GPU configurations from immers.cloud. The prepaid model and transparent pricing enabled the team to quickly deploy staging environments without lengthy approvals or hidden markups.
Infrastructure setup was handled internally by the DevOps engineer, and communication with our team was limited to operational questions about tariffs. Migration went smoothly: pipelines remained intact, and integration with the production backend in Yandex Cloud stayed seamless.
Project 1: 3D CV Model Inference (Chest CT)
Project 2: LLM Inference (AI Scribe)
Both projects run in isolated staging environments but can easily scale when transitioning to training or production deployment.
| Metric | Before Migration | With immers.cloud |
|---|---|---|
| Staging infrastructure cost | Base rate + surcharges for idle time/scaling | Reduced by 2–2.5x |
| Resource availability | Limited to night/weekend windows | 24/7, no restrictions |
| Speed of deploying new hypotheses | Dependent on quotas and GPU availability | Nodes launched on demand in minutes |
| Administration | Manual tariff management, constant limit monitoring | Handled by internal DevOps; immers.cloud support responds during business hours |
The team gained a predictable budget, the ability to run tests at any time, and the flexibility to spin up additional machines for training or new projects.
Client quote:

> Migrating our AI development environments to immers.cloud significantly reduced our infrastructure costs while making resources available 24/7. This allows our ML teams to confidently run any tests, validate new model versions, and develop new AI projects and hypotheses.
The Cels case confirms that staging, pipeline debugging, and hypothesis validation often do not require enterprise-grade instances built on data-center GPUs; in many scenarios, such configurations offer no meaningful advantage for inference and R&D workloads.
In practice, these tasks are efficiently handled by widely available GPUs like the NVIDIA RTX 3090 and RTX 4090, which feature 24 GB of VRAM and sufficient performance for both CV and LLM inference. Moreover, their high availability in our cloud enables rapid scaling.
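To illustrate why 24 GB of VRAM is often enough for LLM inference on a staging node, here is a minimal back-of-the-envelope sketch. The model dimensions and quantization choices below are illustrative assumptions for a typical 7B-parameter model, not Cels' actual workloads:

```python
# Rough VRAM estimate for LLM inference: model weights + KV cache.
# All sizes below are illustrative assumptions, not Cels' real models.

def weights_gb(params_billions: float, bytes_per_param: float) -> float:
    """Memory needed to hold the model weights, in GB."""
    return params_billions * bytes_per_param

def kv_cache_gb(layers: int, hidden: int, seq_len: int,
                batch: int, bytes_per_val: int = 2) -> float:
    """KV cache: two tensors (K and V) per layer, per token, fp16 values."""
    return 2 * layers * hidden * seq_len * batch * bytes_per_val / 1e9

# Example: a 7B-parameter model quantized to 8-bit (1 byte/param),
# 32 layers, hidden size 4096, 4k-token context, batch size 1.
total = weights_gb(7, 1.0) + kv_cache_gb(32, 4096, 4096, 1)
print(f"Estimated VRAM: {total:.1f} GB")  # comfortably under 24 GB
```

Under these assumptions the footprint lands around 9 GB, leaving headroom on a 24 GB card for longer contexts or larger batches; an fp16 (2 bytes/param) variant of the same model would still fit.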
If your team works on research tasks, computer vision calibration, or LLM inference, these GPUs allow you to launch experiments quickly and scale workloads without delays.