Allocating a separate server for each hypothesis test is slow, expensive, and inefficient. The IBS team encountered this challenge while developing new AI scenarios, and found an elegant solution: a unified, managed platform for AI R&D.
In this case study, we break down in detail how to build a flexible GPU sandbox without workarounds. You'll learn how to organize centralized instance deployment via GPUStack, ensure predictable operations, and configure a hybrid network architecture that satisfies both security requirements and external integration needs.
If your team spends more time configuring hardware than running actual experiments, this case study will show you the path to a platform-based model for working with GPU infrastructure.
The IBS team specializes in research and development in artificial intelligence. Before launching any new scenario — whether fine-tuning a model, testing a pipeline, or building a service prototype — they need to quickly validate the hypothesis on real hardware.
"We needed a managed GPU environment that could be quickly used for production APIs and AI PoCs without building a large proprietary GPU cluster from scratch," notes Alexander Zhukovsky, company representative.
When selecting a provider, the IBS team defined clear criteria:
immers.cloud addressed all requirements: the platform enabled deployment of a unified environment for mixed AI workloads with fine-grained network access control.
Stack and Architecture
| Component | Solution |
|---|---|
|
Orchestration |
GPUStack — for managing models and instances |
|
Inference |
vLLM — for high-performance LLM serving |
|
Workload Types |
LLM, NLP, embeddings, reranking, VLM, applied AI PoCs |
|
Configuration |
1 control plane + 4 GPU worker nodes: 2 × A100 80GB 2 × (4 × RTX 3090 24GB) 1 × RTX 4090 24GB |
|
Models in Environment |
As of April 2026: 14 registered models, 11 active instances |
|
Data Storage |
Local, within the private network |
|
Access |
Corporate access via VPN + restricted public proxy for approved use cases |
|
Monitoring |
Centralized metrics collection, alerting, and logging |
One of the key challenges was to combine two access scenarios:
A hybrid scheme was implemented: the main AI environment is accessible only via VPN, while for specific approved use cases, an isolated public proxy with strict routing rules and rate limits has been configured.
|
Challenge |
Solution |
|---|---|
|
Network connectivity: Needed to ensure both secure internal access and limited external access |
Split architecture: VPN environment for R&D + isolated public proxy for approved use cases |
|
Operations and predictability: Critical to avoid downtime and resource conflicts |
Implemented operational policies, separation of critical and test workloads, centralized monitoring |
|
Rapid deployment of new scenarios: Previously, each use case required a dedicated server |
New scenarios now deploy on top of the existing environment—without allocating new hardware |
"Collaboration with immers.cloud enabled us to transition to a unified platform-based model for working with GPU infrastructure," shares the project team.
Collaboration with the immers.cloud GPU cloud enabled IBS to transition from ad-hoc solutions to a unified platform-based model for working with GPU infrastructure.
The platform continues to evolve: the IBS team is scaling the number of supported models, testing new multimodal inference scenarios, and planning to expand access for internal product teams.
For us at immers.cloud, this case study confirms that flexible, customer-centric infrastructure becomes a catalyst for innovation. When researchers don't spend time configuring servers and can jump straight into experimentation — everyone wins.
Want to build a similar R&D platform for AI experiments? Our engineers will help you design an environment tailored to your workload—from a single GPU to a distributed cluster.