How IBS Built a Unified Platform for AI R&D in Immers.cloud

Allocating a separate server for each hypothesis test is slow, expensive, and inefficient. The IBS team encountered this challenge while developing new AI scenarios, and found an elegant solution: a unified, managed platform for AI R&D.

In this case study, we break down in detail how to build a flexible GPU sandbox without workarounds. You'll learn how to organize centralized instance deployment via GPUStack, ensure predictable operations, and configure a hybrid network architecture that satisfies both security requirements and external integration needs.

If your team spends more time configuring hardware than running actual experiments, this case study will show you the path to a platform-based model for working with GPU infrastructure.

  • Client: IBS — IT service company
  • Project: Centralized platform for AI R&D and hypothesis testing
  • Stack: GPUStack, vLLM, mixed workload (LLM / NLP / embeddings / reranking)
  • Result: Accelerated PoC launches, a unified managed AI environment, predictable operations
  • Task: A sandbox for AI experiments without infrastructure workarounds

The IBS team specializes in research and development in artificial intelligence. Before launching any new scenario — whether fine-tuning a model, testing a pipeline, or building a service prototype — they need to quickly validate the hypothesis on real hardware.

Before:

  • No available sandbox with sufficient GPU power for experimentation;
  • Prolonged infrastructure setup phase for each new PoC;
  • Need to allocate a separate server for every task—slow, expensive, inefficient;

"We needed a managed GPU environment that could be quickly used for production APIs and AI PoCs without building a large proprietary GPU cluster from scratch," notes Alexander Zhukovsky, company representative.

Why immers.cloud — a GPU Cloud Service?

When selecting a provider, the IBS team defined clear criteria:

  • Stable GPU resources without unexpected outages or performance drops;
  • Private environment with VPN—ability to isolate the R&D environment from public access;
  • Configuration flexibility—support for different GPU types across various model classes;
  • Predictable operations—transparent monitoring, careful maintenance, minimal unplanned interventions;
  • Customer-centric approach—fast communication and willingness to tackle non-standard tasks.

immers.cloud addressed all requirements: the platform enabled deployment of a unified environment for mixed AI workloads with fine-grained network access control.

Technical Implementation Stack and Architecture

Stack and Architecture

Component Solution

Orchestration

GPUStack — for managing models and instances

Inference

vLLM — for high-performance LLM serving

Workload Types

LLM, NLP, embeddings, reranking, VLM, applied AI PoCs

Configuration

1 control plane + 4 GPU worker nodes:

2 × A100 80GB

2 × (4 × RTX 3090 24GB)

1 × RTX 4090 24GB

Models in Environment

As of April 2026: 14 registered models, 11 active instances

Data Storage

Local, within the private network

Access

Corporate access via VPN + restricted public proxy for approved use cases

Monitoring

Centralized metrics collection, alerting, and logging

Network Architecture: Balancing Security and Accessibility

One of the key challenges was to combine two access scenarios:

  1. Internal secured environment — for R&D teams, working with sensitive data, and debugging;
  2. Limited public access — for client demonstrations and testing external integrations.

Solution

A hybrid scheme was implemented: the main AI environment is accessible only via VPN, while for specific approved use cases, an isolated public proxy with strict routing rules and rate limits has been configured.

How It Works in Practice

  • A new hypothesis enters the R&D backlog;
  • Instead of requesting a new server, an engineer deploys a model instance in the existing environment via GPUStack;
  • If needed, a public proxy is enabled for external testing;
  • All instances and metrics are visible in a unified monitoring dashboard;
  • Once the experiment concludes, resources are released and returned to the pool.
  • Challenges and How We Solved Them/Challenges and How We Solved Them

Challenge

Solution

Network connectivity: Needed to ensure both secure internal access and limited external access

Split architecture: VPN environment for R&D + isolated public proxy for approved use cases

Operations and predictability: Critical to avoid downtime and resource conflicts

Implemented operational policies, separation of critical and test workloads, centralized monitoring

Rapid deployment of new scenarios: Previously, each use case required a dedicated server

New scenarios now deploy on top of the existing environment—without allocating new hardware

"Collaboration with immers.cloud enabled us to transition to a unified platform-based model for working with GPU infrastructure," shares the project team.

Results: What Changed After Implementation

Collaboration with the immers.cloud GPU cloud enabled IBS to transition from ad-hoc solutions to a unified platform-based model for working with GPU infrastructure.

  • A unified R&D AI API environment emerged—all experiments now run in a single managed environment;
  • PoC launches accelerated—new scenarios are connected in hours, not days;
  • The need to allocate a separate GPU server for each use case disappeared—resources are now used efficiently, following a pooled model;
  • Onboarding new teams became simpler—just grant access to the environment, no need to configure infrastructure from scratch;
  • Infrastructure became observable and predictable—monitoring, logging, and clear operational policies reduced operational risks.

What's Next?

The platform continues to evolve: the IBS team is scaling the number of supported models, testing new multimodal inference scenarios, and planning to expand access for internal product teams.

For us at immers.cloud, this case study confirms that flexible, customer-centric infrastructure becomes a catalyst for innovation. When researchers don't spend time configuring servers and can jump straight into experimentation — everyone wins.

Want to build a similar R&D platform for AI experiments? Our engineers will help you design an environment tailored to your workload—from a single GPU to a distributed cluster.

Contact the immers.cloud team

Updated Date 03.06.2026