How IBS Built a Unified Platform for AI R&D on immers.cloud

Allocating a separate server for each hypothesis test is slow, expensive, and inefficient. The IBS team encountered this challenge while developing new AI scenarios, and found an elegant solution: a unified, managed platform for AI R&D.

In this case study, we break down in detail how to build a flexible GPU sandbox without workarounds. You'll learn how to organize centralized instance deployment via GPUStack, ensure predictable operations, and configure a hybrid network architecture that satisfies both security requirements and external integration needs.

If your team spends more time configuring hardware than running actual experiments, this case study will show you the path to a platform-based model for working with GPU infrastructure.

Client: IBS — IT service company
Project: Centralized platform for AI R&D and hypothesis testing
Stack: GPUStack, vLLM, mixed workload (LLM / NLP / embeddings / reranking)
Result: Accelerated PoC launches, a unified managed AI environment, predictable operations
Task: A sandbox for AI experiments without infrastructure workarounds

The IBS team specializes in research and development in artificial intelligence. Before launching any new scenario — whether fine-tuning a model, testing a pipeline, or building a service prototype — they need to quickly validate the hypothesis on real hardware.

Before:

No available sandbox with sufficient GPU power for experimentation;
Prolonged infrastructure setup phase for each new PoC;
Need to allocate a separate server for every task: slow, expensive, inefficient.

"We needed a managed GPU environment that could be quickly used for production APIs and AI PoCs without building a large proprietary GPU cluster from scratch," notes Alexander Zhukovsky, company representative.

Why immers.cloud — a GPU Cloud Service?

When selecting a provider, the IBS team defined clear criteria:

Stable GPU resources without unexpected outages or performance drops;
Private environment with VPN—ability to isolate the R&D environment from public access;
Configuration flexibility—support for different GPU types across various model classes;
Predictable operations—transparent monitoring, careful maintenance, minimal unplanned interventions;
Customer-centric approach—fast communication and willingness to tackle non-standard tasks.

immers.cloud addressed all requirements: the platform enabled deployment of a unified environment for mixed AI workloads with fine-grained network access control.

Technical Implementation

Stack and Architecture

Component	Solution
Orchestration	GPUStack — for managing models and instances
Inference	vLLM — for high-performance LLM serving
Workload Types	LLM, NLP, embeddings, reranking, VLM, applied AI PoCs
Configuration	1 control plane + 4 GPU worker nodes: 2 × A100 80GB; 2 × (4 × RTX 3090 24GB); 1 × RTX 4090 24GB
Models in Environment	As of April 2026: 14 registered models, 11 active instances
Data Storage	Local, within the private network
Access	Corporate access via VPN + restricted public proxy for approved use cases
Monitoring	Centralized metrics collection, alerting, and logging
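
To get a feel for the pooled capacity, the Configuration row can be tallied as a small inventory. This is only a rough sketch. It assumes the row describes four worker nodes (one with 2 × A100 80GB, two with 4 × RTX 3090 24GB each, and one with 1 × RTX 4090 24GB), which is one possible reading of the table. The node names are made up for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class WorkerNode:
    name: str             # hypothetical label, not taken from the case study
    gpu_model: str
    gpu_count: int
    vram_gb_per_gpu: int

    @property
    def vram_gb(self) -> int:
        """Total GPU memory on this node, in GB."""
        return self.gpu_count * self.vram_gb_per_gpu


# Assumed reading of the Configuration row: four worker nodes in total.
WORKERS = [
    WorkerNode("worker-a100", "A100", 2, 80),
    WorkerNode("worker-3090-1", "RTX 3090", 4, 24),
    WorkerNode("worker-3090-2", "RTX 3090", 4, 24),
    WorkerNode("worker-4090", "RTX 4090", 1, 24),
]


def pool_summary(nodes):
    """Return (node count, GPU count, total VRAM in GB) for the pool."""
    return (
        len(nodes),
        sum(n.gpu_count for n in nodes),
        sum(n.vram_gb for n in nodes),
    )


if __name__ == "__main__":
    node_count, gpu_count, vram_gb = pool_summary(WORKERS)
    print(f"{node_count} nodes, {gpu_count} GPUs, {vram_gb} GB VRAM")
```

Under that reading, the pool holds 11 GPUs and 376 GB of GPU memory shared by all registered models.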

Network Architecture: Balancing Security and Accessibility

One of the key challenges was to combine two access scenarios:

Internal secured environment — for R&D teams, working with sensitive data, and debugging;
Limited public access — for client demonstrations and testing external integrations.

Solution

A hybrid scheme was implemented: the main AI environment is accessible only via VPN, while for specific approved use cases, an isolated public proxy with strict routing rules and rate limits has been configured.
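
The public side of this scheme combines two checks: a route allowlist and a per-client rate limit. The sketch below illustrates the idea with a token-bucket limiter. It is not the IBS or immers.cloud proxy configuration: the class names, the allowed paths, and the rate and burst values are all assumptions chosen for the example.

```python
import time


class TokenBucket:
    """Token-bucket limiter: refills `rate` tokens per second up to `capacity`."""

    def __init__(self, rate, capacity, clock=time.monotonic):
        self.rate = rate
        self.capacity = capacity
        self.clock = clock
        self.tokens = float(capacity)
        self.updated = clock()

    def allow(self):
        now = self.clock()
        # Refill in proportion to elapsed time, capped at capacity.
        self.tokens = min(self.capacity, self.tokens + (now - self.updated) * self.rate)
        self.updated = now
        if self.tokens >= 1:
            self.tokens -= 1
            return True
        return False


# Hypothetical allowlist: only these path prefixes are exposed publicly.
PUBLIC_ROUTES = ("/v1/chat/completions", "/v1/embeddings")


class PublicProxyPolicy:
    """Decides whether a public request is forwarded (illustrative only)."""

    def __init__(self, routes=PUBLIC_ROUTES, rate=1.0, burst=2, clock=time.monotonic):
        self.routes = tuple(routes)
        self.rate = rate
        self.burst = burst
        self.clock = clock
        self.buckets = {}  # one bucket per client id

    def check(self, client_id, path):
        """Return an HTTP-style status code for the request."""
        if not path.startswith(self.routes):
            return 403  # route is not approved for public access
        if client_id not in self.buckets:
            self.buckets[client_id] = TokenBucket(self.rate, self.burst, self.clock)
        return 200 if self.buckets[client_id].allow() else 429
```

Everything not matched by the allowlist stays reachable only through the VPN, so the public surface is limited to the routes that were explicitly approved.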

How It Works in Practice

A new hypothesis enters the R&D backlog;
Instead of requesting a new server, an engineer deploys a model instance in the existing environment via GPUStack;
If needed, a public proxy is enabled for external testing;
All instances and metrics are visible in a unified monitoring dashboard;
Once the experiment concludes, resources are released and returned to the pool.
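
The steps above amount to a deploy and release cycle against a shared pool. A toy model of that cycle is sketched below. It does not use the GPUStack API: `GpuPool`, its methods, the pool size, and the instance names are hypothetical and serve only to show how capacity is reserved and returned.

```python
class GpuPool:
    """Toy model of a shared GPU pool; not the GPUStack API."""

    def __init__(self, total_gpus):
        self.total_gpus = total_gpus
        self.instances = {}  # instance name -> number of GPUs held

    @property
    def free_gpus(self):
        return self.total_gpus - sum(self.instances.values())

    def deploy(self, name, gpus):
        """Reserve GPUs for a model instance, or fail if the pool is too busy."""
        if name in self.instances:
            raise ValueError(f"instance {name!r} already exists")
        if gpus > self.free_gpus:
            raise RuntimeError(f"need {gpus} GPUs, only {self.free_gpus} free")
        self.instances[name] = gpus

    def release(self, name):
        """Return an instance's GPUs to the pool when the experiment ends."""
        return self.instances.pop(name)


if __name__ == "__main__":
    pool = GpuPool(total_gpus=8)              # hypothetical pool size
    pool.deploy("reranker-poc", gpus=2)       # hypothetical experiment names
    pool.deploy("llm-eval", gpus=4)
    print(pool.free_gpus)                     # GPUs left for other hypotheses
    pool.release("reranker-poc")
    print(pool.free_gpus)
```

The point of the pooled model is visible in `release`: capacity freed by one finished experiment is immediately available to the next hypothesis, with no new server request.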

Challenges and How We Solved Them

Challenge	Solution
Network connectivity: Needed to ensure both secure internal access and limited external access	Split architecture: VPN environment for R&D + isolated public proxy for approved use cases
Operations and predictability: Critical to avoid downtime and resource conflicts	Implemented operational policies, separation of critical and test workloads, centralized monitoring
Rapid deployment of new scenarios: Previously, each use case required a dedicated server	New scenarios now deploy on top of the existing environment—without allocating new hardware

"Collaboration with immers.cloud enabled us to transition to a unified platform-based model for working with GPU infrastructure," shares the project team.

Results: What Changed After Implementation

Collaboration with the immers.cloud GPU cloud enabled IBS to transition from ad-hoc solutions to a unified platform-based model for working with GPU infrastructure.

A unified API environment for AI R&D emerged: all experiments now run in a single managed environment;
PoC launches accelerated—new scenarios are connected in hours, not days;
The need to allocate a separate GPU server for each use case disappeared—resources are now used efficiently, following a pooled model;
Onboarding new teams became simpler—just grant access to the environment, no need to configure infrastructure from scratch;
Infrastructure became observable and predictable—monitoring, logging, and clear operational policies reduced operational risks.

What's Next?

The platform continues to evolve: the IBS team is scaling the number of supported models, testing new multimodal inference scenarios, and planning to expand access for internal product teams.

For us at immers.cloud, this case study confirms that flexible, customer-centric infrastructure can be a catalyst for innovation. When researchers can jump straight into experimentation instead of configuring servers, everyone wins.

Want to build a similar R&D platform for AI experiments? Our engineers will help you design an environment tailored to your workload—from a single GPU to a distributed cluster.

Contact the immers.cloud team

Updated: 03.06.2026