A-vibe

A-vibe is a Russian-language large language model developed by Avito, built upon the open-source Qwen3-8B-Base. Its key innovation lies in a unique approach to Russian language adaptation: rather than merely fine-tuning the model, the developers completely replaced the tokenizer, merging English tokens from the original Qwen3 with Russian tokens from a specially trained tokenizer. This hybrid approach achieves high tokenization efficiency for Russian text—using on average 22% fewer tokens for the same content—significantly accelerating inference and reducing the model size to 7.9 billion parameters. As a result, A-vibe processes Russian-language queries 15–25% faster than the base version.
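The tokenizer gain translates directly into less decoding work per request. A back-of-the-envelope sketch, assuming decoding cost scales linearly with token count (fixed overheads such as prefill and scheduling are why the measured end-to-end speedup is 15–25% rather than the full amount implied by the token reduction):

```python
# Effect of the ~22% token reduction reported for Russian text:
# per-request decoding work drops proportionally, assuming cost
# scales linearly with the number of generated tokens.

def relative_decode_cost(token_reduction: float) -> float:
    """Fraction of baseline decoding work left after re-tokenization."""
    return 1.0 - token_reduction

base_tokens = 1000  # hypothetical Russian document under the original Qwen3 tokenizer
hybrid_tokens = base_tokens * relative_decode_cost(0.22)
print(round(hybrid_tokens))  # → 780 tokens for the same content
```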

Technically, A-vibe’s training pipeline comprised several critical stages: first, tokenizer adaptation on a corpus of 150 billion tokens (31% Russian and 31% English); next, supervised fine-tuning (SFT) on over 800,000 examples, including synthetic dialogues with function calling. This was followed by GRPO (Group Relative Policy Optimization) to strengthen mathematical reasoning and function-calling capabilities, and DPO (Direct Preference Optimization) to improve dialogue safety and quality. Special attention was paid to partially freezing embeddings during tokenizer adaptation: a gradient-hooking technique that preserved the quality of representations for English tokens.
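Since GRPO is less widely known than PPO or DPO, the core idea fits in a few lines: rewards for a group of sampled completions are normalized against the group's own mean and standard deviation, replacing a learned value baseline. An illustrative sketch, not Avito's training code:

```python
import statistics

def group_relative_advantages(rewards, eps=1e-8):
    """Normalize each reward against the group's own statistics
    (the 'Group Relative' in Group Relative Policy Optimization)."""
    mu = statistics.fmean(rewards)
    sigma = statistics.pstdev(rewards)
    return [(r - mu) / (sigma + eps) for r in rewards]

# Four sampled answers to one math problem; reward 1.0 = correct answer.
advantages = group_relative_advantages([1.0, 0.0, 1.0, 1.0])
print([round(a, 3) for a in advantages])  # → [0.577, -1.732, 0.577, 0.577]
```

Correct answers get a positive advantage and the lone wrong answer a strongly negative one, which is exactly the signal used to sharpen mathematical reasoning with verifiable rewards.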

A-vibe demonstrates outstanding performance on Russian-language benchmarks: it outperforms the base Qwen3-8B on math_500_ru (68.6% vs. 54.6%). On the BFCL V3 function-calling benchmark, the model achieves 58.63%, confirming its strong function-calling capabilities. Most impressively, in the RU_ARENA ranking, A-vibe surpasses not only Qwen3-8B but also other Russian-language models significantly larger in size.

Use cases for A-vibe naturally stem from its architecture and strengths. It is ideally suited for building intelligent Russian-language chatbots and assistants, analyzing and summarizing text (including user inquiries and documents), generating and explaining code, and solving logical and computational tasks within educational, analytical, and service-oriented products.
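Because the SFT data included function-calling dialogues, the model can be driven with the widely used OpenAI-style "tools" chat format. A sketch of such a request body; the `get_item_price` tool and the `a-vibe` model name are hypothetical placeholders, not a documented API:

```python
import json

# Hypothetical tool definition in the OpenAI-compatible "tools" schema.
tools = [{
    "type": "function",
    "function": {
        "name": "get_item_price",  # placeholder tool, not a real API
        "description": "Return the listed price of a marketplace item",
        "parameters": {
            "type": "object",
            "properties": {"item_id": {"type": "string"}},
            "required": ["item_id"],
        },
    },
}]

request = {
    "model": "a-vibe",  # assumed model identifier
    "messages": [{"role": "user", "content": "Сколько стоит товар 42?"}],
    "tools": tools,
}
print(json.dumps(request, ensure_ascii=False)[:40])
```

Given such a request, a function-calling-capable model responds with a structured `tool_calls` message naming the function and its JSON arguments instead of free text.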


Announce Date: 20.10.2025
Parameters: 7.9B
Context: 32K
Layers: 36
Attention Type: Full Attention
VRAM requirements: 10.7 GB with 4-bit quantization
Developer: AvitoTech
Transformers Version: 4.52.0.dev0
License: Apache 2.0
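The 10.7 GB figure is consistent with simple accounting: 4-bit weights plus a full 32K-token KV cache. The 36 layers come from the spec above; the 8 KV heads and head dimension of 128 are assumed from the Qwen3-8B architecture and may differ:

```python
# Rough VRAM accounting for A-vibe at 4-bit quantization.
params = 7.9e9
weights_gb = params * 0.5 / 1e9  # 4-bit ≈ 0.5 byte per parameter

# KV cache at full context, fp16. Layer count is from the spec;
# 8 KV heads and head_dim 128 are assumed (Qwen3-8B-like GQA config).
layers, kv_heads, head_dim, ctx = 36, 8, 128, 32_768
kv_bytes_per_token = 2 * layers * kv_heads * head_dim * 2  # K + V, 2 bytes each
kv_gb = kv_bytes_per_token * ctx / 1e9

print(round(weights_gb, 2), round(kv_gb, 2))  # → 3.95 4.83
```

The remaining gap to 10.7 GB is plausibly runtime overhead: activations, CUDA buffers, and allocator fragmentation.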

Public endpoint

Use our pre-built public endpoints for free to test inference and explore A-vibe capabilities. You can obtain an API access token on the token management page after registration and verification.
Model Name | Context | Type | GPU | TPS | Status | Link
There are no public endpoints for this model yet.
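Once an endpoint appears, it can typically be queried as an OpenAI-compatible chat completion over HTTPS. A stdlib-only sketch; the URL below is a placeholder and the exact path and model name may differ:

```python
import json
import urllib.request

API_URL = "https://example.invalid/v1/chat/completions"  # placeholder URL
TOKEN = "YOUR_API_TOKEN"  # obtained from the token management page

payload = {
    "model": "a-vibe",  # assumed model identifier
    "messages": [{"role": "user", "content": "Привет! Кратко опиши себя."}],
}
req = urllib.request.Request(
    API_URL,
    data=json.dumps(payload).encode(),
    headers={
        "Authorization": f"Bearer {TOKEN}",
        "Content-Type": "application/json",
    },
)
# urllib.request.urlopen(req) would send the request; not executed here.
print(req.get_header("Content-type"))  # → application/json
```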

Private server

Rent your own physically dedicated instance with hourly or long-term monthly billing.

We recommend deploying a private instance when you need to:

  • maximize endpoint performance,
  • enable full context for long sequences,
  • ensure top-tier security by processing data in an isolated, dedicated environment,
  • use custom weights, such as fine-tuned models or LoRA adapters.

Recommended configurations for hosting A-vibe

Prices:
Name | Context | Parallelism | vCPU | RAM, MB | Disk, GB | GPUs | Price, hour
teslat4-1.16.16.160 | 32,768 | - | 16 | 16384 | 160 | 1 | $0.33
teslaa2-1.16.32.160 | 32,768 | - | 16 | 32768 | 160 | 1 | $0.38
teslaa10-1.16.32.160 | 32,768 | - | 16 | 32768 | 160 | 1 | $0.53
rtx2080ti-2.12.64.160 | 32,768 | tensor | 12 | 65536 | 160 | 2 | $0.69
rtx3090-1.16.24.160 | 32,768 | - | 16 | 24576 | 160 | 1 | $0.88
rtx3080-2.16.32.160 | 32,768 | tensor | 16 | 32762 | 160 | 2 | $0.97
rtx4090-1.16.32.160 | 32,768 | - | 16 | 32768 | 160 | 1 | $1.15
teslav100-1.12.64.160 | 32,768 | - | 12 | 65536 | 160 | 1 | $1.20
rtxa5000-2.16.64.160.nvlink | 32,768 | tensor | 16 | 65536 | 160 | 2 | $1.23
rtx5090-1.16.64.160 | 32,768 | - | 16 | 65536 | 160 | 1 | $1.59
teslaa100-1.16.64.160 | 32,768 | - | 16 | 65536 | 160 | 1 | $2.37
teslah100-1.16.64.160 | 32,768 | - | 16 | 65536 | 160 | 1 | $3.83
h100nvl-1.16.96.160 | 32,768 | - | 16 | 98304 | 160 | 1 | $4.11
h200-1.16.128.160 | 32,768 | - | 16 | 131072 | 160 | 1 | $4.74
Prices:
Name | Context | Parallelism | vCPU | RAM, MB | Disk, GB | GPUs | Price, hour
teslat4-2.16.32.160 | 32,768 | tensor | 16 | 32768 | 160 | 2 | $0.54
teslaa2-2.16.32.160 | 32,768 | tensor | 16 | 32768 | 160 | 2 | $0.57
rtx2080ti-3.12.24.120 | 32,768 | pipeline | 12 | 24576 | 120 | 3 | $0.84
teslaa10-2.16.64.160 | 32,768 | tensor | 16 | 65536 | 160 | 2 | $0.93
rtx2080ti-4.16.32.160 | 32,768 | tensor | 16 | 32768 | 160 | 4 | $1.12
teslav100-1.12.64.160 | 32,768 | - | 12 | 65536 | 160 | 1 | $1.20
rtxa5000-2.16.64.160.nvlink | 32,768 | tensor | 16 | 65536 | 160 | 2 | $1.23
rtx3080-3.16.64.160 | 32,768 | pipeline | 16 | 65536 | 160 | 3 | $1.43
rtx5090-1.16.64.160 | 32,768 | - | 16 | 65536 | 160 | 1 | $1.59
rtx3090-2.16.64.160 | 32,768 | tensor | 16 | 65536 | 160 | 2 | $1.67
rtx3080-4.16.64.160 | 32,768 | tensor | 16 | 65536 | 160 | 4 | $1.82
rtx4090-2.16.64.160 | 32,768 | tensor | 16 | 65536 | 160 | 2 | $2.19
teslaa100-1.16.64.160 | 32,768 | - | 16 | 65536 | 160 | 1 | $2.37
teslah100-1.16.64.160 | 32,768 | - | 16 | 65536 | 160 | 1 | $3.83
h100nvl-1.16.96.160 | 32,768 | - | 16 | 98304 | 160 | 1 | $4.11
h200-1.16.128.160 | 32,768 | - | 16 | 131072 | 160 | 1 | $4.74

Need help?

Contact our dedicated neural networks support team at nn@immers.cloud or send your request to the sales department at sale@immers.cloud.