A-vibe is a Russian-language large language model developed by Avito and built on the open-source Qwen3-8B-Base. Its key innovation is its approach to Russian-language adaptation: rather than only fine-tuning the model, the developers replaced the tokenizer entirely, merging English tokens from the original Qwen3 tokenizer with Russian tokens from a specially trained one. This hybrid tokenizer encodes Russian text with 22% fewer tokens on average for the same content, which speeds up inference and brings the model size down to 7.9 billion parameters. As a result, A-vibe processes Russian-language queries 15–25% faster than the base version.
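
The hybrid-tokenizer idea can be illustrated with a toy vocabulary merge. This is only a sketch under stated assumptions, not Avito's actual procedure: the two small vocabularies, the `has_cyrillic` filter, and the `merge_vocab` helper are all hypothetical and exist purely for illustration.

```python
def has_cyrillic(token: str) -> bool:
    """Return True if the token contains at least one Cyrillic letter."""
    return any("\u0400" <= ch <= "\u04FF" for ch in token)


def merge_vocab(base_vocab: list[str], russian_vocab: list[str]) -> dict[str, int]:
    """Keep non-Cyrillic tokens from the base vocabulary, then append
    Cyrillic tokens from the Russian vocabulary, assigning fresh ids."""
    merged: dict[str, int] = {}
    for token in base_vocab:
        if not has_cyrillic(token) and token not in merged:
            merged[token] = len(merged)
    for token in russian_vocab:
        if has_cyrillic(token) and token not in merged:
            merged[token] = len(merged)
    return merged


# Toy vocabularies (assumed): the base one splits Russian into single letters,
# the Russian one holds longer Cyrillic pieces.
base_vocab = ["the", "ing", " model", "п", "р", "и"]
russian_vocab = ["привет", " модель", "при", "the"]

merged = merge_vocab(base_vocab, russian_vocab)
print(merged)
```

Longer Russian pieces in the merged vocabulary are what let the same text be covered by fewer tokens. A real merge would also have to initialize embeddings for the new token ids.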
Technically, A-vibe's training pipeline comprised several stages. First came tokenizer adaptation on a corpus of 150 billion tokens (31% Russian and 31% English), then supervised fine-tuning (SFT) on over 800,000 examples, including synthetic dialogues with function calling. This was followed by GRPO (Group Relative Policy Optimization) to strengthen mathematical reasoning and function calling, and by DPO (Direct Preference Optimization) to improve dialogue safety and quality. During tokenizer adaptation the embeddings were partially frozen using a gradient-hooking technique, which preserved the quality of the representations for English tokens.
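
The effect of partially freezing embeddings can be shown with a plain-Python sketch of a masked gradient step. The function name, the toy embedding matrix, and the choice of which rows are trainable are assumptions made for illustration. In a framework such as PyTorch, the same mask would typically be attached to the embedding weight with `Tensor.register_hook`; how A-vibe's training code does this is not described here.

```python
def masked_sgd_step(embeddings, grads, trainable_rows, lr=0.1):
    """Apply one SGD step, zeroing the gradient of every frozen row."""
    updated = []
    for row_id, (row, grad) in enumerate(zip(embeddings, grads)):
        # The "hook": multiply the gradient by 1 for trainable rows, 0 for frozen ones.
        scale = 1.0 if row_id in trainable_rows else 0.0
        updated.append([w - lr * scale * g for w, g in zip(row, grad)])
    return updated


# Rows 0-1 stand for existing English tokens (frozen);
# rows 2-3 stand for newly added Russian tokens (trainable).
embeddings = [[1.0, 2.0], [3.0, 4.0], [0.0, 0.0], [0.5, 0.5]]
grads = [[1.0, 1.0], [1.0, 1.0], [1.0, -1.0], [2.0, 2.0]]

new_embeddings = masked_sgd_step(embeddings, grads, trainable_rows={2, 3})
print(new_embeddings)
```

Frozen rows come out of the update unchanged, so the English-token representations keep their pretrained values while the new rows are trained.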
A-vibe demonstrates outstanding performance on Russian-language benchmarks: it outperforms the base Qwen3-8B on math_500_ru (68.6% vs. 54.6%). On the BFCL V3 function-calling benchmark, the model achieves 58.63%, confirming its strong function-calling capabilities. Most impressively, in the RU_ARENA ranking, A-vibe surpasses not only Qwen3-8B but also other Russian-language models significantly larger in size.
Use cases for A-vibe naturally stem from its architecture and strengths. It is ideally suited for building intelligent Russian-language chatbots and assistants, analyzing and summarizing text (including user inquiries and documents), generating and explaining code, and solving logical and computational tasks within educational, analytical, and service-oriented products.
There are no public endpoints for this model yet.
Rent your own physically dedicated instance with hourly or long-term monthly billing.
We recommend deploying private instances in the following scenarios:

| Context | Parallelism | GPU | Price | TPS | Max Concurrency |
|---|---|---|---|---|---|
| 32,768 | | 1 | $0.33 | | 1.604 |
| 32,768 | | 1 | $0.38 | | 1.612 |
| 32,768 | | 1 | $0.53 | | 3.227 |
| 32,768 | tensor | 2 | $0.69 | | 2.389 |
| 32,768 | | 1 | $0.83 | | 3.444 |
| 32,768 | tensor | 2 | $0.97 | | 2.001 |
| 32,768 | | 1 | $1.02 | | 3.435 |
| 32,768 | tensor | 2 | $1.23 | | 7.271 |
| 32,768 | | 1 | $1.59 | | 5.036 |
| 32,768 | | 1 | $2.37 | | 14.829 |
| 32,768 | | 1 | $3.83 | | 14.815 |
| 32,768 | | 1 | $4.11 | | 17.659 |
| 32,768 | tensor | 2 | $4.61 | | 30.476 |
| 32,768 | | 1 | $4.74 | | 27.208 |
| 32,768 | tensor | 2 | $9.40 | | 55.234 |

| Context | Parallelism | GPU | Price | TPS | Max Concurrency |
|---|---|---|---|---|---|
| 32,768 | | 1 | $0.53 | | 2.409 |
| 32,768 | tensor | 2 | $0.54 | | 3.207 |
| 32,768 | tensor | 2 | $0.57 | | 3.224 |
| 32,768 | tensor | 2 | $0.69 | | 1.572 |
| 32,768 | | 1 | $0.83 | | 2.626 |
| 32,768 | tensor | 2 | $0.97 | | 1.183 |
| 32,768 | | 1 | $1.02 | | 2.618 |
| 32,768 | tensor | 2 | $1.23 | | 6.454 |
| 32,768 | | 1 | $1.59 | | 4.219 |
| 32,768 | | 1 | $2.37 | | 14.012 |
| 32,768 | | 1 | $3.83 | | 13.997 |
| 32,768 | | 1 | $4.11 | | 16.841 |
| 32,768 | tensor | 2 | $4.61 | | 29.658 |
| 32,768 | | 1 | $4.74 | | 26.391 |
| 32,768 | tensor | 2 | $9.40 | | 54.417 |

| Context | Parallelism | GPU | Price | TPS | Max Concurrency |
|---|---|---|---|---|---|
| 32,768 | tensor | 2 | $0.54 | | 1.573 |
| 32,768 | tensor | 2 | $0.57 | | 1.589 |
| 32,768 | pipeline | 3 | $0.84 | | 1.540 |
| 32,768 | tensor | 2 | $0.93 | | 4.819 |
| 32,768 | tensor | 4 | $1.12 | | 3.144 |
| 32,768 | tensor | 2 | $1.23 | | 4.819 |
| 32,768 | tensor | 2 | $1.56 | | 5.253 |
| 32,768 | | 1 | $1.59 | | 2.584 |
| 32,768 | tensor | 4 | $1.82 | | 2.367 |
| 32,768 | tensor | 2 | $1.92 | | 5.236 |
| 32,768 | | 1 | $2.37 | 85.720 | 12.377 |
| 32,768 | | 1 | $3.83 | | 12.363 |
| 32,768 | | 1 | $4.11 | | 15.206 |
| 32,768 | tensor | 2 | $4.61 | | 28.024 |
| 32,768 | | 1 | $4.74 | | 24.756 |
| 32,768 | tensor | 2 | $9.40 | | 52.782 |

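
When choosing among the listed configurations, one simple approach is to take the lowest-priced option whose Max Concurrency value meets the expected load. The sketch below uses a handful of rows from the last table. The `Config` class and `cheapest_for` helper are hypothetical names, the target of 10 is an arbitrary example, and prices are used exactly as listed.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Config:
    gpus: int
    price: float            # listed price, in dollars
    max_concurrency: float  # value from the "Max Concurrency" column


def cheapest_for(configs, required_concurrency):
    """Return the lowest-priced config meeting the concurrency target, or None."""
    eligible = [c for c in configs if c.max_concurrency >= required_concurrency]
    return min(eligible, key=lambda c: c.price, default=None)


# A few rows copied from the last table.
configs = [
    Config(gpus=2, price=0.54, max_concurrency=1.573),
    Config(gpus=2, price=0.93, max_concurrency=4.819),
    Config(gpus=1, price=2.37, max_concurrency=12.377),
    Config(gpus=2, price=4.61, max_concurrency=28.024),
    Config(gpus=2, price=9.40, max_concurrency=52.782),
]

best = cheapest_for(configs, required_concurrency=10)
print(best)
```

If no listed configuration reaches the target, the helper returns `None`, which signals that the load would need to be split across several instances.
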
Contact our dedicated neural networks support team at nn@immers.cloud or send your request to the sales department at sale@immers.cloud.