A-vibe is a Russian-language large language model developed by Avito, built on the open-source Qwen3-8B-Base. Its key innovation is the approach to Russian adaptation: rather than only fine-tuning the model, the developers replaced the tokenizer entirely, merging English tokens from the original Qwen3 tokenizer with Russian tokens from a specially trained one. The hybrid tokenizer encodes Russian text with 22% fewer tokens on average for the same content, which speeds up inference and brings the model size down to 7.9 billion parameters. As a result, A-vibe processes Russian-language queries 15–25% faster than the base model.
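The vocabulary merge described above can be illustrated with a toy sketch. It is not Avito's actual procedure: the Cyrillic filter, the function names, and the plain-string tokens are all assumptions made for illustration (real Qwen3 tokens are byte-level BPE units, not readable strings). The sketch keeps non-Cyrillic tokens from a base vocabulary, appends tokens from a Russian vocabulary, and records which new ids can reuse an existing embedding row.

```python
def has_cyrillic(token):
    """True if the token contains a character from the basic Cyrillic block."""
    return any("\u0400" <= ch <= "\u04FF" for ch in token)


def merge_vocabularies(base_vocab, russian_tokens):
    """Hypothetical merge: keep non-Cyrillic base tokens, then add Russian ones.

    Returns (merged, reused), where merged maps token -> new id and
    reused maps new id -> old id for tokens carried over from the base vocabulary.
    """
    merged = {}
    reused = {}
    # Walk the base vocabulary in id order so kept tokens stay in a stable order.
    for token, old_id in sorted(base_vocab.items(), key=lambda kv: kv[1]):
        if has_cyrillic(token):
            continue  # dropped: Russian coverage comes from the new tokenizer
        new_id = len(merged)
        merged[token] = new_id
        reused[new_id] = old_id
    # Append Russian tokens that are not already present.
    for token in russian_tokens:
        if token not in merged:
            merged[token] = len(merged)
    return merged, reused


base = {"hello": 0, "world": 1, "при": 2, "!": 3}
russian = ["привет", "мир", "!"]
merged, reused = merge_vocabularies(base, russian)
print(merged)
print(reused)
```

Tokens listed in `reused` could start from their original embedding rows, while the newly added Russian tokens would need freshly initialized rows that are then trained.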
A-vibe's training pipeline comprised several stages. First came tokenizer adaptation on a corpus of 150 billion tokens (31% Russian and 31% English), followed by supervised fine-tuning (SFT) on over 800,000 examples, including synthetic dialogues with function calling. Next, GRPO (Group Relative Policy Optimization) was applied to strengthen mathematical reasoning and function calling, and DPO (Direct Preference Optimization) to improve dialogue safety and quality. During tokenizer adaptation, the embeddings were partially frozen using gradient hooks, a technique that preserved the quality of the representations for English tokens.
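The idea behind partially freezing embeddings can be sketched in plain Python: gradients for the rows that should stay fixed are zeroed before the optimizer update, so only the remaining rows change. This is a minimal illustration under assumed names (`mask_gradient`, `sgd_step`, `frozen_rows`) and plain SGD, not the training code used for A-vibe. In a framework such as PyTorch, the same masking would typically be attached as a gradient hook on the embedding weight tensor.

```python
def mask_gradient(grad, frozen_rows):
    """Return a copy of grad with the rows of frozen token ids set to zero."""
    return [
        [0.0] * len(row) if i in frozen_rows else list(row)
        for i, row in enumerate(grad)
    ]


def sgd_step(embeddings, grad, lr, frozen_rows):
    """One plain SGD update in which frozen rows receive a zero gradient."""
    masked = mask_gradient(grad, frozen_rows)
    return [
        [w - lr * g for w, g in zip(w_row, g_row)]
        for w_row, g_row in zip(embeddings, masked)
    ]


# Rows 0 and 1 stand in for tokens kept from the base model (frozen);
# row 2 stands in for a newly added token (trainable).
embeddings = [[1.0, 2.0], [3.0, 4.0], [5.0, 6.0]]
grad = [[0.5, 0.5], [1.0, 1.0], [2.0, 2.0]]
updated = sgd_step(embeddings, grad, lr=0.1, frozen_rows={0, 1})
print(updated)
```

Masking the gradient rather than excluding the whole embedding matrix from training is what makes the freeze partial: the frozen and trainable rows live in the same parameter tensor.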
A-vibe demonstrates outstanding performance on Russian-language benchmarks: it outperforms the base Qwen3-8B on math_500_ru (68.6% vs. 54.6%). On the BFCL V3 function-calling benchmark, the model achieves 58.63%, confirming its strong function-calling capabilities. Most impressively, in the RU_ARENA ranking, A-vibe surpasses not only Qwen3-8B but also other Russian-language models significantly larger in size.
Use cases for A-vibe naturally stem from its architecture and strengths. It is ideally suited for building intelligent Russian-language chatbots and assistants, analyzing and summarizing text (including user inquiries and documents), generating and explaining code, and solving logical and computational tasks within educational, analytical, and service-oriented products.
There are no public endpoints for this model yet.
Rent your own physically dedicated instance with hourly or long-term monthly billing.
We recommend deploying private instances in the following scenarios:
| Context | Parallelism | GPU | Price | Max Concurrency |
|---|---|---|---|---|
| 32,768 | | 1 | $0.33 | 1.827 |
| 32,768 | | 1 | $0.38 | 1.827 |
| 32,768 | | 1 | $0.53 | 3.427 |
| 32,768 | tensor | 2 | $0.69 | 2.471 |
| 32,768 | | 1 | $0.83 | 3.427 |
| 32,768 | tensor | 2 | $0.97 | 2.071 |
| 32,768 | | 1 | $1.02 | 3.427 |
| 32,768 | | 1 | $1.20 | 5.027 |
| 32,768 | tensor | 2 | $1.23 | 7.671 |
| 32,768 | | 1 | $1.59 | 5.027 |
| 32,768 | | 1 | $2.37 | 14.627 |
| 32,768 | | 1 | $3.83 | 14.627 |
| 32,768 | | 1 | $4.11 | 17.427 |
| 32,768 | | 1 | $4.74 | 26.827 |

| Context | Parallelism | GPU | Price | Max Concurrency |
|---|---|---|---|---|
| 32,768 | | 1 | $0.33 | 1.009 |
| 32,768 | | 1 | $0.38 | 1.009 |
| 32,768 | | 1 | $0.53 | 2.609 |
| 32,768 | tensor | 2 | $0.69 | 1.654 |
| 32,768 | | 1 | $0.83 | 2.609 |
| 32,768 | tensor | 2 | $0.97 | 1.254 |
| 32,768 | | 1 | $1.02 | 2.609 |
| 32,768 | | 1 | $1.20 | 4.209 |
| 32,768 | tensor | 2 | $1.23 | 6.854 |
| 32,768 | | 1 | $1.59 | 4.209 |
| 32,768 | | 1 | $2.37 | 13.809 |
| 32,768 | | 1 | $3.83 | 13.809 |
| 32,768 | | 1 | $4.11 | 16.609 |
| 32,768 | | 1 | $4.74 | 26.009 |

| Context | Parallelism | GPU | Price | Max Concurrency |
|---|---|---|---|---|
| 32,768 | tensor | 2 | $0.54 | 1.778 |
| 32,768 | tensor | 2 | $0.57 | 1.778 |
| 32,768 | pipeline | 3 | $0.84 | 1.422 |
| 32,768 | tensor | 2 | $0.93 | 4.978 |
| 32,768 | tensor | 4 | $1.12 | 3.067 |
| 32,768 | | 1 | $1.20 | 2.333 |
| 32,768 | tensor | 2 | $1.23 | 4.978 |
| 32,768 | pipeline | 3 | $1.43 | 0.822 |
| 32,768 | tensor | 2 | $1.56 | 4.978 |
| 32,768 | | 1 | $1.59 | 2.333 |
| 32,768 | tensor | 4 | $1.82 | 2.267 |
| 32,768 | tensor | 2 | $1.92 | 4.978 |
| 32,768 | | 1 | $2.37 | 11.933 |
| 32,768 | | 1 | $3.83 | 11.933 |
| 32,768 | | 1 | $4.11 | 14.733 |
| 32,768 | | 1 | $4.74 | 24.133 |
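
One way to compare the configurations listed above is to divide the price by the concurrency figure. The sketch below assumes that the dollar amount is an hourly price and that the number following it is the Max Concurrency value. Neither is stated explicitly in the listing, and the helper name `cost_per_slot` is hypothetical. The three sample pairs are taken from the first listing.

```python
def cost_per_slot(price, max_concurrency):
    """Price divided by concurrency: a rough cost per concurrent full-context request."""
    return price / max_concurrency


# (price, max concurrency) pairs copied from three rows of the first listing.
configs = [(0.33, 1.827), (1.20, 5.027), (4.74, 26.827)]

# Sort from cheapest to most expensive per unit of concurrency.
ranked = sorted(configs, key=lambda c: cost_per_slot(*c))
for price, concurrency in ranked:
    print(f"${price:.2f} at {concurrency} -> {cost_per_slot(price, concurrency):.4f} per slot")
```

Under these assumptions the most expensive instance has the lowest cost per concurrent request, so the cheapest configuration is not automatically the most economical under heavy load.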
Contact our dedicated neural networks support team at nn@immers.cloud or send your request to the sales department at sale@immers.cloud.