A-vibe is a Russian-language large language model developed by Avito, built upon the open-source Qwen3-8B-Base. Its key innovation lies in a unique approach to Russian language adaptation: rather than merely fine-tuning the model, the developers completely replaced the tokenizer, merging English tokens from the original Qwen3 with Russian tokens from a specially trained tokenizer. This hybrid approach achieves high tokenization efficiency for Russian text—using on average 22% fewer tokens for the same content—significantly accelerating inference and reducing the model size to 7.9 billion parameters. As a result, A-vibe processes Russian-language queries 15–25% faster than the base version.
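The token-savings figure above can be turned into a rough back-of-the-envelope estimate. The sketch below is only illustrative: it assumes a fixed latency per generated token and ignores prefill and other overheads, and the function names are hypothetical rather than part of any A-vibe tooling. Under that assumption, 22% fewer tokens means about 22% less generation time for the same text, which sits inside the 15–25% range quoted above.

```python
def tokens_after_savings(baseline_tokens: float, savings: float = 0.22) -> float:
    """Tokens needed for the same text when the tokenizer uses `savings` fewer tokens."""
    return baseline_tokens * (1.0 - savings)


def generation_time_reduction(savings: float = 0.22) -> float:
    """Fraction of generation time saved, assuming constant latency per token."""
    return savings


def text_throughput_gain(savings: float = 0.22) -> float:
    """Relative gain in text produced per second under the same assumption."""
    return 1.0 / (1.0 - savings) - 1.0


if __name__ == "__main__":
    baseline = 1000  # hypothetical token count of a Russian text with the original tokenizer
    print("tokens with the adapted tokenizer:", tokens_after_savings(baseline))
    print("generation time saved: {:.0%}".format(generation_time_reduction()))
    print("text throughput gain: {:.1%}".format(text_throughput_gain()))
```

The same ratio also applies to the context window: a fixed token budget holds correspondingly more Russian text when each sentence costs fewer tokens.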
A-vibe's training pipeline comprised several stages. First, the tokenizer was adapted on a corpus of 150 billion tokens (31% Russian and 31% English). Next came supervised fine-tuning (SFT) on over 800,000 examples, including synthetic dialogues with function calling. This was followed by GRPO (Group Relative Policy Optimization) to strengthen mathematical reasoning and function calling, and by DPO (Direct Preference Optimization) to improve dialogue safety and quality. During tokenizer adaptation, the embeddings were partially frozen using a gradient-hooking technique, which preserved the quality of the representations for English tokens.
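The idea behind partially freezing embeddings can be sketched without any framework. The toy example below is not Avito's implementation: it assumes that the rows kept from the original vocabulary (the English tokens) are frozen while the newly added rows are trained, and the names `masked_sgd_step` and `trainable_rows` are hypothetical. The gradient of every frozen row is multiplied by zero before the update, which is what a gradient hook would do. In PyTorch, such a mask would typically be applied in a callback registered with `Tensor.register_hook` on the embedding weight.

```python
def masked_sgd_step(embeddings, grads, trainable_rows, lr=0.1):
    """One SGD step in which only rows listed in `trainable_rows` are updated.

    embeddings, grads: lists of equal-length rows (one row per token id).
    trainable_rows: set of token ids whose embeddings may change.
    """
    updated = []
    for token_id, (row, grad) in enumerate(zip(embeddings, grads)):
        # The "hook": zero the gradient of frozen rows, pass it through otherwise.
        scale = 1.0 if token_id in trainable_rows else 0.0
        updated.append([w - lr * scale * g for w, g in zip(row, grad)])
    return updated


# Toy vocabulary: ids 0-1 stand in for retained English tokens (frozen),
# ids 2-3 for newly added Russian tokens (trainable). All values are made up.
embeddings = [[1.0, 1.0], [2.0, 2.0], [0.0, 0.0], [0.5, 0.5]]
grads = [[1.0, 1.0], [1.0, 1.0], [1.0, 1.0], [1.0, 1.0]]
new_token_ids = {2, 3}

after = masked_sgd_step(embeddings, grads, new_token_ids)

if __name__ == "__main__":
    for token_id, (before_row, after_row) in enumerate(zip(embeddings, after)):
        status = "trained" if token_id in new_token_ids else "frozen"
        print(token_id, status, before_row, "->", after_row)
```

Masking the gradient rather than excluding the whole embedding matrix from the optimizer lets a single weight tensor hold both frozen and trainable rows.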
A-vibe demonstrates outstanding performance on Russian-language benchmarks: it outperforms the base Qwen3-8B on math_500_ru (68.6% vs. 54.6%). On the BFCL V3 function-calling benchmark, the model achieves 58.63%, confirming its strong function-calling capabilities. Most impressively, in the RU_ARENA ranking, A-vibe surpasses not only Qwen3-8B but also other Russian-language models significantly larger in size.
Use cases for A-vibe naturally stem from its architecture and strengths. It is ideally suited for building intelligent Russian-language chatbots and assistants, analyzing and summarizing text (including user inquiries and documents), generating and explaining code, and solving logical and computational tasks within educational, analytical, and service-oriented products.
There are no public endpoints for this model yet.
Rent your own physically dedicated instance with hourly or long-term monthly billing.
We recommend deploying private instances in the following scenarios:
| Context | Parallelism | GPU | Price | TPS | Max Concurrency |
|---|---|---|---|---|---|
| 32,768 | | 1 | $0.33 | | 1.426 |
| 32,768 | | 1 | $0.38 | | 1.434 |
| 32,768 | | 1 | $0.53 | | 3.049 |
| 32,768 | tensor | 2 | $0.69 | | 2.034 |
| 32,768 | | 1 | $0.83 | | 3.266 |
| 32,768 | tensor | 2 | $0.97 | | 1.645 |
| 32,768 | | 1 | $1.02 | | 3.258 |
| 32,768 | tensor | 2 | $1.23 | | 6.916 |
| 32,768 | | 1 | $1.59 | | 4.859 |
| 32,768 | | 1 | $2.37 | | 14.651 |
| 32,768 | | 1 | $3.83 | | 14.637 |
| 32,768 | | 1 | $4.11 | | 17.481 |
| 32,768 | tensor | 2 | $4.61 | | 30.120 |
| 32,768 | | 1 | $4.74 | | 27.031 |
| 32,768 | tensor | 2 | $9.40 | | 54.879 |

| Context | Parallelism | GPU | Price | TPS | Max Concurrency |
|---|---|---|---|---|---|
| 32,768 | | 1 | $0.53 | | 2.232 |
| 32,768 | tensor | 2 | $0.54 | | 2.852 |
| 32,768 | tensor | 2 | $0.57 | | 2.868 |
| 32,768 | tensor | 2 | $0.69 | | 1.216 |
| 32,768 | | 1 | $0.83 | | 2.448 |
| 32,768 | | 1 | $1.02 | | 2.440 |
| 32,768 | tensor | 2 | $1.23 | | 6.098 |
| 32,768 | pipeline | 3 | $1.43 | | 2.059 |
| 32,768 | | 1 | $1.59 | | 4.041 |
| 32,768 | tensor | 4 | $1.82 | | 3.291 |
| 32,768 | | 1 | $2.37 | | 13.834 |
| 32,768 | | 1 | $3.83 | | 13.820 |
| 32,768 | | 1 | $4.11 | | 16.663 |
| 32,768 | tensor | 2 | $4.61 | | 29.303 |
| 32,768 | | 1 | $4.74 | | 26.213 |
| 32,768 | tensor | 2 | $9.40 | | 54.061 |

| Context | Parallelism | GPU | Price | TPS | Max Concurrency |
|---|---|---|---|---|---|
| 32,768 | tensor | 2 | $0.54 | | 1.217 |
| 32,768 | tensor | 2 | $0.57 | | 1.233 |
| 32,768 | pipeline | 3 | $0.84 | | 1.007 |
| 32,768 | tensor | 2 | $0.93 | | 4.464 |
| 32,768 | tensor | 4 | $1.12 | | 2.433 |
| 32,768 | tensor | 2 | $1.23 | | 4.464 |
| 32,768 | tensor | 2 | $1.56 | | 4.897 |
| 32,768 | | 1 | $1.59 | | 2.406 |
| 32,768 | tensor | 4 | $1.82 | | 1.656 |
| 32,768 | tensor | 2 | $1.92 | | 4.881 |
| 32,768 | | 1 | $2.37 | 85.720 | 12.199 |
| 32,768 | | 1 | $3.83 | | 12.185 |
| 32,768 | | 1 | $4.11 | | 15.029 |
| 32,768 | tensor | 2 | $4.61 | | 27.668 |
| 32,768 | | 1 | $4.74 | | 24.578 |
| 32,768 | tensor | 2 | $9.40 | | 52.426 |
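
One way to compare the configurations listed above is to divide the price by the maximum concurrency. The sketch below is illustrative only: it assumes the dollar figure in each row is the price of the instance and the last number is its Max Concurrency at the 32,768-token context, and it uses three rows taken from the last table. The dictionary keys and the `price_per_slot` helper are hypothetical names.

```python
# Three rows copied from the last table: GPU count, price, max concurrency.
rows = [
    {"gpus": 1, "price": 2.37, "max_concurrency": 12.199},
    {"gpus": 2, "price": 4.61, "max_concurrency": 27.668},
    {"gpus": 2, "price": 9.40, "max_concurrency": 52.426},
]


def price_per_slot(row):
    """Price divided by the number of full-context requests served at once."""
    return row["price"] / row["max_concurrency"]


# The configuration with the lowest price per concurrent full-context request.
best = min(rows, key=price_per_slot)

if __name__ == "__main__":
    for row in sorted(rows, key=price_per_slot):
        print("${:.2f} with {} GPU(s): {:.4f} per concurrent request".format(
            row["price"], row["gpus"], price_per_slot(row)))
```

Price per concurrent request is only one criterion; a workload with few simultaneous users may be better served by a cheaper configuration with lower concurrency.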
Contact our dedicated neural networks support team at nn@immers.cloud or send your request to the sales department at sale@immers.cloud.