avibe

A-vibe is a Russian-language large language model developed by Avito and built on the open-source Qwen3-8B-Base. Rather than only fine-tuning the base model, the developers replaced the tokenizer entirely, merging English tokens from the original Qwen3 vocabulary with Russian tokens from a specially trained tokenizer. The hybrid tokenizer uses on average 22% fewer tokens for the same Russian text, which speeds up inference and reduces the model size to 7.9 billion parameters. As a result, A-vibe processes Russian-language queries 15–25% faster than the base model.
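As a rough illustration of what a 22% token saving means for a fixed token budget, the arithmetic can be sketched as follows. The function names are hypothetical, and 22% is the average quoted above, so the saving on any particular text will differ.

```python
def tokens_needed(base_tokens: int, savings: float = 0.22) -> int:
    """Tokens needed for the same text when the tokenizer saves `savings` of them."""
    return round(base_tokens * (1.0 - savings))


def relative_text_capacity(savings: float = 0.22) -> float:
    """How much more text fits into the same token budget (1.0 = no change)."""
    return 1.0 / (1.0 - savings)


# A Russian text that costs 1,000 tokens with the base tokenizer.
print(tokens_needed(1000))                 # 780
# The same context window holds about 28% more Russian text.
print(round(relative_text_capacity(), 3))  # 1.282
```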

A-vibe's training pipeline comprised several stages. First, the tokenizer was adapted on a corpus of 150 billion tokens (31% Russian and 31% English). Next came supervised fine-tuning (SFT) on more than 800,000 examples, including synthetic dialogues with function calling. This was followed by GRPO (Group Relative Policy Optimization) to improve mathematical reasoning and function calling, and DPO (Direct Preference Optimization) to improve dialogue safety and quality. During tokenizer adaptation the embeddings were partially frozen using a gradient-hooking technique, which preserved the quality of the representations for English tokens.
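The partial freezing of embeddings can be illustrated with a gradient hook in PyTorch. This is a minimal sketch of the general idea, not Avito's actual implementation: the toy vocabulary size, the `frozen_ids` tensor, and the choice of plain SGD are all assumptions made for the example.

```python
import torch
import torch.nn as nn

vocab_size, hidden = 10, 4
# Hypothetical: ids of rows carried over from the original tokenizer (kept frozen).
frozen_ids = torch.tensor([0, 1, 2, 3, 4])

emb = nn.Embedding(vocab_size, hidden)

# Mask with 0 for frozen rows and 1 for trainable rows; broadcasts over `hidden`.
grad_mask = torch.ones(vocab_size, 1)
grad_mask[frozen_ids] = 0.0

# The hook runs during backward and replaces the gradient with a masked copy,
# so frozen rows receive a zero gradient while the rest train normally.
emb.weight.register_hook(lambda grad: grad * grad_mask)

before = emb.weight.detach().clone()
# Plain SGD without weight decay: a zero gradient means the row stays unchanged.
opt = torch.optim.SGD(emb.parameters(), lr=0.1)

token_ids = torch.tensor([[0, 3, 5, 9]])  # two frozen ids, two trainable ids
loss = emb(token_ids).pow(2).sum()
loss.backward()
opt.step()

changed = (emb.weight.detach() != before).any(dim=1)
print(changed.tolist())  # only rows 5 and 9 are updated
```

Note that an optimizer with weight decay would still shrink the frozen rows even with a zero gradient, so in practice the frozen rows would also need to be excluded from decay.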

On Russian-language benchmarks, A-vibe outperforms the base Qwen3-8B on math_500_ru (68.6% vs. 54.6%). On the BFCL V3 function-calling benchmark it scores 58.63%. In the RU_ARENA ranking, A-vibe surpasses not only Qwen3-8B but also other Russian-language models that are significantly larger.

These strengths define A-vibe's main use cases: building Russian-language chatbots and assistants, analyzing and summarizing text (including user inquiries and documents), generating and explaining code, and solving logical and computational tasks in educational, analytical, and service-oriented products.


Announce Date: 20.10.2025
Parameters: 8B
Context: 33K
Layers: 36
Attention Type: Full Attention
Developer: AvitoTech
Transformers Version: 4.52.0.dev0
License: Apache 2.0

Public endpoint

Use our pre-built public endpoints for free to test inference and explore avibe capabilities. You can obtain an API access token on the token management page after registration and verification.
There are no public endpoints for this model yet.

Private server

Rent your own physically dedicated instance with hourly or long-term monthly billing.

We recommend deploying a private instance when you need to:

  • maximize endpoint performance,
  • enable full context for long sequences,
  • ensure top-tier security for data processing in an isolated, dedicated environment,
  • use custom weights, such as fine-tuned models or LoRA adapters.
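To get a feel for why full-context serving is memory-hungry, the KV cache needed for one full-length sequence can be estimated as below. This is a back-of-the-envelope sketch: the layer count comes from the model card above, while the number of KV heads, the head dimension, and the 16-bit cache are assumptions taken from the Qwen3-8B configuration rather than from this page.

```python
NUM_LAYERS = 36       # from the model card above
NUM_KV_HEADS = 8      # assumption: grouped-query attention as in Qwen3-8B
HEAD_DIM = 128        # assumption: as in Qwen3-8B
BYTES_PER_VALUE = 2   # assumption: 16-bit (fp16/bf16) KV cache


def kv_cache_bytes(num_tokens: int) -> int:
    """Bytes of KV cache for one sequence of `num_tokens` tokens."""
    # Factor 2 accounts for storing both keys and values in every layer.
    per_token = 2 * NUM_LAYERS * NUM_KV_HEADS * HEAD_DIM * BYTES_PER_VALUE
    return per_token * num_tokens


full_context = 32_768
gib = kv_cache_bytes(full_context) / 2**30
print(f"{gib:.2f} GiB per full-context sequence")  # 4.50 GiB
```

Under these assumptions each concurrent full-context request needs about 4.5 GiB of GPU memory on top of the model weights.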

Recommended server configurations for hosting avibe

Prices:
Name                          Context  Parallelism  GPU  Price, hour  TPS     Max Concurrency
teslat4-1.16.16.160           32,768   -            1    $0.33        -       1.426
teslaa2-1.16.32.160           32,768   -            1    $0.38        -       1.434
teslaa10-1.16.32.160          32,768   -            1    $0.53        -       3.049
rtx2080ti-2.12.64.160         32,768   tensor       2    $0.69        -       2.034
rtx3090-1.16.24.160           32,768   -            1    $0.83        -       3.266
rtx3080-2.16.32.160           32,768   tensor       2    $0.97        -       1.645
rtx4090-1.16.32.160           32,768   -            1    $1.02        -       3.258
rtxa5000-2.16.64.160.nvlink   32,768   tensor       2    $1.23        -       6.916
rtx5090-1.16.64.160           32,768   -            1    $1.59        -       4.859
teslaa100-1.16.64.160         32,768   -            1    $2.37        -       14.651
h100-1.16.64.160              32,768   -            1    $3.83        -       14.637
h100nvl-1.16.96.160           32,768   -            1    $4.11        -       17.481
teslaa100-2.24.96.160.nvlink  32,768   tensor       2    $4.61        -       30.120
h200-1.16.128.160             32,768   -            1    $4.74        -       27.031
h200-2.24.256.160.nvlink      32,768   tensor       2    $9.40        -       54.879

Prices:
Name                          Context  Parallelism  GPU  Price, hour  TPS     Max Concurrency
teslaa10-1.16.32.160          32,768   -            1    $0.53        -       2.232
teslat4-2.16.32.160           32,768   tensor       2    $0.54        -       2.852
teslaa2-2.16.32.160           32,768   tensor       2    $0.57        -       2.868
rtx2080ti-2.12.64.160         32,768   tensor       2    $0.69        -       1.216
rtx3090-1.16.24.160           32,768   -            1    $0.83        -       2.448
rtx4090-1.16.32.160           32,768   -            1    $1.02        -       2.440
rtxa5000-2.16.64.160.nvlink   32,768   tensor       2    $1.23        -       6.098
rtx3080-3.16.64.160           32,768   pipeline     3    $1.43        -       2.059
rtx5090-1.16.64.160           32,768   -            1    $1.59        -       4.041
rtx3080-4.16.64.160           32,768   tensor       4    $1.82        -       3.291
teslaa100-1.16.64.160         32,768   -            1    $2.37        -       13.834
h100-1.16.64.160              32,768   -            1    $3.83        -       13.820
h100nvl-1.16.96.160           32,768   -            1    $4.11        -       16.663
teslaa100-2.24.96.160.nvlink  32,768   tensor       2    $4.61        -       29.303
h200-1.16.128.160             32,768   -            1    $4.74        -       26.213
h200-2.24.256.160.nvlink      32,768   tensor       2    $9.40        -       54.061

Prices:
Name                          Context  Parallelism  GPU  Price, hour  TPS     Max Concurrency
teslat4-2.16.32.160           32,768   tensor       2    $0.54        -       1.217
teslaa2-2.16.32.160           32,768   tensor       2    $0.57        -       1.233
rtx2080ti-3.12.24.120         32,768   pipeline     3    $0.84        -       1.007
teslaa10-2.16.64.160          32,768   tensor       2    $0.93        -       4.464
rtx2080ti-4.16.32.160         32,768   tensor       4    $1.12        -       2.433
rtxa5000-2.16.64.160.nvlink   32,768   tensor       2    $1.23        -       4.464
rtx3090-2.16.64.160           32,768   tensor       2    $1.56        -       4.897
rtx5090-1.16.64.160           32,768   -            1    $1.59        -       2.406
rtx3080-4.16.64.160           32,768   tensor       4    $1.82        -       1.656
rtx4090-2.16.64.160           32,768   tensor       2    $1.92        -       4.881
teslaa100-1.16.64.160         32,768   -            1    $2.37        85.720  12.199
h100-1.16.64.160              32,768   -            1    $3.83        -       12.185
h100nvl-1.16.96.160           32,768   -            1    $4.11        -       15.029
teslaa100-2.24.96.160.nvlink  32,768   tensor       2    $4.61        -       27.668
h200-1.16.128.160             32,768   -            1    $4.74        -       24.578
h200-2.24.256.160.nvlink      32,768   tensor       2    $9.40        -       52.426
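
When comparing configurations, it can help to look at the hourly price per unit of concurrency rather than the headline price alone. The sketch below uses a handful of rows copied from the last price list above. It assumes the final number in each row is the Max Concurrency value and that it approximates the number of simultaneous full-context requests; the helper names are hypothetical.

```python
# (name, price per hour in USD, max concurrency), copied from the last price list.
CONFIGS = [
    ("teslat4-2.16.32.160", 0.54, 1.217),
    ("teslaa10-2.16.64.160", 0.93, 4.464),
    ("rtx3090-2.16.64.160", 1.56, 4.897),
    ("teslaa100-1.16.64.160", 2.37, 12.199),
    ("teslaa100-2.24.96.160.nvlink", 4.61, 27.668),
    ("h200-2.24.256.160.nvlink", 9.40, 52.426),
]


def cheapest_for(target_concurrency: float):
    """Cheapest listed configuration whose max concurrency meets the target."""
    candidates = [c for c in CONFIGS if c[2] >= target_concurrency]
    return min(candidates, key=lambda c: c[1]) if candidates else None


def price_per_slot(config) -> float:
    """Hourly price divided by max concurrency."""
    return config[1] / config[2]


choice = cheapest_for(10)
print(choice[0], round(price_per_slot(choice), 3))  # teslaa100-1.16.64.160 0.194
```

A cheaper server is not always cheaper per concurrent request: in this sample the dual-A100 configuration has the lowest price per slot even though it is one of the more expensive rows.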

Need help?

Contact our dedicated neural networks support team at nn@immers.cloud or send your request to the sales department at sale@immers.cloud.