llama-3-8b-gpt-4o-ru1.0

Llama-3-8B GPT-4o-RU1.0 is a fine-tuned version of the Llama-3-8B-Instruct model, created to significantly improve performance in Russian. The developer's key idea was to build a high-quality training dataset using GPT-4o, an OpenAI model known for its strong multilingual abilities. A carefully cleaned and structured subset of the tagengo-gpt4 dataset served as the foundation, which kept training efficient. 80% of the training examples are in Russian, making the model a specialized tool for Russian-language tasks.

From a technical standpoint, training was conducted for one epoch on two NVIDIA A100 accelerators using the Axolotl framework. The model architecture retains the base structure of Llama 3, with optimizations applied during training: Flash Attention 2 for accelerated processing and DeepSpeed ZeRO-2 for efficient memory distribution. The weights are saved in bfloat16 format for an optimal balance of performance and precision.
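For readers reproducing a similar setup, a run with these settings is typically described in Axolotl by a YAML config along the following lines. This is a hedged sketch, not the developer's actual config: the dataset path is a placeholder, and only the options named above (one epoch, bfloat16, Flash Attention 2, DeepSpeed ZeRO-2) are taken from the description.

```yaml
base_model: meta-llama/Meta-Llama-3-8B-Instruct
datasets:
  - path: ./tagengo_gpt4_subset.jsonl   # placeholder path to the cleaned subset
    type: sharegpt
num_epochs: 1                             # trained for one epoch
bf16: true                                # weights in bfloat16
flash_attention: true                     # Flash Attention 2
deepspeed: deepspeed_configs/zero2.json   # DeepSpeed ZeRO-2
```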

The model's quality is confirmed by its results on MT-Bench, a multilingual benchmark for evaluating dialog capabilities. In Russian it scored 8.12 points, surpassing GPT-3.5-turbo (7.94) and coming very close to Suzume (8.19), which is notable given that Suzume was trained on a dataset roughly eight times larger and more diverse. Importantly, unlike many multilingual fine-tunes where improving one language degrades English performance, this model's English score rose from 7.98 (base Llama-3) to 8.01, making it a well-balanced solution.

Thanks to its high proficiency in Russian-language tasks, the model is well-suited for a wide range of applications in the Russian-speaking domain.


Announce Date: 29.06.2024
Parameters: 8B
Context: 8K
Layers: 32
Attention Type: Full Attention
Developer: ruslandev
Transformers Version: 4.41.1
License: META LLAMA 3

Public endpoint

Use our pre-built public endpoints for free to test inference and explore llama-3-8b-gpt-4o-ru1.0 capabilities. You can obtain an API access token on the token management page after registration and verification.
Model Name                          Context  Type    GPU      TPS    Tooling  Status     Link
ruslandev/llama-3-8b-gpt-4o-ru1.0   8,192    Public  RTX4090  54.00  -        AVAILABLE  chat

API access to llama-3-8b-gpt-4o-ru1.0 endpoints

curl https://chat.immers.cloud/v1/endpoints/Llama-3-8B-GPT-4o-RU/generate/chat/completions \
-H "Content-Type: application/json" \
-H "Authorization: Bearer USER_API_KEY" \
-d '{"model": "Llama-3-8B-GPT-4o-RU", "messages": [
{"role": "system", "content": "You are a helpful assistant."},
{"role": "user", "content": "Say this is a test"}
], "temperature": 0, "max_tokens": 150}'
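The curl call above maps one-to-one onto an HTTP POST with a JSON body. As a minimal sketch using only the Python standard library (no request is actually sent here, and USER_API_KEY is a placeholder), the payload can be assembled like this:

```python
import json

# Hypothetical helper: builds the same headers and JSON body that the
# curl example sends to the chat/completions endpoint.
URL = "https://chat.immers.cloud/v1/endpoints/Llama-3-8B-GPT-4o-RU/generate/chat/completions"

def build_request(api_key: str) -> dict:
    """Return the URL, headers, and serialized JSON body for the call."""
    headers = {
        "Content-Type": "application/json",
        "Authorization": f"Bearer {api_key}",
    }
    body = {
        "model": "Llama-3-8B-GPT-4o-RU",
        "messages": [
            {"role": "system", "content": "You are a helpful assistant."},
            {"role": "user", "content": "Say this is a test"},
        ],
        "temperature": 0,
        "max_tokens": 150,
    }
    return {"url": URL, "headers": headers, "data": json.dumps(body)}

req = build_request("USER_API_KEY")
print(req["headers"]["Authorization"])
```

The resulting dictionary can be handed to any HTTP client (for example `urllib.request` or `requests`) unchanged.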
$response = Invoke-WebRequest https://chat.immers.cloud/v1/endpoints/Llama-3-8B-GPT-4o-RU/generate/chat/completions `
    -Method POST `
    -Headers @{
        "Authorization" = "Bearer USER_API_KEY"
        "Content-Type"  = "application/json"
    } `
    -Body (@{
        model    = "Llama-3-8B-GPT-4o-RU"
        messages = @(
            @{ role = "system"; content = "You are a helpful assistant." },
            @{ role = "user"; content = "Say this is a test" }
        )
    } | ConvertTo-Json -Depth 5)
($response.Content | ConvertFrom-Json).choices[0].message.content
#!pip install openai --upgrade

from openai import OpenAI

client = OpenAI(
    api_key="USER_API_KEY",
    base_url="https://chat.immers.cloud/v1/endpoints/Llama-3-8B-GPT-4o-RU/generate/",
)

chat_response = client.chat.completions.create(
    model="Llama-3-8B-GPT-4o-RU",
    messages=[
        {"role": "system", "content": "You are a helpful assistant."},
        {"role": "user", "content": "Say this is a test"},
    ],
)
print(chat_response.choices[0].message.content)
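Because the model's context window is 8,192 tokens, multi-turn chats eventually need their history trimmed before each call. A minimal sketch of one way to do this, using a rough character-based heuristic (~4 characters per token) rather than the model's actual tokenizer; the function name and reserve value are illustrative, not part of the API:

```python
# Rough heuristic: ~4 characters per token. Keeps the system message and
# as many of the most recent turns as fit the budget. A production client
# should count tokens with the model's tokenizer instead.
MAX_TOKENS = 8192
CHARS_PER_TOKEN = 4

def trim_history(messages, reserve_tokens=512):
    """Drop oldest non-system turns so the prompt leaves room for a reply."""
    budget = (MAX_TOKENS - reserve_tokens) * CHARS_PER_TOKEN
    system = [m for m in messages if m["role"] == "system"]
    rest = [m for m in messages if m["role"] != "system"]
    kept, used = [], sum(len(m["content"]) for m in system)
    for m in reversed(rest):  # walk from the newest turn backwards
        if used + len(m["content"]) > budget:
            break
        kept.append(m)
        used += len(m["content"])
    return system + list(reversed(kept))
```

The trimmed list can be passed directly as the `messages` argument in the example above.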

Private server

Rent your own physically dedicated instance with hourly or long-term monthly billing.

We recommend deploying a private instance when you need to:

  • maximize endpoint performance,
  • enable full context for long sequences,
  • ensure top-tier security for data processing in an isolated, dedicated environment,
  • use custom weights, such as fine-tuned models or LoRA adapters.

Recommended server configurations for hosting llama-3-8b-gpt-4o-ru1.0

Prices:
Name                          Context  Parallelism  GPUs  Price, hour  TPS
teslat4-1.16.16.160           8,192    -            1     $0.33        6.200
rtx2080ti-1.10.16.500         8,192    -            1     $0.38        1.700
teslaa2-1.16.32.160           8,192    -            1     $0.38        6.200
teslaa10-1.16.32.160          8,192    -            1     $0.53        13.400
rtx3080-1.16.32.160           8,192    -            1     $0.57        0.800
rtx3090-1.16.24.160           8,192    -            1     $0.83        13.400
rtx4090-1.16.32.160           8,192    -            1     $1.02        13.400
teslav100-1.12.64.160         8,192    -            1     $1.20        20.600
rtxa5000-2.16.64.160.nvlink   8,192    tensor       2     $1.23        32.500
rtx5090-1.16.64.160           8,192    -            1     $1.59        20.600
teslaa100-1.16.64.160         8,192    -            1     $2.37        63.800
h100-1.16.64.160              8,192    -            1     $3.83        63.800
h100nvl-1.16.96.160           8,192    -            1     $4.11        76.400
h200-1.16.128.160             8,192    -            1     $4.74        118.700
Prices:
Name                          Context  Parallelism  GPUs  Price, hour  TPS
teslat4-1.16.16.160           8,192    -            1     $0.33        4.421
teslaa2-1.16.32.160           8,192    -            1     $0.38        4.421
teslaa10-1.16.32.160          8,192    -            1     $0.53        11.621
rtx2080ti-2.12.64.160         8,192    tensor       2     $0.69        7.321
rtx3090-1.16.24.160           8,192    -            1     $0.83        11.621
rtx3080-2.16.32.160           8,192    tensor       2     $0.97        5.521
rtx4090-1.16.32.160           8,192    -            1     $1.02        11.621
teslav100-1.12.64.160         8,192    -            1     $1.20        18.821
rtxa5000-2.16.64.160.nvlink   8,192    tensor       2     $1.23        30.721
rtx5090-1.16.64.160           8,192    -            1     $1.59        18.821
teslaa100-1.16.64.160         8,192    -            1     $2.37        62.021
h100-1.16.64.160              8,192    -            1     $3.83        62.021
h100nvl-1.16.96.160           8,192    -            1     $4.11        74.621
h200-1.16.128.160             8,192    -            1     $4.74        116.921
Prices:
Name                          Context  Parallelism  GPUs  Price, hour  TPS
teslaa10-1.16.32.160          8,192    -            1     $0.53        3.030
teslat4-2.16.32.160           8,192    tensor       2     $0.54        7.730
teslaa2-2.16.32.160           8,192    tensor       2     $0.57        7.730
rtx3090-1.16.24.160           8,192    -            1     $0.83        3.030
rtx2080ti-3.12.24.120         8,192    pipeline     3     $0.84        6.130
rtx4090-1.16.32.160           8,192    -            1     $1.02        3.030
rtx2080ti-4.16.32.160         8,192    tensor       4     $1.12        13.530
teslav100-1.12.64.160         8,192    -            1     $1.20        10.230
rtxa5000-2.16.64.160.nvlink   8,192    tensor       2     $1.23        22.130
rtx3080-3.16.64.160           8,192    pipeline     3     $1.43        3.430
rtx5090-1.16.64.160           8,192    -            1     $1.59        10.230
rtx3080-4.16.64.160           8,192    tensor       4     $1.82        9.930
teslaa100-1.16.64.160         8,192    -            1     $2.37        53.430
h100-1.16.64.160              8,192    -            1     $3.83        53.430
h100nvl-1.16.96.160           8,192    -            1     $4.11        66.030
h200-1.16.128.160             8,192    -            1     $4.74        108.330

Need help?

Contact our dedicated neural networks support team at nn@immers.cloud or send your request to the sales department at sale@immers.cloud.