GigaChat3-10B-A1.8B is a strong example of efficient computation in LLMs. Of its 10 billion total parameters, only 1.8 billion are active during generation, which puts its speed on par with very small models, while the Mixture-of-Experts architecture lets it store far more knowledge. Generation is further accelerated by the MTP (Multi-Token Prediction) mechanism, which produces several output tokens at once. The model also implements Multi-head Latent Attention (MLA), which compresses the Key-Value cache into a latent vector and reduces GPU memory requirements, enabling efficient, cost-effective operation with a long context of 256K tokens.
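The "active parameters" idea behind Mixture-of-Experts can be sketched in a few lines: a gating network scores all experts for each token, but only the top-k experts actually run. The expert count, sizes, and scores below are made-up illustrative numbers, not GigaChat3's real configuration.

```python
# Toy sketch of MoE top-k routing. All numbers are hypothetical and only
# illustrate why a fraction of the model's parameters is active per token.

def top_k_experts(gate_scores, k):
    """Indices of the k highest-scoring experts for one token."""
    return sorted(range(len(gate_scores)),
                  key=lambda i: gate_scores[i], reverse=True)[:k]

n_experts, k = 8, 2                    # hypothetical expert count / top-k
params_per_expert = 1_000_000_000      # hypothetical size of one expert
gate_scores = [0.1, 0.05, 0.3, 0.02, 0.25, 0.08, 0.15, 0.05]

active = top_k_experts(gate_scores, k)           # experts 2 and 4 win here
active_fraction = k / n_experts                  # share of expert params used
print(active, f"{active_fraction:.0%} of expert parameters active")
# → [2, 4] 25% of expert parameters active
```

Only the selected experts' weights participate in the forward pass for that token, which is why generation cost tracks the active (1.8B) rather than total (10B) parameter count.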
The model underwent comprehensive training on 20 trillion tokens, including 10 additional languages beyond the usual set (languages of former USSR countries, Chinese, Arabic) and a massive block of synthetic data to ensure high-quality responses in mathematics, logic, and programming. This training sets it apart from compact versions of Llama or Gemma, which often struggle with Russian grammar or lack knowledge of Russian everyday and cultural contexts. GigaChat 3 Lightning (as this model is also known) instead demonstrates coherent, fluent Russian and even an understanding of colloquial terms.
Thanks to its low latency and high throughput, the model is ideally suited for creating fast conversational agents and first-line support chatbots, for use as a "router model" in agentic systems (classifying queries before routing them to a larger model), and for inference on limited resources (edge devices, modest servers). The model supports easy deployment via popular frameworks (transformers, vLLM, and SGLang) and is offered in two versions, FP8 and bfloat16, letting users choose between performance and quality.
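The "router model" pattern mentioned above can be sketched as follows. The labels, the stub classifier, and both backends are illustrative assumptions; in practice `classify` would be a call to a fast model such as GigaChat3-10B-A1.8B via its OpenAI-compatible endpoint, and the backends would be real model clients.

```python
# Sketch of the router-model pattern: a small, fast model labels each query
# and only complex ones reach the expensive large model. The "simple"/
# "complex" labels and stub functions are hypothetical placeholders.

def route_query(query, classify, small_backend, large_backend):
    """Dispatch `query` to the cheap or expensive backend based on its label."""
    label = classify(query)
    backend = small_backend if label == "simple" else large_backend
    return backend(query)

# Stub classifier for demonstration: word count stands in for a model call.
def classify_stub(q):
    return "complex" if len(q.split()) > 8 else "simple"

answer = route_query(
    "What time is it?",
    classify_stub,
    small_backend=lambda q: f"[small model] {q}",
    large_backend=lambda q: f"[large model] {q}",
)
print(answer)   # short query, routed to the small backend
```

The design point is that the classifier call is cheap relative to the large model, so misrouting a few queries still leaves the system well ahead on cost and latency.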
| Model Name | Context | Type | GPU | TPS | Status | Link |
|---|---|---|---|---|---|---|
| ai-sage/GigaChat3-10B-A1.8B | 262,144 | Public | RTX4090 | — | AVAILABLE | chat |
curl https://chat.immers.cloud/v1/endpoints/gigachat3-10b-a1.8b/generate/chat/completions \
-H "Content-Type: application/json" \
-H "Authorization: Bearer USER_API_KEY" \
-d '{"model": "GigaChat-3-10B-A1.8B", "messages": [
{"role": "system", "content": "You are a helpful assistant."},
{"role": "user", "content": "Say this is a test"}
], "temperature": 0, "max_tokens": 150}'
$response = Invoke-WebRequest https://chat.immers.cloud/v1/endpoints/gigachat3-10b-a1.8b/generate/chat/completions `
-Method POST `
-Headers @{
"Authorization" = "Bearer USER_API_KEY"
"Content-Type" = "application/json"
} `
-Body (@{
model = "GigaChat-3-10B-A1.8B"
messages = @(
@{ role = "system"; content = "You are a helpful assistant." },
@{ role = "user"; content = "Say this is a test" }
)
} | ConvertTo-Json)
($response.Content | ConvertFrom-Json).choices[0].message.content
# pip install openai --upgrade
from openai import OpenAI
client = OpenAI(
api_key="USER_API_KEY",
base_url="https://chat.immers.cloud/v1/endpoints/gigachat3-10b-a1.8b/generate/",
)
chat_response = client.chat.completions.create(
model="GigaChat-3-10B-A1.8B",
messages=[
{"role": "system", "content": "You are a helpful assistant."},
{"role": "user", "content": "Say this is a test"},
]
)
print(chat_response.choices[0].message.content)
Rent your own physically dedicated instance with hourly or long-term monthly billing.
The following private-instance configurations are available:
| Name | Context | Parallelism | vCPU | RAM, MB | Disk, GB | GPUs | Price/hr | |
|---|---|---|---|---|---|---|---|---|
| — | 262,144 |  | 16 | 16384 | 160 | 1 | $0.33 | Launch |
| — | 262,144 |  | 16 | 32768 | 160 | 1 | $0.38 | Launch |
| — | 262,144 |  | 16 | 32768 | 160 | 1 | $0.53 | Launch |
| — | 262,144 | tensor | 12 | 65536 | 160 | 2 | $0.69 | Launch |
| — | 262,144 |  | 16 | 24576 | 160 | 1 | $0.88 | Launch |
| — | 262,144 | tensor | 16 | 32762 | 160 | 2 | $0.97 | Launch |
| — | 262,144 |  | 16 | 32768 | 160 | 1 | $1.15 | Launch |
| — | 262,144 |  | 12 | 65536 | 160 | 1 | $1.20 | Launch |
| — | 262,144 | tensor | 16 | 65536 | 160 | 2 | $1.23 | Launch |
| — | 262,144 |  | 16 | 65536 | 160 | 1 | $1.59 | Launch |
| — | 262,144 |  | 16 | 65536 | 160 | 1 | $2.37 | Launch |
| — | 262,144 |  | 16 | 65536 | 160 | 1 | $3.83 | Launch |
| — | 262,144 |  | 16 | 131072 | 160 | 1 | $4.74 | Launch |
| Name | Context | Parallelism | vCPU | RAM, MB | Disk, GB | GPUs | Price/hr | |
|---|---|---|---|---|---|---|---|---|
| — | 262,144 | tensor | 16 | 32768 | 160 | 2 | $0.54 | Launch |
| — | 262,144 | tensor | 16 | 32768 | 160 | 2 | $0.57 | Launch |
| — | 262,144 | pipeline | 12 | 24576 | 120 | 3 | $0.84 | Launch |
| — | 262,144 | tensor | 16 | 65536 | 160 | 2 | $0.93 | Launch |
| — | 262,144 | tensor | 16 | 32768 | 160 | 4 | $1.12 | Launch |
| — | 262,144 |  | 12 | 65536 | 160 | 1 | $1.20 | Launch |
| — | 262,144 | tensor | 16 | 65536 | 160 | 2 | $1.23 | Launch |
| — | 262,144 | pipeline | 16 | 65536 | 160 | 3 | $1.43 | Launch |
| — | 262,144 |  | 16 | 65536 | 160 | 1 | $1.59 | Launch |
| — | 262,144 | tensor | 16 | 65536 | 160 | 2 | $1.67 | Launch |
| — | 262,144 | tensor | 16 | 65536 | 160 | 4 | $1.82 | Launch |
| — | 262,144 | tensor | 16 | 65536 | 160 | 2 | $2.19 | Launch |
| — | 262,144 |  | 16 | 65536 | 160 | 1 | $2.37 | Launch |
| — | 262,144 |  | 16 | 65536 | 160 | 1 | $3.83 | Launch |
| — | 262,144 |  | 16 | 131072 | 160 | 1 | $4.74 | Launch |
| Name | Context | Parallelism | vCPU | RAM, MB | Disk, GB | GPUs | Price/hr | |
|---|---|---|---|---|---|---|---|---|
| — | 262,144 | pipeline | 32 | 65536 | 160 | 3 | $0.88 | Launch |
| — | 262,144 | tensor | 16 | 65536 | 160 | 2 | $0.93 | Launch |
| — | 262,144 | tensor | 16 | 65536 | 160 | 4 | $0.96 | Launch |
| — | 262,144 | pipeline | 32 | 131072 | 160 | 3 | $1.06 | Launch |
| — | 262,144 | tensor | 16 | 65536 | 160 | 4 | $1.18 | Launch |
| — | 262,144 | tensor | 16 | 65536 | 160 | 2 | $1.23 | Launch |
| — | 262,144 | tensor | 32 | 131072 | 160 | 4 | $1.26 | Launch |
| — | 262,144 | tensor | 16 | 65536 | 160 | 2 | $1.67 | Launch |
| — | 262,144 | tensor | 16 | 65536 | 160 | 2 | $2.19 | Launch |
| — | 262,144 | tensor | 16 | 65535 | 240 | 2 | $2.22 | Launch |
| — | 262,144 |  | 16 | 65536 | 160 | 1 | $2.37 | Launch |
| — | 262,144 | tensor | 16 | 65536 | 160 | 2 | $2.93 | Launch |
| — | 262,144 |  | 16 | 65536 | 160 | 1 | $3.83 | Launch |
| — | 262,144 |  | 16 | 131072 | 160 | 1 | $4.74 | Launch |
Contact our dedicated neural networks support team at nn@immers.cloud or send your request to the sales department at sale@immers.cloud.