Granite-4.0-H-Tiny is a compact hybrid Mixture of Experts (MoE) model with 7 billion total parameters, of which only 1 billion are active during inference. Architecturally, H-Tiny mirrors H-Small, with the same 9:1 ratio of Mamba-2 to Transformer layers but fewer parameters per layer. The model was trained on a corpus of 22 trillion tokens, which is intended to keep quality high on enterprise tasks despite its compact size. It supports a context length of up to 128K tokens, and longer contexts are theoretically possible thanks to Mamba-2's constant memory requirements.
H-Tiny performs well for its size. On the IFEval benchmark, the model achieves an average score of 81.44%, and on MMLU it scores 68.65%, which points to strong instruction-following, comprehension, and reasoning capabilities.
This model is designed for edge deployments, local applications, and low-latency scenarios where response speed and minimal resource requirements are critical. According to the developers, the model requires only 8 GB of memory in 8-bit mode, which allows it to run on consumer-grade GPUs such as the RTX 3060 with 12 GB of VRAM.
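The 8 GB figure can be sanity-checked with simple arithmetic. The sketch below estimates the memory taken by the weights alone at several precisions, assuming that all 7 billion parameters stay loaded (in a typical MoE deployment every expert is kept in memory, even though only about 1 billion parameters are active per token). The function name `weight_memory_gb` is illustrative and not part of any library, and the estimate ignores activations, the recurrent state, the attention cache, and runtime overhead.

```python
def weight_memory_gb(total_params: float, bits_per_param: int) -> float:
    """Approximate memory for model weights alone, in decimal gigabytes."""
    bytes_total = total_params * bits_per_param / 8
    return bytes_total / 1e9


# Total parameter count taken from the model description (7 billion).
TOTAL_PARAMS = 7e9

if __name__ == "__main__":
    for bits in (16, 8, 4):
        gb = weight_memory_gb(TOTAL_PARAMS, bits)
        print(f"{bits}-bit weights: ~{gb:.1f} GB")
```

At 8 bits per parameter the weights come to roughly 7 GB, so the quoted 8 GB leaves about 1 GB for everything else. Actual usage depends on the inference runtime and the context length in use.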
In enterprise scenarios, H-Tiny is recommended as a fast component for executing specific tasks within larger agent systems, as well as in use cases where data privacy compliance is crucial. For example, the model can handle function calling, data extraction and anonymization, or classification, offloading more complex reasoning tasks to other models within the system.
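The division of labor described above can be sketched as a small router that sends the listed task types to the fast model and everything else to a larger one. This is only an illustration: the task labels, the `Request` type, the `pick_model` function, and both model identifiers are hypothetical placeholders, not names from any real API or deployment.

```python
from dataclasses import dataclass

# Task types the text lists as suitable for the small, fast model.
# The label strings themselves are hypothetical.
FAST_TASKS = {
    "function_calling",
    "data_extraction",
    "anonymization",
    "classification",
}


@dataclass(frozen=True)
class Request:
    task: str    # coarse task label assigned upstream
    prompt: str  # text to send to the chosen model


def pick_model(
    request: Request,
    fast_model: str = "granite-4.0-h-tiny",           # placeholder identifier
    fallback_model: str = "larger-reasoning-model",   # placeholder identifier
) -> str:
    """Return the model identifier that should handle this request."""
    return fast_model if request.task in FAST_TASKS else fallback_model


if __name__ == "__main__":
    print(pick_model(Request("classification", "Label this support ticket.")))
    print(pick_model(Request("multi_step_planning", "Plan a data migration.")))
```

A real agent system would usually decide the task label with a classifier or with rules of its own. The fixed set here only shows where the hand-off between the two models happens.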
There are no public endpoints for this model yet.
Rent your own physically dedicated instance with hourly or long-term monthly billing.
We recommend deploying private instances in the following scenarios:
Context, tokens | vCPU | RAM, MB | Disk, GB | GPU | Price
---|---|---|---|---|---
131,072 | 16 | 16384 | 160 | 1 | $0.33
131,072 | 10 | 16384 | 500 | 1 | $0.38
131,072 | 16 | 32768 | 160 | 1 | $0.38
131,072 | 16 | 32768 | 160 | 1 | $0.53
131,072 | 16 | 32768 | 160 | 1 | $0.57
131,072 | 16 | 24576 | 160 | 1 | $0.88
131,072 | 16 | 32768 | 160 | 1 | $1.15
131,072 | 12 | 65536 | 160 | 1 | $1.20
131,072 | 16 | 65536 | 160 | 2 | $1.23
131,072 | 16 | 65536 | 160 | 1 | $1.59
131,072 | 16 | 65536 | 160 | 1 | $2.58
131,072 | 16 | 65536 | 160 | 1 | $5.11
131,072 | 16 | 131072 | 160 | 1 | $6.98

Context, tokens | vCPU | RAM, MB | Disk, GB | GPU | Price
---|---|---|---|---|---
131,072 | 16 | 16384 | 160 | 1 | $0.33
131,072 | 16 | 32768 | 160 | 1 | $0.38
131,072 | 16 | 32768 | 160 | 1 | $0.53
131,072 | 12 | 65536 | 160 | 2 | $0.69
131,072 | 16 | 24576 | 160 | 1 | $0.88
131,072 | 16 | 32762 | 160 | 2 | $0.97
131,072 | 16 | 32768 | 160 | 1 | $1.15
131,072 | 12 | 65536 | 160 | 1 | $1.20
131,072 | 16 | 65536 | 160 | 2 | $1.23
131,072 | 16 | 65536 | 160 | 1 | $1.59
131,072 | 16 | 65536 | 160 | 1 | $2.58
131,072 | 16 | 65536 | 160 | 1 | $5.11
131,072 | 16 | 131072 | 160 | 1 | $6.98

Context, tokens | vCPU | RAM, MB | Disk, GB | GPU | Price
---|---|---|---|---|---
131,072 | 16 | 32768 | 160 | 1 | $0.53
131,072 | 16 | 32768 | 160 | 2 | $0.54
131,072 | 16 | 32768 | 160 | 2 | $0.57
131,072 | 12 | 65536 | 160 | 2 | $0.69
131,072 | 16 | 24576 | 160 | 1 | $0.88
131,072 | 16 | 32768 | 160 | 1 | $1.15
131,072 | 12 | 65536 | 160 | 1 | $1.20
131,072 | 16 | 65536 | 160 | 2 | $1.23
131,072 | 16 | 65536 | 160 | 3 | $1.43
131,072 | 16 | 65536 | 160 | 1 | $1.59
131,072 | 16 | 65536 | 160 | 1 | $2.58
131,072 | 16 | 65536 | 160 | 1 | $5.11
131,072 | 16 | 131072 | 160 | 1 | $6.98

Contact our dedicated neural networks support team at nn@immers.cloud or send your request to the sales department at sale@immers.cloud.