---
license: mit
language:
- ar
- en
- fr
- es
- de
- it
- pt
- tr
- ur
- hi
tags:
- llama
- llm
- text-generation
- multilingual
- causal-lm
- arabic
- gguf
- quantized
- horus
- tokenai
- neuralnode
- tts
- voice
base_model: tokenaii/horus
widget:
  - text: "### User:\nWhat is the capital of Egypt?\n\n### Assistant:\nThe capital of Egypt is Cairo."
  - text: "### User:\nمن هو أول رئيس لمصر؟\n\n### Assistant:\nأول رئيس لمصر بعد ثورة 1952 هو محمد نجيب."
  - text: "### User:\nHello Horus!\n\n### Assistant:\nHello! I'm Horus, an AI assistant developed by TokenAI. How can I help you today?"
inference: true
---

# Horus-1.0-4B-GGUF

![Horus Model](media/main.png)

GGUF quantized versions of Horus-1.0-4B by TokenAI.

## Base Model

- **Source:** [tokenaii/horus](https://huggingface.co/tokenaii/horus)
- **Original Model:** Horus-1.0-4B (4B parameters)
- **Developer:** [Assem Sabry](https://assem.cloud/) & TokenAI
- **Organization:** [TokenAI](https://tokenai.cloud/)
- **Release Date:** April 2026
- **License:** MIT

## About TokenAI

**TokenAI** is an AI startup founded by [Assem Sabry](https://assem.cloud/) with headquarters in Egypt.

### Mission

TokenAI aims to deliver the strongest language models in the world and in the Arab world through the Horus family of models. The startup bridges the gap between cutting-edge AI capabilities and regional cultural contexts, starting with the Arab world.

### The Horus Family

Horus-1.0-4B marks the **first model in the Horus family line**. This is just the beginning of TokenAI's journey to create a comprehensive suite of AI models serving the Arab region.

# Horus-1.0-4B-GGUF
 
GGUF quantized versions of Horus-1.0-4B, a 4B-parameter multilingual language model optimized for Arabic and English.
 
## Model Variants & Hardware Requirements
 
| Format | File Size | Min RAM (CPU) | Min VRAM (GPU) | Quality | Best For |
|--------|-----------|---------------|----------------|---------|----------|
| **F16** | 9.03 GB | 12 GB | 10 GB | Maximum quality | High-end GPUs (RTX 3090, A100) |
| **Q8_0** | 4.8 GB | 6 GB | 5 GB | Near-lossless | RTX 3060 12GB, RTX 4060 |
| **Q6_K** | 3.71 GB | 5 GB | 4 GB | Excellent | RTX 3060, RTX 4060 Laptop |
| **Q5_K_M** | 3.23 GB | 4 GB | 3.5 GB | Very Good | GTX 1650, RTX 3050 |
| **Q4_K_M** | 2.78 GB | 3.5 GB | 3 GB | Good | Entry-level GPUs, CPU-only |
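
As a rough sanity check on the file sizes above, the effective bits per weight of each variant can be estimated by comparing its file size with the F16 file. This is a minimal sketch that assumes the F16 file stores about 16 bits per weight and ignores metadata overhead; the function name is illustrative and not part of any library.

```python
# File sizes in GB, copied from the table above.
FILE_SIZE_GB = {
    "F16": 9.03,
    "Q8_0": 4.8,
    "Q6_K": 3.71,
    "Q5_K_M": 3.23,
    "Q4_K_M": 2.78,
}


def approx_bits_per_weight(variant: str) -> float:
    """Estimate bits per weight, assuming the F16 file holds 16 bits per weight."""
    return 16.0 * FILE_SIZE_GB[variant] / FILE_SIZE_GB["F16"]


for name in FILE_SIZE_GB:
    print(f"{name:7s} ~{approx_bits_per_weight(name):.2f} bits/weight")
```

The estimates land slightly above each format's nominal bit width, which is expected because quantized blocks store scale factors alongside the weights.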
 
### Detailed Hardware Requirements
 
#### F16 (FP16 - Unquantized Half Precision)
- **File**: `Horus-1.0-4B-F16.gguf` (9.03 GB)
- **Min System RAM**: 12 GB
- **Min VRAM**: 10 GB
- **Recommended**: RTX 3090, RTX 4090, A100, A6000
- **Use Case**: Maximum quality, research, fine-tuning reference
 
#### Q8_0 (8-bit Quantization)
- **File**: `Horus-1.0-4B-Q8_0.gguf` (4.8 GB)
- **Min System RAM**: 6 GB
- **Min VRAM**: 5 GB
- **Recommended**: RTX 3060 12GB, RTX 4060, RTX 4070
- **Use Case**: Near-lossless quality at roughly half the memory of F16
 
#### Q6_K (6-bit K-Quant)
- **File**: `Horus-1.0-4B-Q6_K.gguf` (3.71 GB)
- **Min System RAM**: 5 GB
- **Min VRAM**: 4 GB
- **Recommended**: RTX 3060, RTX 4060 Laptop, GTX 1080 Ti
- **Use Case**: Excellent quality for most applications
 
#### Q5_K_M (5-bit K-Quant Medium)
- **File**: `Horus-1.0-4B-Q5_K_M.gguf` (3.23 GB)
- **Min System RAM**: 4 GB
- **Min VRAM**: 3.5 GB
- **Recommended**: GTX 1650 Super, RTX 3050, RTX 3050 Ti
- **Use Case**: Balanced quality and performance
 
#### Q4_K_M (4-bit K-Quant Medium)
- **File**: `Horus-1.0-4B-Q4_K_M.gguf` (2.78 GB)
- **Min System RAM**: 3.5 GB
- **Min VRAM**: 3 GB
- **Recommended**: GTX 1060 6GB, GTX 1650, Intel Arc A380
- **Use Case**: Maximum compression, edge devices, CPU inference
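
The requirements above can be turned into a small selection helper. This is a sketch under one assumption that the source does not state explicitly: when a GPU is used, only the VRAM minimum is checked, and for CPU-only inference only the system RAM minimum is checked. `pick_variant` is a hypothetical helper, not part of NeuralNode or llama.cpp.

```python
from typing import Optional

# (variant, min system RAM in GB, min VRAM in GB), ordered from highest
# to lowest quality. Values are copied from the requirements listed above.
VARIANTS = [
    ("F16", 12.0, 10.0),
    ("Q8_0", 6.0, 5.0),
    ("Q6_K", 5.0, 4.0),
    ("Q5_K_M", 4.0, 3.5),
    ("Q4_K_M", 3.5, 3.0),
]


def pick_variant(ram_gb: float, vram_gb: Optional[float] = None) -> Optional[str]:
    """Return the highest-quality variant whose stated minimum fits, or None."""
    for name, min_ram, min_vram in VARIANTS:
        if vram_gb is not None:
            fits = vram_gb >= min_vram  # GPU inference: compare against VRAM
        else:
            fits = ram_gb >= min_ram  # CPU-only inference: compare against RAM
        if fits:
            return name
    return None


print(pick_variant(ram_gb=8))              # CPU-only machine with 8 GB RAM
print(pick_variant(ram_gb=16, vram_gb=4))  # GPU with 4 GB VRAM
```

The stated minimums leave little headroom, so choosing one step below the result may be safer when other applications share the same memory.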

## Quick Start

### Using NeuralNode (Recommended)

The easiest way to use Horus GGUF models is with the NeuralNode framework:

```python
import neuralnode as nn

MODEL_ID = "tokenaii/Hours-1.0-4B-GGUF/Horus-1.0-4B-Q6_K.gguf"
DEVICE = "cpu"  # Change to "cuda" for GPU acceleration

# Download and load
model = nn.HorusModel(MODEL_ID, device=DEVICE).load()

# Use immediately
response = model.chat([{"role": "user", "content": "hi horus im emy"}])
print(response.content)
```

### Using llama-cpp-python

For direct llama.cpp integration:

```python
from llama_cpp import Llama

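# Load a local GGUF file; n_ctx matches the model's 4096-token context length
llm = Llama(
    model_path="Horus-1.0-4B-Q4_K_M.gguf",
    n_ctx=4096
)

# Plain text completion; the generated text is in choices[0]
output = llm("Hello, how are you?", max_tokens=256)
print(output['choices'][0]['text'])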
```
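
The widget examples in this model card's metadata use a `### User:` / `### Assistant:` layout. Assuming the model expects that layout (it is inferred from those examples, not documented as an official chat template), a prompt can be assembled as sketched below. `build_prompt` and `ask` are hypothetical helpers.

```python
from typing import Dict, List


def build_prompt(messages: List[Dict[str, str]]) -> str:
    """Render chat messages in the '### User:' / '### Assistant:' layout."""
    headers = {"user": "### User:", "assistant": "### Assistant:"}
    parts = [f"{headers[m['role']]}\n{m['content']}" for m in messages]
    # End with an open assistant header so the model writes the next reply.
    parts.append("### Assistant:\n")
    return "\n\n".join(parts)


def ask(model_path: str, question: str) -> str:
    """Run one question through a local GGUF file with llama-cpp-python."""
    # Imported lazily so build_prompt works without llama-cpp-python installed.
    from llama_cpp import Llama

    llm = Llama(model_path=model_path, n_ctx=4096)
    prompt = build_prompt([{"role": "user", "content": question}])
    # Stop generation if the model begins a new user turn.
    output = llm(prompt, max_tokens=256, stop=["### User:"])
    return output["choices"][0]["text"].strip()
```

For example, `ask("Horus-1.0-4B-Q4_K_M.gguf", "What is the capital of Egypt?")` would send a single-turn prompt in this layout.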

## Voice Interface with Replica TTS

Add natural voice output to your Horus GGUF model with Replica TTS:

```python
import neuralnode as nn

voice_id = "replica-aria-language{en-us}"

MODEL_ID = "tokenaii/Hours-1.0-4B-GGUF/Horus-1.0-4B-F16.gguf"
DEVICE = "cuda"

# Load model with Replica TTS
model = nn.HorusModel(
    MODEL_ID,
    tts_engine="replica_tts",
    voice=voice_id,
    device=DEVICE
).load()

# Chat and get spoken response
response = model.chat([{"role": "user", "content": "Hello!"}])
print(response.content)
response.play_audio()  # Plays the TTS audio
```

### Browse All Voices

```python
import neuralnode as nn

voices = nn.replica_voice_list()
for voice in voices:
    print(voice)
```

---

## Benchmark Results

Below are visual comparisons of Horus-1.0-4B against leading models.

### General Knowledge & Reasoning
![General Benchmarks](media/1.png)

### Arabic Language & Cultural Benchmarks
![Arabic Benchmarks](media/2.png)

### Coding & Tool Use Benchmarks
![Coding Benchmarks](media/3.png)

---

## Model Capabilities

- **Multilingual:** Supports 10+ languages, including Arabic, English, French, Spanish, German, Italian, Portuguese, Turkish, Urdu, and Hindi
- **Identity Recognition:** Knows itself as Horus from TokenAI
- **Reasoning:** Chain-of-thought capabilities
- **Context Length:** Up to 4096 tokens
- **Voice Output:** Replica TTS integration for natural speech
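
Because the context length is limited to 4096 tokens, long conversations need to be trimmed before each request. The sketch below keeps the most recent messages that fit a token budget. `trim_history` and `rough_count` are hypothetical helpers, and the whitespace word count is only a crude stand-in for a real tokenizer; real token counts should come from the inference library's tokenizer.

```python
from typing import Callable, Dict, List


def trim_history(
    messages: List[Dict[str, str]],
    count_tokens: Callable[[str], int],
    n_ctx: int = 4096,
    reserve_for_reply: int = 256,
) -> List[Dict[str, str]]:
    """Keep the most recent messages whose combined token count fits the budget."""
    budget = n_ctx - reserve_for_reply
    kept: List[Dict[str, str]] = []
    used = 0
    # Walk backwards from the newest message and stop at the first that overflows.
    for message in reversed(messages):
        cost = count_tokens(message["content"])
        if used + cost > budget:
            break
        kept.append(message)
        used += cost
    kept.reverse()  # Restore chronological order
    return kept


def rough_count(text: str) -> int:
    # Whitespace word count: an approximation, not the model's tokenizer.
    return len(text.split())
```

The `reserve_for_reply` value mirrors the `max_tokens=256` used in the llama-cpp-python example, so the prompt and the generated reply together stay within the context window.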

---

## Links

- **Base Model:** https://huggingface.co/tokenaii/horus
- **TokenAI Website:** https://tokenai.cloud/
- **Developer:** https://assem.cloud/
- **GitHub:** https://github.com/tokenaii/horus-1.0

---

**Note:** Quantized using llama.cpp for efficient inference. GGUF versions are optimized for local deployment with minimal resource requirements.