---
license: mit
language:
- ar
- en
- fr
- es
- de
- it
- pt
- tr
- ur
- hi
tags:
- llama
- llm
- text-generation
- multilingual
- causal-lm
- arabic
- gguf
- quantized
- horus
- tokenai
- neuralnode
- tts
- voice
base_model: tokenaii/horus
widget:
- text: "### User:\nWhat is the capital of Egypt?\n\n### Assistant:\nThe capital of Egypt is Cairo."
- text: "### User:\nمن هو أول رئيس لمصر؟\n\n### Assistant:\nأول رئيس لمصر بعد ثورة 1952 هو محمد نجيب."
- text: "### User:\nHello Horus!\n\n### Assistant:\nHello! I'm Horus, an AI assistant developed by TokenAI. How can I help you today?"
inference: true
---

# Horus-1.0-4B-GGUF

![Horus Model](media/main.png)

GGUF quantized versions of Horus-1.0-4B, a 4B-parameter multilingual language model by TokenAI optimized for Arabic and English.

## Base Model

- **Source:** [tokenaii/horus](https://huggingface.co/tokenaii/horus)
- **Original Model:** Horus-1.0-4B (4B parameters)
- **Developer:** [Assem Sabry](https://assem.cloud/) & TokenAI
- **Organization:** [TokenAI](https://tokenai.cloud/)
- **Release Date:** April 2026
- **License:** MIT

## About TokenAI

**TokenAI** is an AI startup founded by [Assem Sabry](https://assem.cloud/) and headquartered in Egypt.

### Mission

TokenAI aims to deliver the strongest language models in the world and in the Arab world through the Horus family of models. The startup bridges the gap between cutting-edge AI capabilities and regional cultural contexts, starting with the Arab world.

### The Horus Family

Horus-1.0-4B is the **first model in the Horus family**. It is the starting point of TokenAI's plan to build a comprehensive suite of AI models serving the Arab region.

## Model Variants & Hardware Requirements

| Format | File Size | Min RAM (CPU) | Min VRAM (GPU) | Quality | Best For |
|--------|-----------|---------------|----------------|---------|----------|
| **F16** | 9.03 GB | 12 GB | 10 GB | Maximum quality | High-end GPUs (RTX 3090, A100) |
| **Q8_0** | 4.8 GB | 6 GB | 5 GB | Near-lossless | RTX 3060 12GB, RTX 4060 |
| **Q6_K** | 3.71 GB | 5 GB | 4 GB | Excellent | RTX 3060, RTX 4060 Laptop |
| **Q5_K_M** | 3.23 GB | 4 GB | 3.5 GB | Very Good | GTX 1650, RTX 3050 |
| **Q4_K_M** | 2.78 GB | 3.5 GB | 3 GB | Good | Entry-level GPUs, CPU-only |

### Detailed Hardware Requirements

#### F16 (FP16, Unquantized Half Precision)

- **File**: `Horus-1.0-4B-F16.gguf` (9.03 GB)
- **Min System RAM**: 12 GB
- **Min VRAM**: 10 GB
- **Recommended**: RTX 3090, RTX 4090, A100, A6000
- **Use Case**: Maximum quality, research, fine-tuning reference

#### Q8_0 (8-bit Quantization)

- **File**: `Horus-1.0-4B-Q8_0.gguf` (4.8 GB)
- **Min System RAM**: 6 GB
- **Min VRAM**: 5 GB
- **Recommended**: RTX 3060 12GB, RTX 4060, RTX 4070
- **Use Case**: Near-lossless quality at roughly half the memory of F16

#### Q6_K (6-bit K-Quant)

- **File**: `Horus-1.0-4B-Q6_K.gguf` (3.71 GB)
- **Min System RAM**: 5 GB
- **Min VRAM**: 4 GB
- **Recommended**: RTX 3060, RTX 4060 Laptop, GTX 1080 Ti
- **Use Case**: Excellent quality for most applications

#### Q5_K_M (5-bit K-Quant Medium)

- **File**: `Horus-1.0-4B-Q5_K_M.gguf` (3.23 GB)
- **Min System RAM**: 4 GB
- **Min VRAM**: 3.5 GB
- **Recommended**: GTX 1650 Super, RTX 3050, RTX 3050 Ti
- **Use Case**: Balanced quality and performance

#### Q4_K_M (4-bit K-Quant Medium)

- **File**: `Horus-1.0-4B-Q4_K_M.gguf` (2.78 GB)
- **Min System RAM**: 3.5 GB
- **Min VRAM**: 3 GB
- **Recommended**: GTX 1060 6GB, GTX 1650, Intel Arc A380
- **Use Case**: Maximum compression, edge devices, CPU inference

## Quick Start

### Using NeuralNode (Recommended)

The easiest way to use Horus GGUF models is with the NeuralNode framework:

```python
import neuralnode as nn

MODEL_ID = "tokenaii/Hours-1.0-4B-GGUF/Horus-1.0-4B-Q6_K.gguf"
DEVICE = "cpu"  # Change to "cuda" for GPU acceleration

# Download and load the model
model = nn.HorusModel(MODEL_ID, device=DEVICE).load()

# Send a chat message and print the reply
response = model.chat([{"role": "user", "content": "hi horus im emy"}])
print(response.content)
```

### Using llama-cpp-python

For direct llama.cpp integration:

```python
from llama_cpp import Llama

# Load a local GGUF file; n_ctx matches the model's 4096-token context length
llm = Llama(
    model_path="Horus-1.0-4B-Q4_K_M.gguf",
    n_ctx=4096
)

# Run a plain text completion and print the generated text
output = llm("Hello, how are you?", max_tokens=256)
print(output['choices'][0]['text'])
```

## Voice Interface with Replica TTS

Add natural voice output to your Horus GGUF model with Replica TTS:

```python
import neuralnode as nn

voice_id = "replica-aria-language{en-us}"
MODEL_ID = "tokenaii/Hours-1.0-4B-GGUF/Horus-1.0-4B-F16.gguf"
DEVICE = "cuda"

# Load model with Replica TTS
model = nn.HorusModel(
    MODEL_ID,
    tts_engine="replica_tts",
    voice=voice_id,
    device=DEVICE
).load()

# Chat and get spoken response
response = model.chat([{"role": "user", "content": "Hello!"}])
print(response.content)
response.play_audio()  # Plays the TTS audio
```

### Browse All Voices

```python
import neuralnode as nn

# Print every available Replica TTS voice
voices = nn.replica_voice_list()
for voice in voices:
    print(voice)
```

---

## Benchmark Results

Below are visual comparisons of Horus-1.0-4B against leading models.

### General Knowledge & Reasoning

![General Benchmarks](media/1.png)

### Arabic Language & Cultural Benchmarks

![Arabic Benchmarks](media/2.png)

### Coding & Tool Use Benchmarks

![Coding Benchmarks](media/3.png)

---

## Model Capabilities

- **Multilingual:** Supports 10+ languages including Arabic, English, French, Spanish, German, Italian, Portuguese, Turkish, Urdu, Hindi
- **Identity Recognition:** Knows itself as Horus from TokenAI
- **Reasoning:** Chain-of-thought capabilities
- **Context Length:** Up to 4096 tokens
- **Voice Output:** Replica TTS integration for natural speech

---

## Links

- **Base Model:** https://huggingface.co/tokenaii/horus
- **TokenAI Website:** https://tokenai.cloud/
- **Developer:** https://assem.cloud/
- **GitHub:** https://github.com/tokenaii/horus-1.0

---

**Note:** Quantized using llama.cpp for efficient inference. GGUF versions are optimized for local deployment with minimal resource requirements.
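
To make the resource requirements concrete, here is a minimal sketch that picks the highest-quality variant fitting a given amount of system RAM. It only restates the Min RAM (CPU) column of the hardware table in this card. `VARIANTS` and `pick_variant` are hypothetical names introduced for illustration and are not part of NeuralNode or llama.cpp. The thresholds are the card's stated minimums, not measured values.

```python
from typing import Optional

# (variant, file size in GB, min system RAM in GB), copied from the
# hardware table in this card and ordered from highest to lowest quality.
VARIANTS = [
    ("F16", 9.03, 12.0),
    ("Q8_0", 4.8, 6.0),
    ("Q6_K", 3.71, 5.0),
    ("Q5_K_M", 3.23, 4.0),
    ("Q4_K_M", 2.78, 3.5),
]


def pick_variant(available_ram_gb: float) -> Optional[str]:
    """Return the file name of the highest-quality variant that fits.

    Hypothetical helper: it compares the available RAM against the
    card's stated minimums and returns None if nothing fits.
    """
    for name, _size_gb, min_ram_gb in VARIANTS:
        if available_ram_gb >= min_ram_gb:
            return f"Horus-1.0-4B-{name}.gguf"
    return None  # less memory than the smallest variant requires


if __name__ == "__main__":
    for ram_gb in (16, 8, 4, 2):
        print(ram_gb, "GB ->", pick_variant(ram_gb))
```

The returned file name can be passed as `model_path` in the llama-cpp-python example. For GPU inference, the same lookup could be run against the Min VRAM column instead.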