196 lines
6.9 KiB
Markdown
196 lines
6.9 KiB
Markdown
---
|
||
language:
|
||
- ar
|
||
license: apache-2.0
|
||
library_name: gguf
|
||
base_model: NightPrince/Qwen3-4B-Islamic-Arabic
|
||
datasets:
|
||
- NightPrince/islamic-arabic-qa
|
||
tags:
|
||
- arabic
|
||
- islamic
|
||
- fiqh
|
||
- fatwa
|
||
- qwen3
|
||
- gguf
|
||
- llama-cpp
|
||
- ollama
|
||
- quantized
|
||
- instruction-tuning
|
||
pipeline_tag: text-generation
|
||
---
|
||
|
||
# Qwen3-4B-Islamic-Arabic-GGUF
|
||
|
||
**GGUF quantized versions of Qwen3-4B-Islamic-Arabic for llama.cpp, Ollama, and LM Studio.**
|
||
|
||
This repository contains three GGUF files at different quantization levels, converted from [NightPrince/Qwen3-4B-Islamic-Arabic](https://huggingface.co/NightPrince/Qwen3-4B-Islamic-Arabic) (the merged FP16 model). All standard GGUF-compatible runtimes are supported: llama.cpp, Ollama, LM Studio, Jan, and others.
|
||
|
||
Trained and converted by **[Yahya Alnwsany (NightPrince)](https://huggingface.co/NightPrince)** — 2026-05-05.
|
||
|
||
---
|
||
|
||
## Files
|
||
|
||
| File | Size | Recommended for |
|
||
|---|---|---|
|
||
| `qwen3-4b-islamic-q4_k_m.gguf` | 2.3 GB | **Most users** — best quality/size balance |
|
||
| `qwen3-4b-islamic-q8_0.gguf` | 4.0 GB | High quality, more RAM available |
|
||
| `qwen3-4b-islamic-f16.gguf` | 7.5 GB | Reference / re-quantization source |
|
||
|
||
**Recommendation**: Start with `q4_k_m`. If you have 6+ GB of RAM headroom and want noticeably sharper Arabic output, use `q8_0`. The `f16` file is the lossless reference and is best used as a source for producing custom quantizations with llama.cpp's `llama-quantize`.
|
||
|
||
---
|
||
|
||
## Model Variants
|
||
|
||
| Variant | Repo | Description |
|
||
|---|---|---|
|
||
| **Merged FP16** | [NightPrince/Qwen3-4B-Islamic-Arabic](https://huggingface.co/NightPrince/Qwen3-4B-Islamic-Arabic) | Canonical merged model, FP16, ~7.6 GB — drop-in for transformers or vLLM |
|
||
| **LoRA Adapter** | [NightPrince/Qwen3-4B-Islamic-Arabic-LoRA](https://huggingface.co/NightPrince/Qwen3-4B-Islamic-Arabic-LoRA) | PEFT adapter only, 264 MB — apply on top of `Qwen/Qwen3-4B` |
|
||
| **INT4 Quantized** | [NightPrince/Qwen3-4B-Islamic-Arabic-INT4](https://huggingface.co/NightPrince/Qwen3-4B-Islamic-Arabic-INT4) | W4A16 compressed-tensors for fast vLLM serving, 2.5 GB |
|
||
| **MLX 4-bit** | [NightPrince/Qwen3-4B-Islamic-Arabic-mlx-4Bit](https://huggingface.co/NightPrince/Qwen3-4B-Islamic-Arabic-mlx-4Bit) | Apple Silicon / MLX — native Mac inference, 4-bit quantized |
|
||
| **GGUF** (this model) | [NightPrince/Qwen3-4B-Islamic-Arabic-GGUF](https://huggingface.co/NightPrince/Qwen3-4B-Islamic-Arabic-GGUF) | llama.cpp / Ollama / LM Studio — Q4_K_M (2.3 GB), Q8_0 (4.0 GB), F16 (7.5 GB) |
|
||
| **Dataset** | [NightPrince/islamic-arabic-qa](https://huggingface.co/datasets/NightPrince/islamic-arabic-qa) | 17,944 train / 2,101 val / 1,042 test — Islamic Arabic Q&A pairs |
|
||
|
||
---
|
||
|
||
## Usage
|
||
|
||
### Ollama
|
||
|
||
**Step 1: Create a Modelfile**
|
||
|
||
Save the following as `Modelfile` (no extension) in any directory:
|
||
|
||
```
|
||
FROM ./qwen3-4b-islamic-q4_k_m.gguf
|
||
|
||
SYSTEM """أنت مساعد عالم إسلامي متخصص. أجب على الأسئلة بدقة استناداً إلى القرآن الكريم والسنة النبوية والفقه الإسلامي الكلاسيكي. استشهد بالمصادر حيثما أمكن. كن موجزاً لكن شاملاً."""
|
||
|
||
PARAMETER temperature 0.7
|
||
PARAMETER top_p 0.9
|
||
PARAMETER num_ctx 4096
|
||
```
|
||
|
||
> Important: The `SYSTEM` field above contains the exact system prompt the model was fine-tuned with. Using it will produce the best results.
|
||
|
||
**Step 2: Download the GGUF file**
|
||
|
||
```bash
|
||
# Using huggingface-cli
|
||
pip install huggingface_hub
|
||
huggingface-cli download NightPrince/Qwen3-4B-Islamic-Arabic-GGUF \
|
||
qwen3-4b-islamic-q4_k_m.gguf \
|
||
--local-dir .
|
||
```
|
||
|
||
**Step 3: Build and run**
|
||
|
||
```bash
|
||
# Create the Ollama model
|
||
ollama create qwen3-islamic -f Modelfile
|
||
|
||
# Run interactively
|
||
ollama run qwen3-islamic
|
||
|
||
# Or query via API
|
||
curl http://localhost:11434/api/generate -d '{
|
||
"model": "qwen3-islamic",
|
||
"prompt": "ما حكم الاحتفال بالمولد النبوي الشريف؟",
|
||
"stream": false
|
||
}'
|
||
```
|
||
|
||
---
|
||
|
||
### llama.cpp
|
||
|
||
**Build llama.cpp** (if not already installed):
|
||
|
||
```bash
|
||
git clone https://github.com/ggerganov/llama.cpp
|
||
cd llama.cpp
|
||
make -j$(nproc) # CPU
|
||
# For CUDA: make GGML_CUDA=1 -j$(nproc)
|
||
```
|
||
|
||
**Download a GGUF file:**
|
||
|
||
```bash
|
||
huggingface-cli download NightPrince/Qwen3-4B-Islamic-Arabic-GGUF \
|
||
qwen3-4b-islamic-q4_k_m.gguf \
|
||
--local-dir ./models
|
||
```
|
||
|
||
**Run the llama.cpp HTTP server:**
|
||
|
||
```bash
|
||
./llama-server \
|
||
--model ./models/qwen3-4b-islamic-q4_k_m.gguf \
|
||
--ctx-size 4096 \
|
||
--n-gpu-layers 99 \
|
||
--host 0.0.0.0 \
|
||
--port 8080 \
|
||
--system-prompt "أنت مساعد عالم إسلامي متخصص. أجب على الأسئلة بدقة استناداً إلى القرآن الكريم والسنة النبوية والفقه الإسلامي الكلاسيكي. استشهد بالمصادر حيثما أمكن. كن موجزاً لكن شاملاً."
|
||
```
|
||
|
||
**CLI inference:**
|
||
|
||
```bash
|
||
./llama-cli \
|
||
--model ./models/qwen3-4b-islamic-q4_k_m.gguf \
|
||
--ctx-size 4096 \
|
||
--n-gpu-layers 99 \
|
||
--chat-template qwen3 \
|
||
--system-prompt "أنت مساعد عالم إسلامي متخصص. أجب على الأسئلة بدقة استناداً إلى القرآن الكريم والسنة النبوية والفقه الإسلامي الكلاسيكي. استشهد بالمصادر حيثما أمكن. كن موجزاً لكن شاملاً." \
|
||
--prompt "ما هي أركان الإسلام الخمسة؟" \
|
||
--n-predict 512
|
||
```
|
||
|
||
---
|
||
|
||
### LM Studio
|
||
|
||
1. Open **LM Studio** and go to the **Search** tab.
|
||
2. Search for `NightPrince/Qwen3-4B-Islamic-Arabic-GGUF`.
|
||
3. Download `qwen3-4b-islamic-q4_k_m.gguf` (recommended) from the file list.
|
||
4. Load the model and open the **Chat** tab.
|
||
5. In **System Prompt**, paste:
|
||
```
|
||
أنت مساعد عالم إسلامي متخصص. أجب على الأسئلة بدقة استناداً إلى القرآن الكريم والسنة النبوية والفقه الإسلامي الكلاسيكي. استشهد بالمصادر حيثما أمكن. كن موجزاً لكن شاملاً.
|
||
```
|
||
6. Set **Temperature** to `0.7` and **Context Length** to `4096` for best results.
|
||
|
||
---
|
||
|
||
## Hardware Requirements
|
||
|
||
| File | Min RAM (CPU) | Min VRAM (GPU offload) |
|
||
|---|---|---|
|
||
| `q4_k_m` (2.3 GB) | 4 GB | 3–4 GB |
|
||
| `q8_0` (4.0 GB) | 6 GB | 5–6 GB |
|
||
| `f16` (7.5 GB) | 10 GB | 8–10 GB |
|
||
|
||
Use `--n-gpu-layers 99` in llama.cpp to offload all layers to GPU. Reduce the value if you run out of VRAM.
|
||
|
||
---
|
||
|
||
## Citation
|
||
|
||
```bibtex
|
||
@misc{alnwsany2026qwen3islamicarbic,
|
||
author = {Yahya Alnwsany},
|
||
title = {Qwen3-4B-Islamic-Arabic: QLoRA Fine-Tuning of Qwen3-4B on Islamic Arabic Q\&A},
|
||
year = {2026},
|
||
howpublished = {\url{https://huggingface.co/NightPrince/Qwen3-4B-Islamic-Arabic}},
|
||
note = {Base model: Qwen/Qwen3-4B. Dataset: NightPrince/islamic-arabic-qa.}
|
||
}
|
||
```
|
||
|
||
---
|
||
|
||
## License
|
||
|
||
Apache 2.0 — consistent with the base model [Qwen/Qwen3-4B](https://huggingface.co/Qwen/Qwen3-4B).
|