---
language:
  - ar
license: apache-2.0
library_name: transformers
base_model: Qwen/Qwen3-4B
datasets:
  - NightPrince/islamic-arabic-qa
tags:
  - arabic
  - islamic
  - fiqh
  - fatwa
  - qlora
  - peft
  - qwen3
  - instruction-tuning
  - conversational
pipeline_tag: text-generation
inference:
  parameters:
    max_new_tokens: 512
    temperature: 0.3
    do_sample: true
widget:
  - text: "ما حكم زكاة الفطر وما مقدارها؟"
    example_title: "زكاة الفطر"
  - text: "ما الفرق بين الفرض والواجب عند الحنفية؟"
    example_title: "الفرض والواجب"
  - text: "ما حكم بيع العينة في الفقه الإسلامي؟"
    example_title: "بيع العينة"
  - text: "ما شروط صحة الصلاة؟"
    example_title: "شروط الصلاة"
model-index:
  - name: Qwen3-4B-Islamic-Arabic
    results:
      - task:
          type: text-generation
          name: Text Generation
        dataset:
          name: Islamic Arabic Q&A
          type: NightPrince/islamic-arabic-qa
          split: validation
        metrics:
          - name: Validation Loss
            type: loss
            value: 2.4094
            verified: false
---

# Qwen3-4B-Islamic-Arabic

**Qwen3-4B fine-tuned on Islamic Arabic Q&A via QLoRA — merged FP16, ready for direct inference.**

This is the canonical, fully merged version of a Qwen3-4B model fine-tuned on 17,944 high-quality Islamic Arabic question-answer pairs spanning Fiqh, Fatwa, Aqeedah, Quran Sciences, and Islamic Finance. The LoRA adapter has been merged into the base weights and saved in FP16; no additional adapter loading is required.

Trained by **[Yahya Alnwsany (NightPrince)](https://huggingface.co/NightPrince)** — 2026-05-05.

---

## Model Variants

| Variant | Repo | Description |
|---|---|---|
| **Merged FP16** (this model) | [NightPrince/Qwen3-4B-Islamic-Arabic](https://huggingface.co/NightPrince/Qwen3-4B-Islamic-Arabic) | Canonical merged model, FP16, ~7.6 GB — drop-in for transformers or vLLM |
| **LoRA Adapter** | [NightPrince/Qwen3-4B-Islamic-Arabic-LoRA](https://huggingface.co/NightPrince/Qwen3-4B-Islamic-Arabic-LoRA) | PEFT adapter only, 264 MB — apply on top of `Qwen/Qwen3-4B` |
| **INT4 Quantized** | [NightPrince/Qwen3-4B-Islamic-Arabic-INT4](https://huggingface.co/NightPrince/Qwen3-4B-Islamic-Arabic-INT4) | W4A16 compressed-tensors for fast vLLM serving, 2.5 GB |
| **MLX 4-bit** | [NightPrince/Qwen3-4B-Islamic-Arabic-mlx-4Bit](https://huggingface.co/NightPrince/Qwen3-4B-Islamic-Arabic-mlx-4Bit) | Apple Silicon / MLX — native Mac inference, 4-bit quantized |
| **GGUF** | [NightPrince/Qwen3-4B-Islamic-Arabic-GGUF](https://huggingface.co/NightPrince/Qwen3-4B-Islamic-Arabic-GGUF) | llama.cpp / Ollama / LM Studio — Q4_K_M (2.3 GB), Q8_0 (4.0 GB), F16 (7.5 GB) |
| **Dataset** | [NightPrince/islamic-arabic-qa](https://huggingface.co/datasets/NightPrince/islamic-arabic-qa) | 17,944 train / 2,101 val / 1,042 test — Islamic Arabic Q&A pairs |
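
For the LoRA Adapter variant, the table says to apply it on top of `Qwen/Qwen3-4B`. A minimal sketch of what that could look like with PEFT follows. The helper name `load_adapter_model` and its `merge` flag are hypothetical, and loading the tokenizer from the base repo is an assumption rather than something this card specifies.

```python
def load_adapter_model(
    base_id: str = "Qwen/Qwen3-4B",
    adapter_id: str = "NightPrince/Qwen3-4B-Islamic-Arabic-LoRA",
    merge: bool = False,
):
    """Load the base model and attach the LoRA adapter (hypothetical helper)."""
    # Imports are deferred so that defining the helper needs no heavy dependencies.
    import torch
    from peft import PeftModel
    from transformers import AutoModelForCausalLM, AutoTokenizer

    # Assumption: the adapter reuses the base model's tokenizer.
    tokenizer = AutoTokenizer.from_pretrained(base_id)
    base = AutoModelForCausalLM.from_pretrained(
        base_id,
        torch_dtype=torch.float16,
        device_map="auto",
    )
    # Wrap the base model with the adapter weights pulled from the Hub.
    model = PeftModel.from_pretrained(base, adapter_id)
    if merge:
        # Fold the LoRA deltas into the base weights and drop the PEFT wrapper.
        model = model.merge_and_unload()
    return tokenizer, model
```

Calling `load_adapter_model(merge=True)` should give a plain Transformers model comparable to the merged variant, though this card does not state that the two are bit-identical.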

---

## Training Metrics

### Loss Curve

| Checkpoint | Train Loss | Eval Loss |
|---|---|---|
| Step 0 (init) | — | — |
| Step 843 (final) | **1.8918** | **2.4094** (best) |
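
Loss values can be easier to interpret as perplexity. The sketch below converts the two losses from the table, assuming they are mean per-token cross-entropy in nats (the usual convention for Hugging Face trainers, but not confirmed in this card).

```python
import math

# Losses copied from the table above. Assumed to be mean per-token
# cross-entropy in nats, so perplexity = exp(loss).
train_loss = 1.8918
eval_loss = 2.4094

train_ppl = math.exp(train_loss)
eval_ppl = math.exp(eval_loss)

print(f"train perplexity ~ {train_ppl:.2f}")
print(f"eval perplexity  ~ {eval_ppl:.2f}")
```

Under that assumption the final train perplexity is roughly 6.6 and the best eval perplexity roughly 11.1.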

### Token Accuracy

| Phase | Token Accuracy |
|---|---|
| Early training | ~50% |
| End of training | ~60% |

> **MCQ evaluation coming soon.** A multiple-choice benchmark for the Islamic domain has been prepared, but running it requires serving the model via vLLM. Results will be posted here once available.

---

## Usage

### Transformers Inference

```python
from transformers import AutoTokenizer, AutoModelForCausalLM
import torch

model_id = "NightPrince/Qwen3-4B-Islamic-Arabic"

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype=torch.float16,
    device_map="auto",
)

SYSTEM_PROMPT = (
    "أنت مساعد عالم إسلامي متخصص. "
    "أجب على الأسئلة بدقة استناداً إلى القرآن الكريم والسنة النبوية والفقه الإسلامي الكلاسيكي. "
    "استشهد بالمصادر حيثما أمكن. كن موجزاً لكن شاملاً."
)

messages = [
    {"role": "system", "content": SYSTEM_PROMPT},
    {"role": "user", "content": "ما حكم الزكاة على المال المدخر؟"},
]

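# Render the conversation with the model's chat template and append the
# assistant prefix so generation starts at the answer.
text = tokenizer.apply_chat_template(
    messages,
    tokenize=False,
    add_generation_prompt=True,
)
inputs = tokenizer(text, return_tensors="pt").to(model.device)

# Sample a response; no gradients are needed for inference.
with torch.no_grad():
    outputs = model.generate(
        **inputs,
        max_new_tokens=512,
        temperature=0.7,
        top_p=0.9,
        do_sample=True,
    )

# Drop the prompt tokens and decode only the newly generated ones.
response = tokenizer.decode(outputs[0][inputs["input_ids"].shape[1]:], skip_special_tokens=True)
print(response)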
```

### vLLM Serving

The merged FP16 weights take ~7.6 GB, which leaves little headroom for the KV cache and activations on an 11 GB GPU. On 11 GB cards (e.g., RTX 2080 Ti), set `--tensor-parallel-size 2` or higher; a single 24 GB+ GPU also works.

```bash
# Install vLLM if needed
pip install vllm

# Serve with tensor parallelism across 2 GPUs
vllm serve NightPrince/Qwen3-4B-Islamic-Arabic \
    --dtype float16 \
    --tensor-parallel-size 2 \
    --max-model-len 4096 \
    --port 8000
```
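
As a rough check on the memory guidance above, the sketch below estimates the FP16 KV cache needed for one sequence at `--max-model-len 4096`. The layer count, KV head count, and head dimension are assumptions taken from the public Qwen3-4B config, not from this card, so verify them against `config.json` before relying on the numbers.

```python
# Architecture values assumed from the public Qwen3-4B config; verify them
# against the model's config.json before relying on this estimate.
NUM_LAYERS = 36
NUM_KV_HEADS = 8
HEAD_DIM = 128
BYTES_PER_VALUE = 2  # FP16


def kv_cache_bytes(num_tokens: int) -> int:
    """Bytes of key + value cache for num_tokens tokens of one sequence."""
    # Factor 2 covers the separate key and value tensors in every layer.
    per_token = 2 * NUM_LAYERS * NUM_KV_HEADS * HEAD_DIM * BYTES_PER_VALUE
    return per_token * num_tokens


max_model_len = 4096  # matches --max-model-len in the serve command
per_seq_gib = kv_cache_bytes(max_model_len) / 2**30
print(f"KV cache per full-length sequence: {per_seq_gib:.2f} GiB")
```

This counts only the KV cache for a single sequence. Activations, CUDA graphs, and vLLM's own memory reservation come on top, so treat it as a lower bound.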

Query the running server:

```python
from openai import OpenAI

client = OpenAI(base_url="http://localhost:8000/v1", api_key="token-abc123")

SYSTEM_PROMPT = (
    "أنت مساعد عالم إسلامي متخصص. "
    "أجب على الأسئلة بدقة استناداً إلى القرآن الكريم والسنة النبوية والفقه الإسلامي الكلاسيكي. "
    "استشهد بالمصادر حيثما أمكن. كن موجزاً لكن شاملاً."
)

response = client.chat.completions.create(
    model="NightPrince/Qwen3-4B-Islamic-Arabic",
    messages=[
        {"role": "system", "content": SYSTEM_PROMPT},
        {"role": "user", "content": "ما حكم الزكاة على المال المدخر؟"},
    ],
    max_tokens=512,
    temperature=0.7,
)
print(response.choices[0].message.content)
```

> **Prefer lower memory?** Use the [INT4 quantized variant](https://huggingface.co/NightPrince/Qwen3-4B-Islamic-Arabic-INT4) (2.5 GB) for vLLM or the [GGUF variant](https://huggingface.co/NightPrince/Qwen3-4B-Islamic-Arabic-GGUF) for llama.cpp / Ollama.

---

## Training Details

### Dataset

| Property | Value |
|---|---|
| Dataset | [NightPrince/islamic-arabic-qa](https://huggingface.co/datasets/NightPrince/islamic-arabic-qa) |
| Train split | 17,944 samples |
| Validation split | 2,101 samples |
| Test split | 1,042 samples |
| Language | Arabic (Modern Standard + Classical) |
| Domains | Fiqh, Fatwa, Aqeedah, Quran Sciences, Islamic Finance |
| Quality filter | Applied — deduplication, length filtering, domain relevance scoring |
| Format | Instruction-following (system / user / assistant) |

### Hyperparameters

| Hyperparameter | Value |
|---|---|
| Epochs | 3 |
| Per-device batch size | 1 |
| Gradient accumulation steps | 16 |
| Effective batch size | 64 |
| Learning rate | 2e-4 |
| LR scheduler | Cosine with warmup |
| Warmup ratio | 0.05 |
| Max sequence length | 1,024 tokens |
| Optimizer | AdamW (paged, 8-bit) |
| Precision | QLoRA (4-bit base + BF16 adapters) |
| Gradient checkpointing | Enabled |
| Loss masking | Assistant turns only (`assistant_only_loss=True`) |
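
To make the learning-rate rows concrete, here is a small sketch of a cosine schedule with linear warmup using the values from the table and the 843 total steps reported in this card. Rounding the warmup length up to a whole step and decaying all the way to zero are assumptions; the actual trainer may differ in those details.

```python
import math

BASE_LR = 2e-4
TOTAL_STEPS = 843  # total optimizer steps reported in this card
WARMUP_RATIO = 0.05
# Assumption: the warmup length is rounded up to a whole step.
WARMUP_STEPS = math.ceil(TOTAL_STEPS * WARMUP_RATIO)


def lr_at(step: int) -> float:
    """Learning rate at a given optimizer step (linear warmup, cosine decay)."""
    if step < WARMUP_STEPS:
        return BASE_LR * step / WARMUP_STEPS
    progress = (step - WARMUP_STEPS) / max(1, TOTAL_STEPS - WARMUP_STEPS)
    return BASE_LR * 0.5 * (1.0 + math.cos(math.pi * progress))


for step in (0, WARMUP_STEPS, TOTAL_STEPS // 2, TOTAL_STEPS):
    print(step, f"{lr_at(step):.2e}")
```

Under these assumptions the rate climbs linearly to 2e-4 over the first 43 steps and then decays along a half cosine toward zero at step 843.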

### LoRA Configuration

| Parameter | Value |
|---|---|
| Rank (r) | 64 |
| Alpha (α) | 128 |
| Dropout | 0.05 |
| Target modules | q_proj, k_proj, v_proj, o_proj, gate_proj, up_proj, down_proj |
| Trainable parameters | 132,120,576 |
| % of total parameters | ~3.18% of 4.15B |
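
The trainable parameter count can be reproduced from the rank and target modules above. The sketch below assumes the layer shapes of the public Qwen3-4B config (hidden size 2560, 32 query heads and 8 key/value heads of dimension 128, MLP width 9728, 36 layers); none of these are stated in this card.

```python
# Layer shapes assumed from the public Qwen3-4B config (not stated in this card).
HIDDEN = 2560
Q_OUT = 32 * 128  # attention heads * head_dim
KV_OUT = 8 * 128  # key/value heads * head_dim
INTERMEDIATE = 9728
NUM_LAYERS = 36

RANK = 64
ALPHA = 128

# (in_features, out_features) for each targeted projection.
TARGETS = {
    "q_proj": (HIDDEN, Q_OUT),
    "k_proj": (HIDDEN, KV_OUT),
    "v_proj": (HIDDEN, KV_OUT),
    "o_proj": (Q_OUT, HIDDEN),
    "gate_proj": (HIDDEN, INTERMEDIATE),
    "up_proj": (HIDDEN, INTERMEDIATE),
    "down_proj": (INTERMEDIATE, HIDDEN),
}

# LoRA adds A (r x in) and B (out x r) per module: r * (in + out) parameters.
per_layer = sum(RANK * (n_in + n_out) for n_in, n_out in TARGETS.values())
trainable = per_layer * NUM_LAYERS

# Standard LoRA scales the low-rank update by alpha / r.
scaling = ALPHA / RANK

print(f"trainable LoRA parameters: {trainable:,}")
print(f"LoRA scaling factor: {scaling}")
```

With these shapes the count comes out to 132,120,576, matching the table, and the update is scaled by a factor of 2.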

### Results

| Metric | Value |
|---|---|
| Final train loss | 1.8918 |
| Best eval loss | 2.4094 |
| Total training steps | 843 |
| Training duration | 7.59 hours |
| Token accuracy (start → end) | ~50% → ~60% |
| MCQ benchmark | Coming soon (requires vLLM serving) |
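
The step count and effective batch size follow from the hyperparameters and the four-GPU DDP setup listed in the Hardware table. The sketch below reproduces them, assuming the last partial batch of each epoch still counts as an optimizer step.

```python
import math

train_samples = 17_944
epochs = 3
per_device_batch = 1
grad_accum = 16
num_gpus = 4  # one DDP process per GPU, per the Hardware table

effective_batch = per_device_batch * grad_accum * num_gpus
# Assumption: the last partial batch of each epoch still counts as a step.
steps_per_epoch = math.ceil(train_samples / effective_batch)
total_steps = steps_per_epoch * epochs

# Average wall-clock time per optimizer step, from the reported duration.
hours = 7.59
seconds_per_step = hours * 3600 / total_steps

print(f"effective batch size: {effective_batch}")
print(f"total steps: {total_steps}")
print(f"seconds per step: {seconds_per_step:.1f}")
```

This gives an effective batch size of 64, 281 steps per epoch, and 843 steps in total, at roughly 32 seconds per optimizer step.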

### Hardware

| Component | Specification |
|---|---|
| GPUs | 4× NVIDIA GeForce RTX 2080 Ti (11 GB VRAM each, 44 GB total) |
| CUDA version | 13.0 |
| Training framework | DDP via Hugging Face Accelerate |

### Software Environment

| Library | Version |
|---|---|
| Python | 3.11.15 |
| PyTorch | 2.11.0+cu130 |
| Transformers | 4.57.6 |
| PEFT | 0.18.1 |
| TRL | 1.3.0 |
| BitsAndBytes | 0.49.2 |
| Accelerate | 1.13.0 |

---

## Limitations

- **Domain scope**: The model is optimized for Islamic Arabic Q&A. General Arabic tasks or non-Islamic domains may show degraded quality compared to the base Qwen3-4B.
- **Source attribution**: While the model is trained to cite sources, citations should be independently verified — the model can produce plausible-sounding but incorrect references.
- **Classical vs. contemporary Fiqh**: The training data emphasizes classical scholarship. Contemporary jurisprudential debates, especially minority or regional opinions, may be underrepresented.
- **Language**: The model performs best in Arabic (Modern Standard and Classical). Responses in other languages are not guaranteed to be accurate or fluent.

---

## Citation

```bibtex
@misc{alnwsany2026qwen3islamicarabic,
  author       = {Yahya Alnwsany},
  title        = {Qwen3-4B-Islamic-Arabic: QLoRA Fine-Tuning of Qwen3-4B on Islamic Arabic Q\&A},
  year         = {2026},
  howpublished = {\url{https://huggingface.co/NightPrince/Qwen3-4B-Islamic-Arabic}},
  note         = {Base model: Qwen/Qwen3-4B. Dataset: NightPrince/islamic-arabic-qa.}
}
```

---

## License

This model is released under the **Apache 2.0** license, consistent with the base model [Qwen/Qwen3-4B](https://huggingface.co/Qwen/Qwen3-4B). See [LICENSE](https://www.apache.org/licenses/LICENSE-2.0) for details.