---
language:
- ar
license: apache-2.0
library_name: transformers
base_model: Qwen/Qwen3-4B
datasets:
- NightPrince/islamic-arabic-qa
tags:
- arabic
- islamic
- fiqh
- fatwa
- qlora
- peft
- qwen3
- instruction-tuning
- conversational
pipeline_tag: text-generation
inference:
  parameters:
    max_new_tokens: 512
    temperature: 0.3
    do_sample: true
widget:
- text: "ما حكم زكاة الفطر وما مقدارها؟"
  example_title: "زكاة الفطر"
- text: "ما الفرق بين الفرض والواجب عند الحنفية؟"
  example_title: "الفرض والواجب"
- text: "ما حكم بيع العينة في الفقه الإسلامي؟"
  example_title: "بيع العينة"
- text: "ما شروط صحة الصلاة؟"
  example_title: "شروط الصلاة"
model-index:
- name: Qwen3-4B-Islamic-Arabic
  results:
  - task:
      type: text-generation
      name: Text Generation
    dataset:
      name: Islamic Arabic Q&A
      type: NightPrince/islamic-arabic-qa
      split: validation
    metrics:
    - name: Validation Loss
      type: loss
      value: 2.4094
      verified: false
---
# Qwen3-4B-Islamic-Arabic

**Qwen3-4B fine-tuned on Islamic Arabic Q&A via QLoRA: merged FP16, ready for direct inference.**

This is the canonical, fully merged version of a Qwen3-4B model fine-tuned on 17,944 high-quality Islamic Arabic question-answer pairs spanning Fiqh, Fatwa, Aqeedah, Quran Sciences, and Islamic Finance. The LoRA adapter has been merged into the base weights and saved in FP16; no additional adapter loading is required.

Trained by **[Yahya Alnwsany (NightPrince)](https://huggingface.co/NightPrince)**, 2026-05-05.

---
## Model Variants
| Variant | Repo | Description |
|---|---|---|
| **Merged FP16** (this model) | [NightPrince/Qwen3-4B-Islamic-Arabic](https://huggingface.co/NightPrince/Qwen3-4B-Islamic-Arabic) | Canonical merged model, FP16, ~7.6 GB — drop-in for transformers or vLLM |
| **LoRA Adapter** | [NightPrince/Qwen3-4B-Islamic-Arabic-LoRA](https://huggingface.co/NightPrince/Qwen3-4B-Islamic-Arabic-LoRA) | PEFT adapter only, 264 MB — apply on top of `Qwen/Qwen3-4B` |
| **INT4 Quantized** | [NightPrince/Qwen3-4B-Islamic-Arabic-INT4](https://huggingface.co/NightPrince/Qwen3-4B-Islamic-Arabic-INT4) | W4A16 compressed-tensors for fast vLLM serving, 2.5 GB |
| **MLX 4-bit** | [NightPrince/Qwen3-4B-Islamic-Arabic-mlx-4Bit](https://huggingface.co/NightPrince/Qwen3-4B-Islamic-Arabic-mlx-4Bit) | Apple Silicon / MLX — native Mac inference, 4-bit quantized |
| **GGUF** | [NightPrince/Qwen3-4B-Islamic-Arabic-GGUF](https://huggingface.co/NightPrince/Qwen3-4B-Islamic-Arabic-GGUF) | llama.cpp / Ollama / LM Studio — Q4_K_M (2.3 GB), Q8_0 (4.0 GB), F16 (7.5 GB) |
| **Dataset** | [NightPrince/islamic-arabic-qa](https://huggingface.co/datasets/NightPrince/islamic-arabic-qa) | 17,944 train / 2,101 val / 1,042 test — Islamic Arabic Q&A pairs |
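
For the LoRA Adapter row, applying the adapter on top of `Qwen/Qwen3-4B` could look roughly like the sketch below. It assumes the adapter repo is a standard PEFT checkpoint that `PeftModel.from_pretrained` can load, and that the tokenizer is taken from the base model. `load_with_adapter` is a hypothetical helper name, not something either repo ships. After `tokenizer, model = load_with_adapter()`, calling `model.merge_and_unload()` is the usual PEFT route to folding the adapter into the base weights.

```python
BASE_MODEL_ID = "Qwen/Qwen3-4B"
ADAPTER_ID = "NightPrince/Qwen3-4B-Islamic-Arabic-LoRA"


def load_with_adapter(base_id: str = BASE_MODEL_ID, adapter_id: str = ADAPTER_ID):
    """Load the base model and attach the LoRA adapter (hypothetical helper)."""
    # Heavy imports are deferred until the helper is actually called.
    import torch
    from peft import PeftModel
    from transformers import AutoModelForCausalLM, AutoTokenizer

    # Assumption: the tokenizer comes from the base model repo.
    tokenizer = AutoTokenizer.from_pretrained(base_id)
    base = AutoModelForCausalLM.from_pretrained(
        base_id,
        torch_dtype=torch.float16,
        device_map="auto",
    )
    # Wrap the base model with the adapter weights and switch to eval mode.
    model = PeftModel.from_pretrained(base, adapter_id)
    model.eval()
    return tokenizer, model
```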
---
## Training Metrics
### Loss Curve
| Checkpoint | Train Loss | Eval Loss |
|---|---|---|
| Step 0 (init) | — | — |
| Step 843 (final) | **1.8918** | **2.4094** (best) |
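
The losses above can be read as perplexities, which some readers find easier to compare. The sketch below assumes the reported values are mean per-token cross-entropy in nats (the usual convention for causal language model training); `perplexity` is an illustrative helper, not part of the training code.

```python
import math


def perplexity(mean_ce_loss: float) -> float:
    """Convert a mean per-token cross-entropy loss (in nats) to perplexity."""
    return math.exp(mean_ce_loss)


final_train_loss = 1.8918  # from the Loss Curve table
best_eval_loss = 2.4094    # from the Loss Curve table

train_ppl = perplexity(final_train_loss)
eval_ppl = perplexity(best_eval_loss)
print(f"train perplexity ~ {train_ppl:.2f}")
print(f"eval perplexity  ~ {eval_ppl:.2f}")
```

Under that assumption the final train loss corresponds to a perplexity of roughly 6.6 and the best eval loss to roughly 11.1.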
### Token Accuracy
| Phase | Token Accuracy |
|---|---|
| Early training | ~50% |
| End of training | ~60% |
> **MCQ evaluation coming soon.** A multiple-choice benchmark for the Islamic domain is prepared but requires serving the model via vLLM. Results will be posted here once available.
---
## Usage
### Transformers Inference
```python
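import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "NightPrince/Qwen3-4B-Islamic-Arabic"

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype=torch.float16,  # the merged weights are stored in FP16
    device_map="auto",          # place layers on the available GPU(s)
)

# System prompt, in English: "You are a specialized Islamic scholar assistant.
# Answer questions accurately based on the Holy Quran, the Prophetic Sunnah,
# and classical Islamic jurisprudence. Cite sources wherever possible.
# Be concise but comprehensive."
SYSTEM_PROMPT = (
    "أنت مساعد عالم إسلامي متخصص. "
    "أجب على الأسئلة بدقة استناداً إلى القرآن الكريم والسنة النبوية والفقه الإسلامي الكلاسيكي. "
    "استشهد بالمصادر حيثما أمكن. كن موجزاً لكن شاملاً."
)

messages = [
    {"role": "system", "content": SYSTEM_PROMPT},
    # User question: "What is the ruling on zakat on saved money?"
    {"role": "user", "content": "ما حكم الزكاة على المال المدخر؟"},
]

# Render the conversation with the model's chat template.
text = tokenizer.apply_chat_template(
    messages,
    tokenize=False,
    add_generation_prompt=True,
)
inputs = tokenizer(text, return_tensors="pt").to(model.device)

with torch.no_grad():
    outputs = model.generate(
        **inputs,
        max_new_tokens=512,
        temperature=0.7,
        top_p=0.9,
        do_sample=True,
    )

# Decode only the newly generated tokens, skipping the prompt.
prompt_len = inputs["input_ids"].shape[1]
response = tokenizer.decode(outputs[0][prompt_len:], skip_special_tokens=True)
print(response)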
```
### vLLM Serving
The merged FP16 model is ~7.6 GB. Use at least `tensor_parallel_size=2` on 11 GB GPUs (e.g., RTX 2080 Ti), or a single 24 GB+ GPU.
```bash
# Install vLLM if needed
pip install vllm

# Serve with tensor parallelism across 2 GPUs
vllm serve NightPrince/Qwen3-4B-Islamic-Arabic \
    --dtype float16 \
    --tensor-parallel-size 2 \
    --max-model-len 4096 \
    --port 8000
```
Query the running server:
```python
from openai import OpenAI

# vLLM exposes an OpenAI-compatible API under /v1.
client = OpenAI(base_url="http://localhost:8000/v1", api_key="token-abc123")

# Same system prompt as in the Transformers example.
SYSTEM_PROMPT = (
    "أنت مساعد عالم إسلامي متخصص. "
    "أجب على الأسئلة بدقة استناداً إلى القرآن الكريم والسنة النبوية والفقه الإسلامي الكلاسيكي. "
    "استشهد بالمصادر حيثما أمكن. كن موجزاً لكن شاملاً."
)

response = client.chat.completions.create(
    model="NightPrince/Qwen3-4B-Islamic-Arabic",
    messages=[
        {"role": "system", "content": SYSTEM_PROMPT},
        # User question: "What is the ruling on zakat on saved money?"
        {"role": "user", "content": "ما حكم الزكاة على المال المدخر؟"},
    ],
    max_tokens=512,
    temperature=0.7,
)
print(response.choices[0].message.content)
```
> **Prefer lower memory?** Use the [INT4 quantized variant](https://huggingface.co/NightPrince/Qwen3-4B-Islamic-Arabic-INT4) (2.5 GB) for vLLM or the [GGUF variant](https://huggingface.co/NightPrince/Qwen3-4B-Islamic-Arabic-GGUF) for llama.cpp / Ollama.
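
As a back-of-the-envelope check on the memory guidance in this section, the sketch below estimates how much memory each 11 GB GPU has left once the ~7.6 GB of FP16 weights are loaded, with and without tensor parallelism. The layer count, KV-head count, and head dimension are assumed Qwen3-4B values that this card does not state, the 0.9 factor mirrors vLLM's default `gpu_memory_utilization`, and both helper functions are illustrative. Activations, CUDA graphs, and other runtime overhead are ignored, so real headroom will be smaller.

```python
def kv_cache_bytes_per_token(num_layers: int, num_kv_heads: int, head_dim: int,
                             bytes_per_value: int = 2) -> int:
    """Bytes of K and V cache stored per token across all layers (FP16 by default)."""
    return 2 * num_layers * num_kv_heads * head_dim * bytes_per_value


def per_gpu_headroom_gb(weights_gb: float, gpu_gb: float, tp_size: int,
                        utilization: float = 0.9) -> float:
    """Memory left per GPU after an even split of the weights across tp_size GPUs."""
    return gpu_gb * utilization - weights_gb / tp_size


# Assumed Qwen3-4B architecture values (not stated in this card).
NUM_LAYERS, NUM_KV_HEADS, HEAD_DIM = 36, 8, 128

kv_per_token = kv_cache_bytes_per_token(NUM_LAYERS, NUM_KV_HEADS, HEAD_DIM)
kv_per_sequence_gb = kv_per_token * 4096 / 1e9  # one full 4096-token sequence

for tp in (1, 2):
    headroom = per_gpu_headroom_gb(weights_gb=7.6, gpu_gb=11.0, tp_size=tp)
    print(f"tp={tp}: ~{headroom:.1f} GB per GPU left for KV cache and overhead")
print(f"KV cache for one 4096-token sequence: ~{kv_per_sequence_gb:.2f} GB in total")
```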
---
## Training Details
### Dataset
| Property | Value |
|---|---|
| Dataset | [NightPrince/islamic-arabic-qa](https://huggingface.co/datasets/NightPrince/islamic-arabic-qa) |
| Train split | 17,944 samples |
| Validation split | 2,101 samples |
| Test split | 1,042 samples |
| Language | Arabic (Modern Standard + Classical) |
| Domains | Fiqh, Fatwa, Aqeedah, Quran Sciences, Islamic Finance |
| Quality filter | Applied — deduplication, length filtering, domain relevance scoring |
| Format | Instruction-following (system / user / assistant) |
### Hyperparameters
| Hyperparameter | Value |
|---|---|
| Epochs | 3 |
| Per-device batch size | 1 |
| Gradient accumulation steps | 16 |
| Effective batch size | 64 |
| Learning rate | 2e-4 |
| LR scheduler | Cosine with warmup |
| Warmup ratio | 0.05 |
| Max sequence length | 1,024 tokens |
| Optimizer | AdamW (paged, 8-bit) |
| Precision | QLoRA (4-bit base + BF16 adapters) |
| Gradient checkpointing | Enabled |
| Loss masking | Assistant turns only (`assistant_only_loss=True`) |
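
The effective batch size and the total step count follow from the values in this card. The sketch below recomputes them, taking the GPU count from the Hardware section and assuming that a final partial batch in each epoch still counts as one optimizer step.

```python
import math

per_device_batch_size = 1
gradient_accumulation_steps = 16
num_gpus = 4            # from the Hardware section
train_samples = 17_944  # from the Dataset table
epochs = 3

effective_batch_size = per_device_batch_size * gradient_accumulation_steps * num_gpus

# Assumption: a final partial batch still counts as one optimizer step.
steps_per_epoch = math.ceil(train_samples / effective_batch_size)
total_steps = steps_per_epoch * epochs

print(f"effective batch size: {effective_batch_size}")
print(f"steps per epoch: {steps_per_epoch}, total steps: {total_steps}")
```

Under these assumptions the result is 64 sequences per optimizer step and 843 steps over 3 epochs, which agrees with the step count reported under Results.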
### LoRA Configuration
| Parameter | Value |
|---|---|
| Rank (r) | 64 |
| Alpha (α) | 128 |
| Dropout | 0.05 |
| Target modules | q_proj, k_proj, v_proj, o_proj, gate_proj, up_proj, down_proj |
| Trainable parameters | 132,120,576 |
| % of total parameters | ~3.18% of 4.15B |
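
The trainable-parameter count can be cross-checked from the rank and the target modules. The sketch below assumes the Qwen3-4B layer dimensions shown in the constants (they are not stated in this card) and uses the fact that LoRA adds two matrices per targeted linear layer, `A` of shape `r × in_features` and `B` of shape `out_features × r`.

```python
# Assumed Qwen3-4B dimensions (not stated in this card).
HIDDEN = 2560
NUM_LAYERS = 36
Q_OUT = 32 * 128     # attention heads * head_dim
KV_OUT = 8 * 128     # key/value heads * head_dim
INTERMEDIATE = 9728

RANK = 64    # from the LoRA Configuration table
ALPHA = 128  # from the LoRA Configuration table

# (in_features, out_features) of each targeted linear layer.
TARGET_SHAPES = {
    "q_proj": (HIDDEN, Q_OUT),
    "k_proj": (HIDDEN, KV_OUT),
    "v_proj": (HIDDEN, KV_OUT),
    "o_proj": (Q_OUT, HIDDEN),
    "gate_proj": (HIDDEN, INTERMEDIATE),
    "up_proj": (HIDDEN, INTERMEDIATE),
    "down_proj": (INTERMEDIATE, HIDDEN),
}


def lora_params(in_features: int, out_features: int, rank: int) -> int:
    """Parameters added by one LoRA pair: A (rank x in) plus B (out x rank)."""
    return rank * (in_features + out_features)


per_layer = sum(lora_params(i, o, RANK) for i, o in TARGET_SHAPES.values())
trainable = per_layer * NUM_LAYERS
scaling = ALPHA / RANK  # factor applied to the low-rank update

print(f"trainable LoRA parameters: {trainable:,}")
print(f"LoRA scaling (alpha / r): {scaling}")
```

With these assumed dimensions the count comes to 132,120,576, matching the table, and the scaling factor is 2.0.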
### Results
| Metric | Value |
|---|---|
| Final train loss | 1.8918 |
| Best eval loss | 2.4094 |
| Total training steps | 843 |
| Training duration | 7.59 hours |
| Token accuracy (start → end) | ~50% → ~60% |
| MCQ benchmark | Coming soon (requires vLLM serving) |
### Hardware
| Component | Specification |
|---|---|
| GPUs | 4× NVIDIA GeForce RTX 2080 Ti (11 GB VRAM each, 44 GB total) |
| CUDA version | 13.0 |
| Training framework | DDP via Hugging Face Accelerate |
### Software Environment
| Library | Version |
|---|---|
| Python | 3.11.15 |
| PyTorch | 2.11.0+cu130 |
| Transformers | 4.57.6 |
| PEFT | 0.18.1 |
| TRL | 1.3.0 |
| BitsAndBytes | 0.49.2 |
| Accelerate | 1.13.0 |
---
## Limitations
- **Domain scope**: The model is optimized for Islamic Arabic Q&A. General Arabic tasks or non-Islamic domains may show degraded quality compared to the base Qwen3-4B.
- **Source attribution**: While the model is trained to cite sources, citations should be independently verified — the model can produce plausible-sounding but incorrect references.
- **Classical vs. contemporary Fiqh**: The training data emphasizes classical scholarship. Contemporary jurisprudential debates, especially minority or regional opinions, may be underrepresented.
- **Language**: The model performs best in Arabic (Modern Standard and Classical). Responses in other languages are not guaranteed to be accurate or fluent.
---
## Citation
```bibtex
@misc{alnwsany2026qwen3islamicarabic,
author = {Yahya Alnwsany},
title = {Qwen3-4B-Islamic-Arabic: QLoRA Fine-Tuning of Qwen3-4B on Islamic Arabic Q\&A},
year = {2026},
howpublished = {\url{https://huggingface.co/NightPrince/Qwen3-4B-Islamic-Arabic}},
note = {Base model: Qwen/Qwen3-4B. Dataset: NightPrince/islamic-arabic-qa.}
}
```
---
## License
This model is released under the **Apache 2.0** license, consistent with the base model [Qwen/Qwen3-4B](https://huggingface.co/Qwen/Qwen3-4B). See [LICENSE](https://www.apache.org/licenses/LICENSE-2.0) for details.