---
language:
- ps
- en
- ur
license: apache-2.0
library_name: transformers
tags:
- pashto
- peshawari
- pakistani-pashto
- causal-lm
- qwen2
- sft
- cpt
- unsloth
- trl
base_model: Qwen/Qwen2.5-7B
pipeline_tag: text-generation
---

# ☕ Qehwa — Pashto's First LLM

**The first Pakistani Pashto large language model, trained specifically on the Peshawari dialect.**

Built by a solo developer as a free and open resource for 60+ million Pashto speakers worldwide.

> ⚠️ This model performs best on Pakistani/Peshawari Pashto. Performance may be lower on Afghan Pashto dialect.

---

## 🌟 Model Description

**Qehwa** is a fully instruction-tuned Pashto language model built on top of Qwen2.5-7B. It was trained in two stages:

1. **Continued Pre-Training (CPT)** on 3.4 million clean Pakistani Pashto documents
2. **Supervised Fine-Tuning (SFT)** on 126,519 high-quality Peshawari Pashto instruction-response pairs

To the author's knowledge, this is the **first dedicated Pakistani Pashto LLM**; no comparable model is publicly available. It specifically targets the **Peshawari/KPK dialect** rather than generic or Afghan Pashto.

This repository contains the **fully merged model** (LoRA weights merged into the base), ready to use with standard `transformers`; no additional libraries are required.

---

## ✨ Capabilities

- ✅ Answers questions in pure Peshawari Pashto
- ✅ Responds to English instructions in Pashto
- ✅ Responds to Urdu instructions in Pashto
- ✅ Natural Pashto conversation
- ✅ Pashto creative writing and poetry
- ✅ Islamic topics in Pashto
- ✅ KPK history, culture, and geography
- ✅ Pashtunwali traditions and ethics
- ✅ Pashto grammar correction
- ✅ English to Pashto translation
- ✅ Correct Pashto-specific characters: ښ ږ ټ ډ ړ ځ

---

## 📊 Evaluation Results

Qehwa was evaluated on a custom benchmark of **150 tests across 15 categories**. Because no standard public Pashto benchmark was available, the evaluation was designed specifically for Pakistani Pashto; the author believes it is the first comprehensive Pashto LLM benchmark.

### Top Performing Categories

| Category | Score |
|---|---|
| English → Pashto | **90%** 🔥🔥 |
| Urdu → Pashto | **84%** 🔥🔥 |
| Health & Daily Life in Pashto | **90%** 🔥🔥 |
| Culture & History | **90%** 🔥 |
| Geography & Nature | **90%** 🔥 |

> **Overall Average Accuracy across all 15 benchmark categories: 85.3%**

### Evaluation Methodology
- 150 custom Pashto prompts across 15 categories
- Outputs generated on an NVIDIA A100 40GB GPU
- Outputs reviewed by a human for fluency, accuracy, and dialect correctness
- Benchmark built from scratch, since no existing Pashto benchmark was available
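
The overall figure is reported as an average across the 15 categories. A minimal sketch of that aggregation follows, assuming each human-reviewed output receives a score between 0 and 1 (the card does not state the scoring rubric): per-category means are computed first, then averaged. If every category holds the same number of prompts (150 / 15 = 10), this macro average equals the plain average over all 150 prompts; with unequal categories the two can differ. The function names and toy data are hypothetical.

```python
from collections import defaultdict
from statistics import mean


def category_scores(results):
    """results: iterable of (category, score) pairs, each score in [0, 1]."""
    by_category = defaultdict(list)
    for category, score in results:
        by_category[category].append(score)
    return {cat: mean(scores) for cat, scores in by_category.items()}


def overall_average(results):
    # Macro average: every category counts equally, regardless of prompt count.
    return mean(category_scores(results).values())


# Toy data (hypothetical categories and scores, not Qehwa's results).
toy = [("cat_a", 1.0), ("cat_a", 0.5), ("cat_b", 1.0), ("cat_b", 0.0)]
print(category_scores(toy))  # {'cat_a': 0.75, 'cat_b': 0.5}
print(overall_average(toy))  # 0.625
```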

---

## 💻 Installation
```bash
pip install transformers accelerate torch
```

For faster inference:
```bash
pip install unsloth
```

For 4-bit quantized loading on smaller GPUs (Method 2):
```bash
pip install transformers accelerate bitsandbytes
```

---

## 🚀 How to Use

### ✅ Method 1 — Transformers (Recommended)

Best for: Research, production, standard usage
```python
from transformers import AutoModelForCausalLM, AutoTokenizer
import torch

model_name = "junaid008/qehwa-pashto-llm"

tokenizer = AutoTokenizer.from_pretrained(model_name)
model     = AutoModelForCausalLM.from_pretrained(
    model_name,
    torch_dtype = torch.bfloat16,
    device_map  = "auto",
)

ALPACA_TEMPLATE = """Below is an instruction in Pashto or English. Write a detailed response in Pashto.

### Instruction:
{}

### Response:
{}"""

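def generate(prompt):
    # Fill the instruction slot and leave the response slot empty.
    inputs = tokenizer(
        ALPACA_TEMPLATE.format(prompt, ""),
        return_tensors = "pt",
    ).to(model.device)  # follows device_map instead of hard-coding "cuda"

    outputs = model.generate(
        **inputs,
        max_new_tokens     = 500,
        temperature        = 0.7,
        do_sample          = True,
        repetition_penalty = 1.1,
        pad_token_id       = tokenizer.eos_token_id,
    )

    # Decode only the newly generated tokens, skipping the echoed prompt.
    new_tokens = outputs[0][inputs["input_ids"].shape[1]:]
    return tokenizer.decode(new_tokens, skip_special_tokens=True).strip()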

# Pashto input: "Tell me the history of Peshawar"
print(generate("د پیښور تاریخ راته ووایه"))

# English input
print(generate("Tell me about Pashtunwali"))

# Urdu input: "Tell me about Peshawar"
print(generate("پشاور کے بارے میں بتاؤ"))
```

---

### ✅ Method 2 — 4-bit Quantization (Low VRAM)

Best for: GPUs with limited VRAM (around 8GB); requires the `bitsandbytes` package
```python
from transformers import AutoModelForCausalLM, AutoTokenizer, BitsAndBytesConfig
import torch

model_name = "junaid008/qehwa-pashto-llm"

bnb_config = BitsAndBytesConfig(
    load_in_4bit              = True,
    bnb_4bit_quant_type       = "nf4",
    bnb_4bit_compute_dtype    = torch.bfloat16,
    bnb_4bit_use_double_quant = True,
)

tokenizer = AutoTokenizer.from_pretrained(model_name)
model     = AutoModelForCausalLM.from_pretrained(
    model_name,
    quantization_config = bnb_config,
    device_map          = "auto",
)

ALPACA_TEMPLATE = """Below is an instruction in Pashto or English. Write a detailed response in Pashto.

### Instruction:
{}

### Response:
{}"""

def generate(prompt):
    inputs = tokenizer(
        ALPACA_TEMPLATE.format(prompt, ""),
        return_tensors = "pt",
    ).to(model.device)  # follows device_map instead of hard-coding "cuda"

    outputs = model.generate(
        **inputs,
        max_new_tokens     = 500,
        temperature        = 0.7,
        do_sample          = True,
        repetition_penalty = 1.1,
        pad_token_id       = tokenizer.eos_token_id,
    )

    response = tokenizer.decode(outputs[0], skip_special_tokens=True)
    return response.split("### Response:")[-1].strip()

print(generate("پښتونولي تشریح کړه"))
```

---

### ✅ Method 3 — Unsloth (2x Faster Inference)

Best for: Speed-optimized usage, Colab, A100/H100
```python
from unsloth import FastLanguageModel

model, tokenizer = FastLanguageModel.from_pretrained(
    model_name     = "junaid008/qehwa-pashto-llm",
    max_seq_length = 2048,
    dtype          = None,
    load_in_4bit   = False,
)
FastLanguageModel.for_inference(model)

ALPACA_TEMPLATE = """Below is an instruction in Pashto or English. Write a detailed response in Pashto.

### Instruction:
{}

### Response:
{}"""

inputs = tokenizer(
    ALPACA_TEMPLATE.format("د پیښور تاریخ راته ووایه", ""),
    return_tensors = "pt",
).to("cuda")

outputs = model.generate(
    **inputs,
    max_new_tokens     = 500,
    temperature        = 0.7,
    do_sample          = True,
    repetition_penalty = 1.1,
    pad_token_id       = tokenizer.eos_token_id,
)

response = tokenizer.decode(outputs[0], skip_special_tokens=True)
print(response.split("### Response:")[-1].strip())
```

---

### ✅ Method 4 — CPU Only (No GPU)

Best for: Testing on a laptop or machine without a GPU (slow; float32 weights for a ~7.6B-parameter model need roughly 30GB of free RAM)
```python
from transformers import AutoModelForCausalLM, AutoTokenizer
import torch

model_name = "junaid008/qehwa-pashto-llm"

tokenizer = AutoTokenizer.from_pretrained(model_name)
model     = AutoModelForCausalLM.from_pretrained(
    model_name,
    torch_dtype = torch.float32,  # float32 for CPU
    device_map  = "cpu",
)

ALPACA_TEMPLATE = """Below is an instruction in Pashto or English. Write a detailed response in Pashto.

### Instruction:
{}

### Response:
{}"""

inputs = tokenizer(
    ALPACA_TEMPLATE.format("پښتو ژبه د چا ده؟", ""),
    return_tensors = "pt",
)

outputs = model.generate(
    **inputs,
    max_new_tokens = 200,
    do_sample      = False,   # greedy for CPU speed
    pad_token_id   = tokenizer.eos_token_id,
)

response = tokenizer.decode(outputs[0], skip_special_tokens=True)
print(response.split("### Response:")[-1].strip())
```

---

### ✅ Method 5 — Google Colab (Free)

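Best for: Trying without any local setup. A 7B model in bfloat16 needs about 15GB of GPU memory, which is at the limit of the free-tier T4; if loading fails or runs out of memory, use the 4-bit configuration from Method 2 instead.

Open a notebook in Colab and run: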
```python
# Install
!pip install transformers accelerate -q

from transformers import AutoModelForCausalLM, AutoTokenizer
import torch

tokenizer = AutoTokenizer.from_pretrained("junaid008/qehwa-pashto-llm")
model     = AutoModelForCausalLM.from_pretrained(
    "junaid008/qehwa-pashto-llm",
    torch_dtype = torch.bfloat16,
    device_map  = "auto",
)

ALPACA_TEMPLATE = """Below is an instruction in Pashto or English. Write a detailed response in Pashto.

### Instruction:
{}

### Response:
{}"""

def generate(prompt):
    inputs  = tokenizer(ALPACA_TEMPLATE.format(prompt, ""), return_tensors="pt").to(model.device)
    outputs = model.generate(**inputs, max_new_tokens=500, temperature=0.7,
                              do_sample=True, pad_token_id=tokenizer.eos_token_id)
    return tokenizer.decode(outputs[0], skip_special_tokens=True).split("### Response:")[-1].strip()

print(generate("Tell me about Peshawar"))
print(generate("پښتونولي تشریح کړه"))           # Pashto: "Explain Pashtunwali"
print(generate("پشاور کا مشہور کھانا کیا ہے؟"))  # Urdu: "What is Peshawar's famous food?"
```

---

## ⚙️ Hardware Requirements

| Method | Memory needed | Speed |
|---|---|---|
| bfloat16 full | 16GB+ VRAM | ✅ Fast |
| 4-bit quantized | 8GB+ VRAM | ✅ Good |
| Unsloth (16-bit) | 16GB+ VRAM | 🔥 2x Faster |
| CPU only | No GPU, ~30GB+ system RAM (float32) | ⚠️ Slow |
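
A small helper can map detected GPU memory to one of the loading methods above, using the thresholds from this table (16GB+ for bfloat16, 8GB+ for 4-bit). This is a sketch: it only inspects the first CUDA device and measures memory in decimal gigabytes, and `recommend_method` and `detect_vram_gb` are hypothetical names, not part of any library.

```python
def recommend_method(vram_gb):
    """Map GPU memory (decimal GB) to a loading method from this card.

    Thresholds follow the hardware table: 16GB+ for bfloat16, 8GB+ for 4-bit.
    vram_gb=None means no CUDA GPU was found.
    """
    if vram_gb is None or vram_gb < 8:
        return "Method 4: CPU only"
    if vram_gb >= 16:
        return "Method 1: bfloat16"
    return "Method 2: 4-bit"


def detect_vram_gb():
    # Imported here so recommend_method() can be used without torch installed.
    import torch

    if not torch.cuda.is_available():
        return None
    # total_memory is reported in bytes; convert to decimal gigabytes.
    return torch.cuda.get_device_properties(0).total_memory / 1e9
```

Usage: `print(recommend_method(detect_vram_gb()))`. The table gives no GPU option below 8GB, so the sketch falls back to CPU there.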

---

## 📊 Training Details

### Stage 1 — Continued Pre-Training (CPT)

| Parameter | Value |
|---|---|
| Base model | Qwen/Qwen2.5-7B |
| Hardware | NVIDIA A100-SXM4-40GB |
| Training steps | 5,000 |
| Final CPT loss | ~1.8 |
| Dataset size | 3,400,000 documents |
| Sequence length | 2,048 tokens |
| Precision | bfloat16 |
| LoRA rank | 64 |
| Learning rate | 5e-5 |
| Effective batch size | 32 |

### Stage 2 — Supervised Fine-Tuning (SFT)

| Parameter | Value |
|---|---|
| Base model | junaid008/pashto-qwen2.5-7b-v3 (CPT) |
| Hardware | NVIDIA A100-SXM4-40GB |
| Training steps | 7,908 |
| Final SFT loss | 0.455 |
| Dataset size | 126,519 pairs |
| Epochs | 2 |
| Sequence length | 2,048 tokens |
| Precision | bfloat16 |
| LoRA rank | 64 |
| Learning rate | 5e-5 |
| Effective batch size | 32 |
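
The step counts in the two tables can be cross-checked with simple arithmetic. This sketch assumes one optimizer step per effective batch of 32 and that a final partial batch is kept rather than dropped; under those assumptions the SFT numbers reproduce the reported 7,908 steps. The CPT figure is only an upper bound that assumes every sequence is filled to 2,048 tokens. Since 5,000 steps at batch size 32 is 160,000 sequences and the card does not say whether documents were packed together, how much of the 3.4M-document corpus was covered cannot be derived from the table alone.

```python
import math


def optimizer_steps(num_examples: int, epochs: int, effective_batch: int) -> int:
    # One optimizer step per effective batch; a final partial batch still counts.
    return math.ceil(num_examples / effective_batch) * epochs


def max_tokens_seen(steps: int, effective_batch: int, seq_len: int) -> int:
    # Upper bound: assumes every sequence in every batch is filled to seq_len.
    return steps * effective_batch * seq_len


sft_steps = optimizer_steps(126_519, epochs=2, effective_batch=32)
print(sft_steps)  # 7908

cpt_tokens = max_tokens_seen(5_000, effective_batch=32, seq_len=2_048)
print(f"{cpt_tokens:,}")  # 327,680,000
```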

---

## 📚 Dataset

### CPT Dataset
- 3.4 million Pakistani Pashto documents
- Sources: news, books, religious texts, Wikipedia, web crawl
- Custom cleaned with Pashto-specific Unicode normalization
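
The card does not list the normalization rules used for the CPT corpus. As an illustration only, the sketch below shows one plausible approach for Peshawari text: NFC normalization, then mapping Arabic and Urdu code points that often appear when Pashto is typed on Arabic or Urdu keyboards to the corresponding Pashto letters. The mappings in `PASHTO_CHAR_MAP` are assumptions, not the author's pipeline, and they must not be applied to Urdu text, where ٹ ڈ ڑ گ are the correct letters.

```python
import unicodedata

# Illustrative mappings (assumptions, not Qehwa's actual cleaning rules).
PASHTO_CHAR_MAP = str.maketrans({
    "\u0643": "\u06A9",  # ك Arabic kaf        -> ک keheh
    "\u06AF": "\u06AB",  # گ Persian/Urdu gaf  -> ګ Pashto gaf (kaf with ring)
    "\u0679": "\u067C",  # ٹ Urdu tteh         -> ټ teh with ring
    "\u0688": "\u0689",  # ڈ Urdu ddal         -> ډ dal with ring
    "\u0691": "\u0693",  # ڑ Urdu rreh         -> ړ reh with ring
    "\u0640": None,      # ـ tatweel (kashida), decorative only, removed
})


def normalize_pashto(text: str) -> str:
    # NFC first so composed and decomposed sequences compare equal,
    # then apply the character mapping and collapse runs of whitespace.
    text = unicodedata.normalize("NFC", text)
    text = text.translate(PASHTO_CHAR_MAP)
    return " ".join(text.split())
```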

### SFT Dataset (126,519 pairs)

| Type | Description |
|---|---|
| Pashto → Pashto | Questions and answers in pure Peshawari Pashto |
| English → Pashto | English instructions with Pashto responses |
| Urdu → Pashto | Urdu instructions with Pashto responses |
| Conversation | Natural dialogue in Peshawari dialect |
| Islamic topics | Religious knowledge in Pashto |
| Creative writing | Pashto poetry, stories, descriptions |
| Grammar | Pashto language correction and explanation |
| Translation | English to Pashto translation pairs |
| Cultural | Pashtunwali, traditions, history, geography |
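
The inference examples fill `ALPACA_TEMPLATE` with an instruction and an empty response. A plausible way to turn each SFT pair into training text is sketched below, assuming the same template was used during fine-tuning with the response in the second slot and an EOS token appended; the card does not show the training format, and `format_example` and `format_prompt` are hypothetical helper names.

```python
# Template copied from the inference examples in this card.
ALPACA_TEMPLATE = """Below is an instruction in Pashto or English. Write a detailed response in Pashto.

### Instruction:
{}

### Response:
{}"""


def format_example(instruction: str, response: str, eos_token: str) -> str:
    # Assumed training format: the response fills the second slot and an
    # EOS token is appended so the model learns where to stop.
    return ALPACA_TEMPLATE.format(instruction, response) + eos_token


def format_prompt(instruction: str) -> str:
    # At inference the response slot is left empty, as in the usage examples.
    return ALPACA_TEMPLATE.format(instruction, "")
```

With a Hugging Face tokenizer, `tokenizer.eos_token` would be passed as `eos_token`. Because the inference prompt is an exact prefix of the training text, generation continues from the start of the response slot.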

---

## 🗣️ Example Outputs

**Pashto input** ("Tell me the history of Peshawar"):
```
Prompt : د پیښور تاریخ راته ووایه
Output : پیښور د نړۍ یو له تر ټولو پخوانیو ښارونو څخه دی چې تاریخ یې زرګونه
         کاله پخوا ته رسیږي. د دې ښار پخوانی نوم پوشکلاوتي او بیا پروشاپور و.
         د پیښور د بالاحصار کلا د مغلو او سکهانو د وخت یادګار دی...
```

**English input:**
```
Prompt : Tell me about Pashtunwali
Output : پښتونولي د پښتنو د ژوند هغه لرغونی او سپېڅلی ضابطه اخلاق ده چې د دوی
         ټولنیز ژوند یې تنظیم کړی دی. په دې کې مېلمه پالنه، غیرت، او د کمزورو
         ساتنه تر ټولو لوی ارزښتونه ګڼل کېږي...
```

**Urdu input** ("What is Pashtunwali?"):
```
Prompt : پشتونوالی کیا ہے؟
Output : پښتونولي د پښتنو د ژوند هغه لرغونی او زرین ضابطه اخلاق ده...
```

---

## 👤 Author

**Junaid Aslam**
- HuggingFace: [junaid008](https://huggingface.co/junaid008)
- Built independently as a contribution to Pashto NLP

---

## 📜 License

Apache 2.0 — free to use, modify, and distribute with attribution.

---

## 🤝 Citation
```bibtex
@misc{qehwa-pashto-llm,
  author    = {Junaid Aslam},
  title     = {Qehwa — Pashto's First LLM},
  year      = {2026},
  publisher = {HuggingFace},
  url       = {https://huggingface.co/junaid008/qehwa-pashto-llm}
}
```