Initialize project; model provided by the ModelHub XC community
Model: junaid008/qehwa-pashto-llm Source: Original Platform
36
.gitattributes
vendored
Normal file
@@ -0,0 +1,36 @@
*.7z filter=lfs diff=lfs merge=lfs -text
*.arrow filter=lfs diff=lfs merge=lfs -text
*.bin filter=lfs diff=lfs merge=lfs -text
*.bz2 filter=lfs diff=lfs merge=lfs -text
*.ckpt filter=lfs diff=lfs merge=lfs -text
*.ftz filter=lfs diff=lfs merge=lfs -text
*.gz filter=lfs diff=lfs merge=lfs -text
*.h5 filter=lfs diff=lfs merge=lfs -text
*.joblib filter=lfs diff=lfs merge=lfs -text
*.lfs.* filter=lfs diff=lfs merge=lfs -text
*.mlmodel filter=lfs diff=lfs merge=lfs -text
*.model filter=lfs diff=lfs merge=lfs -text
*.msgpack filter=lfs diff=lfs merge=lfs -text
*.npy filter=lfs diff=lfs merge=lfs -text
*.npz filter=lfs diff=lfs merge=lfs -text
*.onnx filter=lfs diff=lfs merge=lfs -text
*.ot filter=lfs diff=lfs merge=lfs -text
*.parquet filter=lfs diff=lfs merge=lfs -text
*.pb filter=lfs diff=lfs merge=lfs -text
*.pickle filter=lfs diff=lfs merge=lfs -text
*.pkl filter=lfs diff=lfs merge=lfs -text
*.pt filter=lfs diff=lfs merge=lfs -text
*.pth filter=lfs diff=lfs merge=lfs -text
*.rar filter=lfs diff=lfs merge=lfs -text
*.safetensors filter=lfs diff=lfs merge=lfs -text
saved_model/**/* filter=lfs diff=lfs merge=lfs -text
*.tar.* filter=lfs diff=lfs merge=lfs -text
*.tar filter=lfs diff=lfs merge=lfs -text
*.tflite filter=lfs diff=lfs merge=lfs -text
*.tgz filter=lfs diff=lfs merge=lfs -text
*.wasm filter=lfs diff=lfs merge=lfs -text
*.xz filter=lfs diff=lfs merge=lfs -text
*.zip filter=lfs diff=lfs merge=lfs -text
*.zst filter=lfs diff=lfs merge=lfs -text
*tfevents* filter=lfs diff=lfs merge=lfs -text
tokenizer.json filter=lfs diff=lfs merge=lfs -text
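The `.gitattributes` rules above decide which files in this commit are stored through Git LFS rather than as ordinary blobs. The sketch below checks the commit's file names against a few of those patterns. It uses Python's `fnmatch`, which only approximates gitattributes matching (it does not handle `**` or directory semantics the way Git does), and the helper name `tracked_by_lfs` is made up for illustration.

```python
from fnmatch import fnmatch

# A subset of the patterns from the .gitattributes file in this commit.
LFS_PATTERNS = ["*.bin", "*.safetensors", "*.pt", "*.onnx", "*tfevents*", "tokenizer.json"]

# The files added by this commit.
COMMIT_FILES = [
    ".gitattributes",
    "README.md",
    "config.json",
    "generation_config.json",
    "model.safetensors",
    "tokenizer.json",
    "tokenizer_config.json",
]

def tracked_by_lfs(path, patterns=LFS_PATTERNS):
    """Hypothetical helper: True if any pattern matches the file name."""
    return any(fnmatch(path, pattern) for pattern in patterns)

lfs_files = [name for name in COMMIT_FILES if tracked_by_lfs(name)]
print(lfs_files)
```

This agrees with the rest of the commit: `model.safetensors` and `tokenizer.json` appear as three-line LFS pointer files, while the small JSON configs are stored inline.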
455
README.md
Normal file
@@ -0,0 +1,455 @@
---
language:
- ps
- en
- ur
license: apache-2.0
library_name: transformers
tags:
- pashto
- peshawari
- pakistani-pashto
- causal-lm
- qwen2
- sft
- cpt
- unsloth
- trl
base_model: Qwen/Qwen2.5-7B
pipeline_tag: text-generation
---

# ☕ Qehwa — Pashto's First LLM

**The first and best Pakistani Pashto large language model, trained specifically on the Peshawari dialect.**

Built by a solo developer as a free and open resource for 60+ million Pashto speakers worldwide.

> ⚠️ This model performs best on Pakistani/Peshawari Pashto. Performance may be lower on Afghan Pashto dialects.

---

## 🌟 Model Description

**Qehwa** is a fully instruction-tuned Pashto language model built on top of Qwen2.5-7B. It is the result of two-stage training:

1. **Continued Pre-Training (CPT)** on 3.4 million clean Pakistani Pashto documents
2. **Supervised Fine-Tuning (SFT)** on 126,519 high-quality Peshawari Pashto instruction-response pairs

This is the **first dedicated Pakistani Pashto LLM**; no comparable model exists publicly. It specifically targets the **Peshawari/KPK dialect** rather than generic or Afghan Pashto.

This repo contains the **fully merged model**, ready to use with the standard `transformers` library; no additional libraries are required.

---

## ✨ Capabilities

- ✅ Answers questions in pure Peshawari Pashto
- ✅ Responds to English instructions in Pashto
- ✅ Responds to Urdu instructions in Pashto
- ✅ Natural Pashto conversation
- ✅ Pashto creative writing and poetry
- ✅ Islamic topics in Pashto
- ✅ KPK history, culture, and geography
- ✅ Pashtunwali traditions and ethics
- ✅ Pashto grammar correction
- ✅ English to Pashto translation
- ✅ Correct Pashto-specific characters: ښ ږ ټ ډ ړ ځ

---

## 📊 Evaluation Results

Qehwa was evaluated on a custom benchmark of **150 tests across 15 categories**, the first comprehensive Pashto LLM benchmark. Since no standard Pashto benchmark exists publicly, this evaluation was designed specifically for Pakistani Pashto.

### Top Performing Categories

| Category | Score |
|---|---|
| English → Pashto | **90%** 🔥🔥 |
| Urdu → Pashto | **84%** 🔥🔥 |
| Health & Daily Life in Pashto | **90%** 🔥🔥 |
| Culture & History | **90%** 🔥 |
| Geography & Nature | **90%** 🔥 |

> **Overall Average Accuracy across all 15 benchmark categories: 85.3%**

### Evaluation Methodology
- 150 custom Pashto prompts across 15 categories
- Evaluated on an A100 40GB GPU
- Human-reviewed outputs for fluency, accuracy, and dialect correctness
- No existing Pashto benchmark was available, so this is the first Pashto LLM benchmark

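The table lists only five of the fifteen categories, so the overall figure also depends on ten categories that are not shown. The arithmetic below is a rough sketch of how the reported numbers relate. It assumes the 150 prompts are split evenly across categories and that the overall score is an unweighted mean of per-category scores. Neither assumption is stated in this card, and the variable names are illustrative.

```python
# Scores of the five categories listed in the table (percent).
listed_scores = {
    "English → Pashto": 90,
    "Urdu → Pashto": 84,
    "Health & Daily Life in Pashto": 90,
    "Culture & History": 90,
    "Geography & Nature": 90,
}

TOTAL_PROMPTS = 150
TOTAL_CATEGORIES = 15
OVERALL_AVERAGE = 85.3  # reported overall average (percent)

# Assumption: prompts are split evenly across categories.
prompts_per_category = TOTAL_PROMPTS // TOTAL_CATEGORIES

# Mean of the five listed categories.
listed_mean = sum(listed_scores.values()) / len(listed_scores)

# Assumption: the overall figure is an unweighted mean over all 15 categories,
# so the ten unlisted categories must average to the value below.
unlisted = TOTAL_CATEGORIES - len(listed_scores)
implied_unlisted_mean = (
    OVERALL_AVERAGE * TOTAL_CATEGORIES - sum(listed_scores.values())
) / unlisted

print(f"prompts per category:        {prompts_per_category}")
print(f"mean of listed categories:   {listed_mean:.1f}%")
print(f"implied mean of other {unlisted}:    {implied_unlisted_mean:.2f}%")
```

Under those assumptions, the listed categories sit a few points above the overall average, which is what a "top performing" selection would suggest.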
---

## 💻 Installation
```bash
pip install transformers accelerate torch
```

For faster inference:
```bash
pip install unsloth
```

For running locally on CPU or on a small GPU (`bitsandbytes` is only needed for the 4-bit setup in Method 2):
```bash
pip install transformers accelerate bitsandbytes
```

---

## 🚀 How to Use

### ✅ Method 1 — Transformers (Recommended)

Best for: Research, production, standard usage
```python
from transformers import AutoModelForCausalLM, AutoTokenizer
import torch

model_name = "junaid008/qehwa-pashto-llm"

tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(
    model_name,
    torch_dtype = torch.bfloat16,
    device_map = "auto",
)

ALPACA_TEMPLATE = """Below is an instruction in Pashto or English. Write a detailed response in Pashto.

### Instruction:
{}

### Response:
{}"""

def generate(prompt):
    # Fill the instruction slot and leave the response slot empty for the model.
    inputs = tokenizer(
        ALPACA_TEMPLATE.format(prompt, ""),
        return_tensors = "pt",
    ).to(model.device)  # follow the device chosen by device_map

    outputs = model.generate(
        **inputs,
        max_new_tokens = 500,
        temperature = 0.7,
        do_sample = True,
        repetition_penalty = 1.1,
        pad_token_id = tokenizer.eos_token_id,
    )

    # The decoded text includes the prompt, so keep only what follows the marker.
    response = tokenizer.decode(outputs[0], skip_special_tokens=True)
    return response.split("### Response:")[-1].strip()

# Pashto input
print(generate("د پیښور تاریخ راته ووایه"))

# English input
print(generate("Tell me about Pashtunwali"))

# Urdu input
print(generate("پشاور کے بارے میں بتاؤ"))
```

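Every method in this section formats the prompt with the same Alpaca-style template and recovers the answer by splitting on `### Response:`. The sketch below isolates those two string steps so they can be checked without loading the model. The helper names `build_prompt` and `extract_response` are illustrative and not part of this repository.

```python
ALPACA_TEMPLATE = """Below is an instruction in Pashto or English. Write a detailed response in Pashto.

### Instruction:
{}

### Response:
{}"""

RESPONSE_MARKER = "### Response:"

def build_prompt(instruction: str) -> str:
    """Fill the instruction slot and leave the response slot empty."""
    return ALPACA_TEMPLATE.format(instruction, "")

def extract_response(decoded: str) -> str:
    """Keep only the text after the last response marker."""
    return decoded.split(RESPONSE_MARKER)[-1].strip()

prompt = build_prompt("Tell me about Pashtunwali")

# Stand-in for tokenizer.decode(...): the decoded text echoes the prompt
# and then continues with the generated answer.
decoded = prompt + "پښتونولي د پښتنو د ژوند ضابطه ده\n"

print(extract_response(decoded))
```

Because the split keeps the text after the last marker, a generation that itself emits `### Response:` would be truncated to its final part. Slicing off the prompt tokens before decoding is an alternative if that becomes a problem.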
---

### ✅ Method 2 — 4-bit Quantization (Low VRAM)

Best for: GPUs with limited VRAM (around 8GB)
```python
from transformers import AutoModelForCausalLM, AutoTokenizer, BitsAndBytesConfig
import torch

model_name = "junaid008/qehwa-pashto-llm"

# NF4 weights with bfloat16 compute and double quantization.
bnb_config = BitsAndBytesConfig(
    load_in_4bit = True,
    bnb_4bit_quant_type = "nf4",
    bnb_4bit_compute_dtype = torch.bfloat16,
    bnb_4bit_use_double_quant = True,
)

tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(
    model_name,
    quantization_config = bnb_config,
    device_map = "auto",
)

ALPACA_TEMPLATE = """Below is an instruction in Pashto or English. Write a detailed response in Pashto.

### Instruction:
{}

### Response:
{}"""

def generate(prompt):
    inputs = tokenizer(
        ALPACA_TEMPLATE.format(prompt, ""),
        return_tensors = "pt",
    ).to(model.device)  # follow the device chosen by device_map

    outputs = model.generate(
        **inputs,
        max_new_tokens = 500,
        temperature = 0.7,
        do_sample = True,
        repetition_penalty = 1.1,
        pad_token_id = tokenizer.eos_token_id,
    )

    response = tokenizer.decode(outputs[0], skip_special_tokens=True)
    return response.split("### Response:")[-1].strip()

print(generate("پښتونولي تشریح کړه"))
```

---

### ✅ Method 3 — Unsloth (2x Faster Inference)

Best for: Speed-optimized usage, Colab, A100/H100
```python
from unsloth import FastLanguageModel

model, tokenizer = FastLanguageModel.from_pretrained(
    model_name = "junaid008/qehwa-pashto-llm",
    max_seq_length = 2048,
    dtype = None,          # let Unsloth pick the dtype
    load_in_4bit = False,
)
FastLanguageModel.for_inference(model)

ALPACA_TEMPLATE = """Below is an instruction in Pashto or English. Write a detailed response in Pashto.

### Instruction:
{}

### Response:
{}"""

inputs = tokenizer(
    ALPACA_TEMPLATE.format("د پیښور تاریخ راته ووایه", ""),
    return_tensors = "pt",
).to("cuda")

outputs = model.generate(
    **inputs,
    max_new_tokens = 500,
    temperature = 0.7,
    do_sample = True,
    repetition_penalty = 1.1,
    pad_token_id = tokenizer.pad_token_id,
)

response = tokenizer.decode(outputs[0], skip_special_tokens=True)
print(response.split("### Response:")[-1].strip())
```

---

### ✅ Method 4 — CPU Only (No GPU)

Best for: Testing on laptop, no GPU available (slow but works)
```python
from transformers import AutoModelForCausalLM, AutoTokenizer
import torch

model_name = "junaid008/qehwa-pashto-llm"

tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(
    model_name,
    torch_dtype = torch.float32,  # float32 for CPU (weights alone need roughly 30 GB of RAM)
    device_map = "cpu",
)

ALPACA_TEMPLATE = """Below is an instruction in Pashto or English. Write a detailed response in Pashto.

### Instruction:
{}

### Response:
{}"""

inputs = tokenizer(
    ALPACA_TEMPLATE.format("پښتو ژبه د چا ده؟", ""),
    return_tensors = "pt",
)

outputs = model.generate(
    **inputs,
    max_new_tokens = 200,
    do_sample = False,  # greedy for CPU speed
    pad_token_id = tokenizer.eos_token_id,
)

response = tokenizer.decode(outputs[0], skip_special_tokens=True)
print(response.split("### Response:")[-1].strip())
```

---

### ✅ Method 5 — Google Colab (Free)

Best for: Trying without any local setup

Open in Colab and run (the bfloat16 weights are about 15.2 GB, so pick a GPU runtime with enough memory or load in 4-bit as in Method 2):
```python
# Install
!pip install transformers accelerate -q

from transformers import AutoModelForCausalLM, AutoTokenizer
import torch

tokenizer = AutoTokenizer.from_pretrained("junaid008/qehwa-pashto-llm")
model = AutoModelForCausalLM.from_pretrained(
    "junaid008/qehwa-pashto-llm",
    torch_dtype = torch.bfloat16,
    device_map = "auto",
)

ALPACA_TEMPLATE = """Below is an instruction in Pashto or English. Write a detailed response in Pashto.

### Instruction:
{}

### Response:
{}"""

def generate(prompt):
    inputs = tokenizer(ALPACA_TEMPLATE.format(prompt, ""), return_tensors="pt").to(model.device)
    outputs = model.generate(**inputs, max_new_tokens=500, temperature=0.7,
                             do_sample=True, pad_token_id=tokenizer.eos_token_id)
    return tokenizer.decode(outputs[0], skip_special_tokens=True).split("### Response:")[-1].strip()

print(generate("Tell me about Peshawar"))
print(generate("پښتونولي تشریح کړه"))
print(generate("پشاور کا مشہور کھانا کیا ہے؟"))
```

---

## ⚙️ Hardware Requirements

| Method | VRAM | Speed |
|---|---|---|
| bfloat16 full | 16GB+ | ✅ Fast |
| 4-bit quantized | 8GB+ | ✅ Good |
| Unsloth | 16GB+ | 🔥 2x Faster |
| CPU only | No GPU | ⚠️ Slow |

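The VRAM figures in the table can be sanity-checked from the architecture values in this repository's `config.json`. The sketch below counts parameters for a Qwen2-style decoder and converts them to weight memory. Two details are assumptions about the Qwen2 layout rather than values read from the config: the q/k/v projections carry bias terms and the output projection does not, and each layer has two RMSNorm weight vectors plus one final norm. The 4-bit figure is a rough lower bound that ignores quantization constants and any layers kept in higher precision.

```python
# Values copied from config.json in this commit.
hidden_size = 3584
intermediate_size = 18944
num_hidden_layers = 28
num_attention_heads = 28
num_key_value_heads = 4
vocab_size = 152064
tie_word_embeddings = False

head_dim = hidden_size // num_attention_heads   # 128
kv_dim = num_key_value_heads * head_dim         # 512 (grouped-query attention)

# Assumption: Qwen2 attention has biases on q/k/v and none on the output projection.
attention = (
    (hidden_size * hidden_size + hidden_size)   # q_proj weight + bias
    + 2 * (hidden_size * kv_dim + kv_dim)       # k_proj and v_proj weight + bias
    + hidden_size * hidden_size                 # o_proj weight
)
mlp = 3 * hidden_size * intermediate_size       # gate, up, and down projections
norms = 2 * hidden_size                         # two RMSNorm vectors per layer
per_layer = attention + mlp + norms

# Untied embeddings: input embedding and LM head are separate matrices.
embeddings = vocab_size * hidden_size * (1 if tie_word_embeddings else 2)
total_params = num_hidden_layers * per_layer + embeddings + hidden_size  # + final norm

bf16_bytes = total_params * 2
fp32_bytes = total_params * 4
int4_bytes = total_params // 2  # rough lower bound for 4-bit weights

print(f"parameters:    {total_params:,}")
print(f"bfloat16:      {bf16_bytes / 1e9:.2f} GB")
print(f"float32 (CPU): {fp32_bytes / 1e9:.2f} GB")
print(f"4-bit (approx): {int4_bytes / 1e9:.2f} GB")
```

The bfloat16 estimate lands within a few tens of kilobytes of the 15,231,272,152-byte `model.safetensors` file in this commit, which is consistent with the file holding these weights plus a small header. It also shows that a 16 GB GPU leaves little headroom for activations and the KV cache in full bfloat16.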
---

## 📊 Training Details

### Stage 1 — Continued Pre-Training (CPT)

| Parameter | Value |
|---|---|
| Base model | Qwen/Qwen2.5-7B |
| Hardware | NVIDIA A100-SXM4-40GB |
| Training steps | 5,000 |
| Final CPT loss | ~1.8 |
| Dataset size | 3,400,000 documents |
| Sequence length | 2,048 tokens |
| Precision | bfloat16 |
| LoRA rank | 64 |
| Learning rate | 5e-5 |
| Effective batch size | 32 |

### Stage 2 — Supervised Fine-Tuning (SFT)

| Parameter | Value |
|---|---|
| Base model | junaid008/pashto-qwen2.5-7b-v3 (CPT) |
| Hardware | NVIDIA A100-SXM4-40GB |
| Training steps | 7,908 |
| Final SFT loss | 0.455 |
| Dataset size | 126,519 pairs |
| Epochs | 2 |
| Sequence length | 2,048 tokens |
| Precision | bfloat16 |
| LoRA rank | 64 |
| Learning rate | 5e-5 |
| Effective batch size | 32 |

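The step counts in the two tables can be related to the dataset sizes with a little arithmetic. This is a sketch under stated assumptions: one optimizer step per effective batch, the last partial batch of each epoch kept rather than dropped, and, for CPT, every sequence filled to the full 2,048 tokens. The card does not say whether documents were packed, so the CPT token figure is an upper bound.

```python
import math

effective_batch = 32
seq_len = 2_048

# Stage 2 (SFT): do the reported steps match dataset size, epochs, and batch size?
sft_pairs = 126_519
sft_epochs = 2
reported_sft_steps = 7_908

steps_per_epoch = math.ceil(sft_pairs / effective_batch)  # last partial batch kept
expected_sft_steps = steps_per_epoch * sft_epochs

# Stage 1 (CPT): how many sequences and tokens can 5,000 steps cover at most?
cpt_steps = 5_000
cpt_sequences = cpt_steps * effective_batch
cpt_token_budget = cpt_sequences * seq_len  # upper bound: every sequence full

print(f"SFT steps per epoch: {steps_per_epoch}")
print(f"SFT expected steps:  {expected_sft_steps} (reported: {reported_sft_steps})")
print(f"CPT sequences seen:  {cpt_sequences:,}")
print(f"CPT token budget:    {cpt_token_budget:,}")
```

The SFT numbers are internally consistent under these assumptions. For CPT, 5,000 steps cover 160,000 sequences, so whether the run saw most of the 3.4 million documents depends on how many documents were packed into each sequence.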
---

## 📚 Dataset

### CPT Dataset
- 3.4 million Pakistani Pashto documents
- Sources: news, books, religious texts, Wikipedia, web crawl
- Custom cleaned with Pashto-specific Unicode normalization

### SFT Dataset — 126,519 pairs:

| Type | Description |
|---|---|
| Pashto → Pashto | Questions and answers in pure Peshawari Pashto |
| English → Pashto | English instructions with Pashto responses |
| Urdu → Pashto | Urdu instructions with Pashto responses |
| Conversation | Natural dialogue in Peshawari dialect |
| Islamic topics | Religious knowledge in Pashto |
| Creative writing | Pashto poetry, stories, descriptions |
| Grammar | Pashto language correction and explanation |
| Translation | English to Pashto translation pairs |
| Cultural | Pashtunwali, traditions, history, geography |

---

## 🗣️ Example Outputs

**Pashto input:**
```
Prompt : د پیښور تاریخ راته ووایه
Output : پیښور د نړۍ یو له تر ټولو پخوانیو ښارونو څخه دی چې تاریخ یې زرګونه
         کاله پخوا ته رسیږي. د دې ښار پخوانی نوم پوشکلاوتي او بیا پروشاپور و.
         د پیښور د بالاحصار کلا د مغلو او سکهانو د وخت یادګار دی...
```

**English input:**
```
Prompt : Tell me about Pashtunwali
Output : پښتونولي د پښتنو د ژوند هغه لرغونی او سپېڅلی ضابطه اخلاق ده چې د دوی
         ټولنیز ژوند یې تنظیم کړی دی. په دې کې مېلمه پالنه، غیرت، او د کمزورو
         ساتنه تر ټولو لوی ارزښتونه ګڼل کېږي...
```

**Urdu input:**
```
Prompt : پشتونوالی کیا ہے؟
Output : پښتونولي د پښتنو د ژوند هغه لرغونی او زرین ضابطه اخلاق ده...
```

---

## 👤 Author

**Junaid Aslam**
- HuggingFace: [junaid008](https://huggingface.co/junaid008)
- Built independently as a contribution to Pashto NLP

---

## 📜 License

Apache 2.0: free to use, modify, and distribute with attribution.

---

## 🤝 Citation
```bibtex
@misc{qehwa-pashto-llm,
  author    = {Junaid Aslam},
  title     = {Qehwa — Pashto's First LLM},
  year      = {2026},
  publisher = {HuggingFace},
  url       = {https://huggingface.co/junaid008/qehwa-pashto-llm}
}
```
64
config.json
Normal file
@@ -0,0 +1,64 @@
{
  "architectures": [
    "Qwen2ForCausalLM"
  ],
  "attention_dropout": 0.0,
  "bos_token_id": null,
  "dtype": "bfloat16",
  "eos_token_id": 151643,
  "hidden_act": "silu",
  "hidden_size": 3584,
  "initializer_range": 0.02,
  "intermediate_size": 18944,
  "layer_types": [
    "full_attention",
    "full_attention",
    "full_attention",
    "full_attention",
    "full_attention",
    "full_attention",
    "full_attention",
    "full_attention",
    "full_attention",
    "full_attention",
    "full_attention",
    "full_attention",
    "full_attention",
    "full_attention",
    "full_attention",
    "full_attention",
    "full_attention",
    "full_attention",
    "full_attention",
    "full_attention",
    "full_attention",
    "full_attention",
    "full_attention",
    "full_attention",
    "full_attention",
    "full_attention",
    "full_attention",
    "full_attention"
  ],
  "max_position_embeddings": 131072,
  "max_window_layers": 28,
  "model_type": "qwen2",
  "num_attention_heads": 28,
  "num_hidden_layers": 28,
  "num_key_value_heads": 4,
  "pad_token_id": 151665,
  "rms_norm_eps": 1e-06,
  "rope_parameters": {
    "rope_theta": 1000000.0,
    "rope_type": "default"
  },
  "sliding_window": null,
  "tie_word_embeddings": false,
  "transformers_version": "5.2.0",
  "unsloth_fixed": true,
  "unsloth_version": "2026.3.4",
  "use_cache": false,
  "use_mrope": false,
  "use_sliding_window": false,
  "vocab_size": 152064
}
9
generation_config.json
Normal file
@@ -0,0 +1,9 @@
{
  "eos_token_id": [
    151643
  ],
  "max_length": 131072,
  "max_new_tokens": 2048,
  "pad_token_id": 151665,
  "transformers_version": "5.2.0"
}
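The generation defaults above differ from what most README snippets pass explicitly: the snippets set `pad_token_id = tokenizer.eos_token_id` and `max_new_tokens = 500`, while this file declares a dedicated pad token and a larger default budget. The sketch below only parses the JSON shown above and compares the values. It does not load the model, and the name `readme_max_new_tokens` is illustrative.

```python
import json

# Contents of generation_config.json from this commit.
GENERATION_CONFIG = """
{
  "eos_token_id": [151643],
  "max_length": 131072,
  "max_new_tokens": 2048,
  "pad_token_id": 151665,
  "transformers_version": "5.2.0"
}
"""

cfg = json.loads(GENERATION_CONFIG)

eos_ids = cfg["eos_token_id"]
pad_id = cfg["pad_token_id"]
default_max_new_tokens = cfg["max_new_tokens"]

# Value passed explicitly in most of the README snippets.
readme_max_new_tokens = 500

pad_is_distinct_from_eos = pad_id not in eos_ids

print(f"eos token ids:          {eos_ids}")
print(f"pad token id:           {pad_id} (distinct from eos: {pad_is_distinct_from_eos})")
print(f"default max_new_tokens: {default_max_new_tokens}")
print(f"README max_new_tokens:  {readme_max_new_tokens}")
```

For single-prompt generation the choice of pad token has no visible effect, which is likely why the snippets reuse the end-of-sequence id. For batched, left-padded inputs, using the dedicated pad id from this file (as the Unsloth snippet does via `tokenizer.pad_token_id`) keeps padding and end-of-sequence distinct.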
3
model.safetensors
Normal file
@@ -0,0 +1,3 @@
version https://git-lfs.github.com/spec/v1
oid sha256:1b14653034d5866af47bde6859adb272c70eb2475ff742a914bde1cd4287f39e
size 15231272152
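The three lines above are a Git LFS pointer, not the weights themselves: the commit stores only the spec version, a SHA-256 digest, and the byte size, and the real file is fetched from LFS storage. A minimal parser for this key/value layout might look like the following. The function name `parse_lfs_pointer` is illustrative, and the sketch handles only the three fields shown here.

```python
# Pointer text for model.safetensors as it appears in this commit.
POINTER = """version https://git-lfs.github.com/spec/v1
oid sha256:1b14653034d5866af47bde6859adb272c70eb2475ff742a914bde1cd4287f39e
size 15231272152
"""

def parse_lfs_pointer(text: str) -> dict:
    """Split each 'key value' line, then break the oid into algorithm and digest."""
    fields = dict(line.split(" ", 1) for line in text.strip().splitlines())
    algorithm, digest = fields["oid"].split(":", 1)
    return {
        "version": fields["version"],
        "algorithm": algorithm,
        "digest": digest,
        "size_bytes": int(fields["size"]),
    }

pointer = parse_lfs_pointer(POINTER)
size_gib = pointer["size_bytes"] / 2**30

print(pointer["algorithm"], pointer["digest"][:12] + "...")
print(f"{pointer['size_bytes']:,} bytes = {size_gib:.2f} GiB")
```

After downloading, hashing the real file with `hashlib.sha256` and comparing the result against `digest` is a simple way to confirm the weights arrived intact.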
3
tokenizer.json
Normal file
@@ -0,0 +1,3 @@
version https://git-lfs.github.com/spec/v1
oid sha256:bd5948af71b4f56cf697f7580814c7ce8b80595ef985544efcacf716126a2e31
size 11422356
15
tokenizer_config.json
Normal file
@@ -0,0 +1,15 @@
{
  "add_prefix_space": false,
  "backend": "tokenizers",
  "bos_token": null,
  "clean_up_tokenization_spaces": false,
  "eos_token": "<|endoftext|>",
  "errors": "replace",
  "is_local": false,
  "model_max_length": 131072,
  "pad_token": "<|PAD_TOKEN|>",
  "padding_side": "left",
  "split_special_tokens": false,
  "tokenizer_class": "Qwen2Tokenizer",
  "unk_token": null
}