---
language:
- sw
- en
license: apache-2.0
base_model:
- google/gemma-2-2b
library_name: transformers
---

# PAWA: Swahili SLM for Various Tasks

---

## Overview

**PAWA** is a Swahili-specialized language model designed to excel at tasks requiring nuanced understanding and interaction in Swahili and English. It leverages supervised fine-tuning (SFT) and Direct Preference Optimization (DPO) for improved performance and consistency. Below are the model specifications, installation steps, usage examples, and intended applications.

---

### Model Details

- **Model Name**: Pawa-Gemma-Swahili-2B
- **Model Type**: PAWA
- **Architecture**:
  - 2B-parameter Gemma-2 base model
  - Enhanced with Swahili SFT and DPO datasets
- **Languages Supported**:
  - Swahili
  - English
  - Custom tokenizer for multi-language flexibility
- **Primary Use Cases**:
  - Contextually rich Swahili-focused tasks
  - General assistance and chat-based interactions
- **License**: Apache 2.0 (per the card metadata; contact the author for any additional terms of use)

---

### Installation and Setup

Ensure the necessary libraries are installed and up to date:

```bash
# In a notebook, prefix each command with "!"
pip uninstall transformers -y && pip install --upgrade --no-cache-dir "git+https://github.com/huggingface/transformers.git"
pip uninstall unsloth -y && pip install --upgrade --no-cache-dir "unsloth[colab-new] @ git+https://github.com/unslothai/unsloth.git"
pip install datasets
```
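
To confirm the environment before loading the model, a quick sanity check of the install and the GPU can help (a minimal sketch; exact version strings will vary):

```python
import torch
import transformers

# The source build of transformers should now be active.
print("transformers:", transformers.__version__)

# The snippets below move tensors to "cuda", so a GPU must be visible.
print("CUDA available:", torch.cuda.is_available())
if torch.cuda.is_available():
    print("GPU:", torch.cuda.get_device_name(0))
```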

---

### Model Loading

You can load the model using the following code snippet:

```python
from unsloth import FastLanguageModel
import torch

model_name = "sartifyllc/Pawa-kaggle-gemma-2b"
max_seq_length = 2048
dtype = None          # None lets Unsloth auto-detect (float16 on T4/V100, bfloat16 on Ampere+)
load_in_4bit = False  # Set to True for 4-bit quantized loading on memory-constrained GPUs

model, tokenizer = FastLanguageModel.from_pretrained(
    model_name=model_name,
    max_seq_length=max_seq_length,
    dtype=dtype,
    load_in_4bit=load_in_4bit,
)
```
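
If you prefer not to use Unsloth, the checkpoint should also load through plain `transformers` (the card's metadata declares `library_name: transformers`). A minimal sketch, assuming the hosted weights are a standard Gemma-2 causal-LM checkpoint; `device_map="auto"` additionally requires the `accelerate` package:

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "sartifyllc/Pawa-kaggle-gemma-2b"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype=torch.float16,  # Use torch.bfloat16 on Ampere or newer GPUs
    device_map="auto",          # Requires `accelerate`; places weights on available devices
)
```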

---

### Chat Template Configuration

For a seamless conversational experience, configure the tokenizer with the appropriate chat template:

```python
from unsloth.chat_templates import get_chat_template

FastLanguageModel.for_inference(model)  # Enable native 2x faster inference

tokenizer = get_chat_template(
    tokenizer,
    chat_template="chatml",  # Supports templates like zephyr, chatml, mistral, etc.
    mapping={"role": "from", "content": "value", "user": "human", "assistant": "gpt"},  # ShareGPT style
    map_eos_token=True,  # Maps <|im_end|> to </s>
)
```
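
To sanity-check the configuration, you can render a message list to a string instead of token IDs; with `tokenize=False`, `apply_chat_template` returns the raw ChatML-formatted prompt:

```python
messages = [{"from": "human", "value": "Habari yako?"}]  # "How are you?"
prompt = tokenizer.apply_chat_template(messages, tokenize=False, add_generation_prompt=True)
print(prompt)
# Expect ChatML markers: an <|im_start|>user ... <|im_end|> turn followed by
# an opening <|im_start|>assistant tag for the model to complete.
```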

---

### Usage Example

Generate a short story in Swahili:

```python
from transformers import TextStreamer

messages = [{"from": "human", "value": "Tengeneza hadithi fupi"}]  # "Write a short story"
inputs = tokenizer.apply_chat_template(
    messages,
    tokenize=True,
    add_generation_prompt=True,
    return_tensors="pt",
).to("cuda")

text_streamer = TextStreamer(tokenizer)  # Prints tokens to stdout as they are generated
_ = model.generate(input_ids=inputs, streamer=text_streamer, max_new_tokens=128, use_cache=True)
```
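
If you want the completed text as a string rather than a live stream, drop the streamer and decode the output. A sketch with common sampling settings (the values are illustrative, not tuned by the model authors):

```python
outputs = model.generate(
    input_ids=inputs,
    max_new_tokens=256,
    do_sample=True,    # Sample instead of greedy decoding for more varied stories
    temperature=0.7,   # Illustrative value
    top_p=0.9,         # Illustrative value
)
# Skip the prompt tokens so only the newly generated text is decoded.
print(tokenizer.decode(outputs[0][inputs.shape[-1]:], skip_special_tokens=True))
```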

---

### Training and Fine-Tuning Details

- **Base Model**: Gemma-2-2B
- **Continued Pre-Training**: 3B Swahili tokens
- **Fine-Tuning**: Swahili SFT datasets for improved contextual understanding
- **Optimization**: DPO for more consistent, preference-aligned responses (see the sketch below)
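
No training code is published with this card; purely to illustrate the DPO step, here is a minimal sketch using the `trl` library. The dataset name and hyperparameters are hypothetical, and argument names vary across `trl` releases:

```python
from datasets import load_dataset
from trl import DPOConfig, DPOTrainer

# Hypothetical preference dataset with "prompt", "chosen", and "rejected" columns.
train_dataset = load_dataset("your-org/swahili-preferences", split="train")

config = DPOConfig(
    output_dir="pawa-dpo",
    beta=0.1,  # Weight of the KL penalty toward the reference model (illustrative)
    per_device_train_batch_size=2,
    num_train_epochs=1,
)

trainer = DPOTrainer(
    model=model,                 # The SFT model loaded earlier
    ref_model=None,              # trl clones a frozen reference model when None
    args=config,
    train_dataset=train_dataset,
    processing_class=tokenizer,  # Named `tokenizer=` in older trl releases
)
trainer.train()
```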

---

### Intended Use Cases

- **General Assistance**: Provides structured answers for general-purpose use.
- **Interactive Q&A**: Designed for general-purpose chat environments.
- **RAG (Retrieval-Augmented Generation)**: Pairs well with retrieved Swahili or English context for grounded, domain-specific answers.

---

### Limitations

- **Biases**: The model may exhibit biases inherent in its fine-tuning datasets.
- **Generalization**: It may struggle with tasks outside its training domain.
- **Hardware Requirements**:
  - Optimal performance requires a GPU with ample memory (e.g., Tesla V100 or T4).
  - Supports 4-bit quantization for reduced memory usage (see the sketch after this list).
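
If memory is tight, the 4-bit path mentioned above is just the `load_in_4bit` flag from the loading snippet; for example:

```python
model, tokenizer = FastLanguageModel.from_pretrained(
    model_name="sartifyllc/Pawa-kaggle-gemma-2b",
    max_seq_length=2048,
    load_in_4bit=True,  # 4-bit quantized weights (via bitsandbytes) to cut memory use
)
```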

---

Feel free to reach out for further guidance or collaboration opportunities regarding PAWA!