---
language:
- sw
- en
license: apache-2.0
base_model:
- google/gemma-2-2b
library_name: transformers
---

# PAWA: Swahili SLM for Various Tasks

---

## Overview

**PAWA** is a Swahili-specialized language model designed to excel at tasks requiring nuanced understanding and interaction in Swahili and English. It leverages supervised fine-tuning (SFT) and Direct Preference Optimization (DPO) for improved performance and consistency. Below are the model specifications, installation steps, usage examples, and intended applications.

---

### Model Details

- **Model Name**: Pawa-Gemma-Swahili-2B
- **Model Type**: PAWA
- **Architecture**:
  - 2B-parameter Gemma-2 base model
  - Enhanced with Swahili SFT and DPO datasets
- **Languages Supported**:
  - Swahili
  - English
  - Custom tokenizer for multi-language flexibility
- **Primary Use Cases**:
  - Contextually rich Swahili-focused tasks
  - General assistance and chat-based interactions
- **License**: Apache 2.0 (per the card metadata; contact the author for any additional terms of use)

---

### Installation and Setup

Ensure the necessary libraries are installed and up to date:

```bash
# In a notebook, prefix each command with "!"
pip uninstall transformers -y && pip install --upgrade --no-cache-dir "git+https://github.com/huggingface/transformers.git"
pip uninstall unsloth -y && pip install --upgrade --no-cache-dir "unsloth[colab-new] @ git+https://github.com/unslothai/unsloth.git"
pip install datasets
```
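
To confirm the environment before loading the model, a quick sanity check of the install and the GPU can help (a minimal sketch; exact version strings will vary):

```python
import torch
import transformers

# The source build of transformers should now be active.
print("transformers:", transformers.__version__)

# The snippets below move tensors to "cuda", so a GPU must be visible.
print("CUDA available:", torch.cuda.is_available())
if torch.cuda.is_available():
    print("GPU:", torch.cuda.get_device_name(0))
```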

---

### Model Loading

You can load the model using the following code snippet:

```python
from unsloth import FastLanguageModel
import torch

model_name = "sartifyllc/Pawa-kaggle-gemma-2b"
max_seq_length = 2048
dtype = None          # None lets Unsloth auto-detect (float16 on T4/V100, bfloat16 on Ampere+)
load_in_4bit = False  # Set to True for 4-bit quantized loading on memory-constrained GPUs

model, tokenizer = FastLanguageModel.from_pretrained(
    model_name=model_name,
    max_seq_length=max_seq_length,
    dtype=dtype,
    load_in_4bit=load_in_4bit,
)
```
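
If you prefer not to use Unsloth, the checkpoint should also load through plain `transformers` (the card's metadata declares `library_name: transformers`). A minimal sketch, assuming the hosted weights are a standard Gemma-2 causal-LM checkpoint; `device_map="auto"` additionally requires the `accelerate` package:

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "sartifyllc/Pawa-kaggle-gemma-2b"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype=torch.float16,  # Use torch.bfloat16 on Ampere or newer GPUs
    device_map="auto",          # Requires `accelerate`; places weights on available devices
)
```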

---

### Chat Template Configuration

For a seamless conversational experience, configure the tokenizer with the appropriate chat template:

```python
from unsloth.chat_templates import get_chat_template

FastLanguageModel.for_inference(model)  # Enable native 2x faster inference

tokenizer = get_chat_template(
    tokenizer,
    chat_template="chatml",  # Supports templates like zephyr, chatml, mistral, etc.
    mapping={"role": "from", "content": "value", "user": "human", "assistant": "gpt"},  # ShareGPT style
    map_eos_token=True,  # Maps <|im_end|> to </s>
)
```
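
To sanity-check the configuration, you can render a message list to a string instead of token IDs; with `tokenize=False`, `apply_chat_template` returns the raw ChatML-formatted prompt:

```python
messages = [{"from": "human", "value": "Habari yako?"}]  # "How are you?"
prompt = tokenizer.apply_chat_template(messages, tokenize=False, add_generation_prompt=True)
print(prompt)
# Expect ChatML markers: an <|im_start|>user ... <|im_end|> turn followed by
# an opening <|im_start|>assistant tag for the model to complete.
```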

---

### Usage Example

Generate a short story in Swahili:

```python
from transformers import TextStreamer

messages = [{"from": "human", "value": "Tengeneza hadithi fupi"}]  # "Write a short story"
inputs = tokenizer.apply_chat_template(
    messages,
    tokenize=True,
    add_generation_prompt=True,
    return_tensors="pt",
).to("cuda")

text_streamer = TextStreamer(tokenizer)  # Prints tokens to stdout as they are generated
_ = model.generate(input_ids=inputs, streamer=text_streamer, max_new_tokens=128, use_cache=True)
```
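
If you want the completed text as a string rather than a live stream, drop the streamer and decode the output. A sketch with common sampling settings (the values are illustrative, not tuned by the model authors):

```python
outputs = model.generate(
    input_ids=inputs,
    max_new_tokens=256,
    do_sample=True,    # Sample instead of greedy decoding for more varied stories
    temperature=0.7,   # Illustrative value
    top_p=0.9,         # Illustrative value
)
# Skip the prompt tokens so only the newly generated text is decoded.
print(tokenizer.decode(outputs[0][inputs.shape[-1]:], skip_special_tokens=True))
```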

---

### Training and Fine-Tuning Details

- **Base Model**: Gemma-2-2B
- **Continued Pre-Training**: 3B Swahili tokens
- **Fine-Tuning**: Swahili SFT datasets for improved contextual understanding
- **Optimization**: DPO for more consistent, preference-aligned responses (see the sketch below)
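
No training code is published with this card; purely to illustrate the DPO step, here is a minimal sketch using the `trl` library. The dataset name and hyperparameters are hypothetical, and argument names vary across `trl` releases:

```python
from datasets import load_dataset
from trl import DPOConfig, DPOTrainer

# Hypothetical preference dataset with "prompt", "chosen", and "rejected" columns.
train_dataset = load_dataset("your-org/swahili-preferences", split="train")

config = DPOConfig(
    output_dir="pawa-dpo",
    beta=0.1,  # Weight of the KL penalty toward the reference model (illustrative)
    per_device_train_batch_size=2,
    num_train_epochs=1,
)

trainer = DPOTrainer(
    model=model,                 # The SFT model loaded earlier
    ref_model=None,              # trl clones a frozen reference model when None
    args=config,
    train_dataset=train_dataset,
    processing_class=tokenizer,  # Named `tokenizer=` in older trl releases
)
trainer.train()
```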

---

### Intended Use Cases

- **General Assistance**: Provides structured answers for general-purpose use.
- **Interactive Q&A**: Designed for general-purpose chat environments.
- **RAG (Retrieval-Augmented Generation)**: Pairs well with retrieved Swahili or English context for grounded, domain-specific answers.

---

### Limitations

- **Biases**: The model may exhibit biases inherent in its fine-tuning datasets.
- **Generalization**: It may struggle with tasks outside its training domain.
- **Hardware Requirements**:
  - Optimal performance requires a GPU with ample memory (e.g., Tesla V100 or T4).
  - Supports 4-bit quantization for reduced memory usage (see the sketch after this list).
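
If memory is tight, the 4-bit path mentioned above is just the `load_in_4bit` flag from the loading snippet; for example:

```python
model, tokenizer = FastLanguageModel.from_pretrained(
    model_name="sartifyllc/Pawa-kaggle-gemma-2b",
    max_seq_length=2048,
    load_in_4bit=True,  # 4-bit quantized weights (via bitsandbytes) to cut memory use
)
```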

---

Feel free to reach out for further guidance or collaboration opportunities regarding PAWA!