Initialize project; model provided by the ModelHub XC community

Model: CraneAILabs/swahili-gemma-1b
Source: Original Platform
This commit is contained in:
ModelHub XC
2026-05-17 22:37:29 +08:00
commit 354ce0f3a5
15 changed files with 51895 additions and 0 deletions

38
.gitattributes vendored Normal file

@@ -0,0 +1,38 @@
*.7z filter=lfs diff=lfs merge=lfs -text
*.arrow filter=lfs diff=lfs merge=lfs -text
*.bin filter=lfs diff=lfs merge=lfs -text
*.bz2 filter=lfs diff=lfs merge=lfs -text
*.ckpt filter=lfs diff=lfs merge=lfs -text
*.ftz filter=lfs diff=lfs merge=lfs -text
*.gz filter=lfs diff=lfs merge=lfs -text
*.h5 filter=lfs diff=lfs merge=lfs -text
*.joblib filter=lfs diff=lfs merge=lfs -text
*.lfs.* filter=lfs diff=lfs merge=lfs -text
*.mlmodel filter=lfs diff=lfs merge=lfs -text
*.model filter=lfs diff=lfs merge=lfs -text
*.msgpack filter=lfs diff=lfs merge=lfs -text
*.npy filter=lfs diff=lfs merge=lfs -text
*.npz filter=lfs diff=lfs merge=lfs -text
*.onnx filter=lfs diff=lfs merge=lfs -text
*.ot filter=lfs diff=lfs merge=lfs -text
*.parquet filter=lfs diff=lfs merge=lfs -text
*.pb filter=lfs diff=lfs merge=lfs -text
*.pickle filter=lfs diff=lfs merge=lfs -text
*.pkl filter=lfs diff=lfs merge=lfs -text
*.pt filter=lfs diff=lfs merge=lfs -text
*.pth filter=lfs diff=lfs merge=lfs -text
*.rar filter=lfs diff=lfs merge=lfs -text
*.safetensors filter=lfs diff=lfs merge=lfs -text
saved_model/**/* filter=lfs diff=lfs merge=lfs -text
*.tar.* filter=lfs diff=lfs merge=lfs -text
*.tar filter=lfs diff=lfs merge=lfs -text
*.tflite filter=lfs diff=lfs merge=lfs -text
*.tgz filter=lfs diff=lfs merge=lfs -text
*.wasm filter=lfs diff=lfs merge=lfs -text
*.xz filter=lfs diff=lfs merge=lfs -text
*.zip filter=lfs diff=lfs merge=lfs -text
*.zst filter=lfs diff=lfs merge=lfs -text
*tfevents* filter=lfs diff=lfs merge=lfs -text
tokenizer.json filter=lfs diff=lfs merge=lfs -text
swahili_gemma_ascending_chart.png filter=lfs diff=lfs merge=lfs -text
swahili_comprehensive_chart.png filter=lfs diff=lfs merge=lfs -text

77
EVALUATION.md Normal file

@@ -0,0 +1,77 @@
# Comprehensive FLORES Translation Evaluation Results
## Overview
This package contains comprehensive evaluation results for English→Luganda and English→Swahili translation using the FLORES+ dataset. The evaluation includes specialized fine-tuned models, commercial services, and baseline models.
## Contents
### 📊 Charts (`/charts/`)
- `luganda_comprehensive_chart.png` - Complete Luganda translation performance comparison (17 models)
- `swahili_comprehensive_chart.png` - Complete Swahili translation performance comparison (16 models)
### 📈 Data (`/data/`)
- `luganda_results.csv` - Detailed Luganda evaluation results with rankings
- `swahili_results.csv` - Detailed Swahili evaluation results with rankings
- `summary.csv` - Executive summary of our models' performance
## Key Results
### 🏆 Our Models Performance
| Language | Model | Rank | BLEU | chrF++ | Percentile | Efficiency (BLEU/B) |
|----------|-------|------|------|--------|------------|---------------------|
| **Luganda** | Ganda Gemma 1B | 5/17 | 6.99 | 40.32 | 76.5% | 6.99 |
| **Swahili** | Swahili Gemma 1B | 12/16 | 27.59 | 56.84 | 31.2% | 27.59 |
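
The Percentile column matches the share of evaluated models ranked at or below each model, i.e. (N - rank + 1) / N for N models. This formula is inferred from the table values and is not stated in the evaluation. A small Python check under that assumption:

```python
def rank_percentile(rank, total):
    """Percentage of the `total` evaluated models ranked at or below `rank`.

    Assumption: formula inferred from the table values, not documented here.
    """
    return 100 * (total - rank + 1) / total

# (language, rank, number of models evaluated) from the table above
for language, rank, total in [("Luganda", 5, 17), ("Swahili", 12, 16)]:
    print(f"{language}: rank {rank}/{total} -> {rank_percentile(rank, total):.1f}%")
```

With one-decimal formatting this reproduces the 76.5% and 31.2% shown in the table.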
### 🎯 Key Insights
**Language Resource Impact:**
- The **Swahili** model scores far higher than the **Luganda** model (27.59 vs 6.99 BLEU)
- Reflects the resource availability gap between the two languages
- Demonstrates the challenge of low-resource language translation

**Competitive Standing:**
- **Luganda**: Ranks 5th out of 17 models (76.5th percentile)
- **Swahili**: Ranks 12th out of 16 models (31.2nd percentile)
- Both models reach these scores with only 1B parameters, giving high BLEU per billion parameters (6.99 and 27.59)

**Baseline Comparison:**
- Our specialized models vastly outperform the general Gemma-3-1B baseline
- **Luganda**: 6.99 vs 0.51 BLEU (13.8x improvement)
- **Swahili**: 27.59 vs 2.78 BLEU (9.9x improvement)
## Methodology
- **Dataset:** FLORES+ devtest split (1,012 sentence pairs per language)
- **Metrics:** BLEU and chrF++ scores
- **Evaluation:** Comparison across 17 models/services for Luganda and 16 for Swahili
- **Baseline:** vLLM-served Gemma-3-1B-IT for fair comparison
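
chrF++ is less familiar than BLEU. The sketch below is a simplified, sentence-level version of the published chrF++ definition: it averages precision and recall over character n-grams of order 1-6 and word n-grams of order 1-2, then combines them into an F-score with beta = 2. The evaluation does not say which toolkit produced its scores. Libraries such as sacrebleu aggregate statistics over the whole corpus and tokenize words differently, so this sketch will not reproduce the reported numbers exactly.

```python
from collections import Counter

def char_ngrams(text, n):
    # chrF ignores whitespace when extracting character n-grams.
    chars = "".join(text.split())
    return Counter(chars[i:i + n] for i in range(len(chars) - n + 1))

def word_ngrams(text, n):
    # Simplification: split on whitespace only; punctuation is not separated.
    words = text.split()
    return Counter(tuple(words[i:i + n]) for i in range(len(words) - n + 1))

def chrf_plus_plus(hypothesis, reference, char_order=6, word_order=2, beta=2.0):
    """Simplified sentence-level chrF++ score in [0, 100]."""
    extractors = [(char_ngrams, n) for n in range(1, char_order + 1)]
    extractors += [(word_ngrams, n) for n in range(1, word_order + 1)]
    precisions, recalls = [], []
    for extract, n in extractors:
        hyp, ref = extract(hypothesis, n), extract(reference, n)
        if not hyp or not ref:
            continue  # skip orders longer than one of the two strings
        matches = sum((hyp & ref).values())  # matches clipped to the smaller count
        precisions.append(matches / sum(hyp.values()))
        recalls.append(matches / sum(ref.values()))
    if not precisions:
        return 0.0
    # Average precision and recall over all character and word orders.
    p = sum(precisions) / len(precisions)
    r = sum(recalls) / len(recalls)
    if p == 0 and r == 0:
        return 0.0
    b2 = beta ** 2  # beta = 2 weights recall more heavily than precision
    return 100 * (1 + b2) * p * r / (b2 * p + r)
```

Identical sentences score 100. A near miss such as `chrf_plus_plus("habari ya asubuhi", "habari za asubuhi")` lands between 0 and 100, because most character n-grams still match even though one word differs.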
## Models Evaluated
### Commercial Services
- Google Translate (top performer in both languages)
### Specialized Models (Ours)
- Ganda Gemma 1B (fine-tuned for Luganda)
- Swahili Gemma 1B (fine-tuned for Swahili)
### General Models
- Claude Sonnet 4, GPT variants, Gemini models, Llama models
- Gemma-3-1B baseline (vLLM)
## Files Description
### Data Files
- **CSV Structure**: Rank, Model, Type, Parameters (B), BLEU, chrF++, BLEU per Billion Params, Our Model
- **Rankings**: Sorted by BLEU score (descending)
- **Efficiency**: BLEU score per billion parameters for fair comparison
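
Both properties described above (descending BLEU order and the "Our Model" flag) can be checked with the Python standard library alone. This sketch embeds the rows of the `swahili_results.csv` committed in this repository. That file carries only the Rank, Model, Type, BLEU, chrF++ and Our Model columns, so parameter counts and efficiency cannot be read from it.

```python
import csv
import io

# Rows of swahili_results.csv as committed in this repository.
SWAHILI_RESULTS = """Rank,Model,Type,BLEU,chrF++,Our Model
1,Google Translate,Commercial Service,39.94,66.71,FALSE
2,Gemini 2.0 Flash 001,Google,35.57,64.62,FALSE
3,Claude Sonnet 4,Anthropic,34.93,64.27,FALSE
4,Chatgpt 4o Latest,OpenAI,34.57,64.18,FALSE
5,Gemini 2.5 Flash,Google,34.16,63.51,FALSE
6,Gpt 5 Mini,OpenAI,31.75,62.41,FALSE
7,Llama 4 Scout,Meta,31.46,61.77,FALSE
8,Llama 4 Maverick,Meta,31.35,61.71,FALSE
9,Gpt Oss 120B,OpenAI,30.64,60.47,FALSE
10,Gpt 5 Nano,OpenAI,30.19,61.35,FALSE
11,Gemma 3 27B,Google,29.38,60.02,FALSE
12,Swahili Gemma 1B (Our Model),Specialized Fine-tuned,27.59,56.84,TRUE
13,Gpt Oss 20B,OpenAI,20.13,50.9,FALSE
14,Gemma 3 4B,Google,10.91,44.1,FALSE
15,Gemma 3N E4B,Google,6.27,42.47,FALSE
16,Gemma 3 1B (vLLM Baseline),General Model,2.78,24.79,FALSE
"""

rows = list(csv.DictReader(io.StringIO(SWAHILI_RESULTS)))
bleu = [float(row["BLEU"]) for row in rows]

# Rankings are documented as sorted by BLEU, descending.
assert bleu == sorted(bleu, reverse=True), "rows are not sorted by BLEU"

# The "Our Model" flag is stored as the literal strings TRUE/FALSE.
ours = next(row for row in rows if row["Our Model"] == "TRUE")
print(ours["Rank"], ours["Model"], ours["BLEU"], ours["chrF++"])
```

Only BLEU is sorted. chrF++ does not follow the same order (for example, Gpt 5 Nano has a higher chrF++ than Gpt Oss 120B).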
### Charts
- **Visual comparison** of all models with our models highlighted
- **Color coding**: Red (BLEU), Black (chrF++)
- **Special marking**: Diagonal stripes for our models
---
*Evaluation Framework: FLORES+ English→African Languages*

241
README.md Normal file

@@ -0,0 +1,241 @@
---
base_model: google/gemma-3-1b-it
language:
- en
- sw
library_name: transformers
license: gemma
tags:
- swahili
- translation
- conversational
- gemma
- gemma3
- fine-tuned
pipeline_tag: text-generation
---
# Swahili Gemma 1B
A fine-tuned Gemma 3 1B instruction model specialized for **English-to-Swahili translation and Swahili conversational AI**. The model accepts input in both English and Swahili but outputs responses exclusively in Swahili.
## 📊 Translation Performance
![Translation Performance Comparison](swahili_gemma_ascending_chart.png)
### Model Comparison
| Model | Parameters | BLEU | chrF++ | Efficiency* |
|-------|------------|------|--------|-------------|
| Gemma 3 4B | 4B | 10.9 | 44.1 | 2.7 |
| **Swahili Gemma 1B** | **1B** | **27.6** | **56.8** | **27.6** |
| Gemma 3 27B | 27B | 29.4 | 60.0 | 1.1 |
| GPT-5 Mini | ~8B | 31.8 | 62.4 | 4.0 |
| Gemini 2.0 Flash | Large | 35.6 | 64.6 | N/A |

*Efficiency = BLEU Score / Parameters (in billions)
### Key Performance Insights
- 🎯 **Efficiency Leader**: Achieves the highest BLEU-to-parameter ratio among the compared models (27.6 BLEU per billion parameters)
- 🚀 **Size Advantage**: Outperforms Gemma 3 4B (4x larger) by 153% on BLEU
- 💎 **Competitive Quality**: Reaches 94% of Gemma 3 27B's BLEU score with 27x fewer parameters
- **Practical Deployment**: Runs efficiently on consumer hardware while maintaining quality
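
The figures in these insights follow from the comparison table with simple arithmetic. This sketch recomputes them from the rounded table values. GPT-5 Mini's "~8B" size is the table's own estimate, not an official figure.

```python
def efficiency(bleu, params_billions):
    """BLEU points per billion parameters, as defined under the table."""
    return bleu / params_billions

def relative_gain(ours, other):
    """Percentage by which `ours` exceeds `other`."""
    return 100 * (ours / other - 1)

# (BLEU, parameters in billions) copied from the comparison table.
models = {
    "Swahili Gemma 1B": (27.6, 1),
    "Gemma 3 4B": (10.9, 4),
    "Gemma 3 27B": (29.4, 27),
    "GPT-5 Mini": (31.8, 8),  # "~8B" is the table's estimate
}

for name, (bleu, params) in models.items():
    print(f"{name}: {efficiency(bleu, params):.1f} BLEU per billion parameters")

ours = models["Swahili Gemma 1B"][0]
print(f"Gain over Gemma 3 4B: {relative_gain(ours, models['Gemma 3 4B'][0]):.0f}%")
print(f"Share of Gemma 3 27B BLEU: {100 * ours / models['Gemma 3 27B'][0]:.0f}%")
```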
### Evaluation Details
- **Dataset**: FLORES-200 English→Swahili (1,012 translation pairs)
- **Metrics**: BLEU (bilingual evaluation understudy) and chrF++ (character n-gram F-score, extended with word n-grams)
- **Evaluation**: Zero-shot translation performance
## 🚀 Quick Start
```python
from transformers import AutoModelForCausalLM, AutoTokenizer
# Load model and tokenizer
model = AutoModelForCausalLM.from_pretrained("CraneAILabs/swahili-gemma-1b")
tokenizer = AutoTokenizer.from_pretrained("CraneAILabs/swahili-gemma-1b")
# Translate to Swahili
prompt = "Translate to Swahili: Hello, how are you today?"
inputs = tokenizer(prompt, return_tensors="pt")
outputs = model.generate(**inputs, max_length=100, temperature=0.3)
response = tokenizer.decode(outputs[0], skip_special_tokens=True)
print(response)
```
## 🌍 Language Capabilities
- **Input Languages**: English + Swahili
- **Output Language**: Swahili only
- **Primary Focus**: English-to-Swahili translation and Swahili conversation
## 🎯 Capabilities
- **Translation**: English-to-Swahili translation
- **Conversational AI**: Natural dialogue in Swahili
- **Summarization**: Text summarization in Swahili
- **Writing**: Creative and informational writing in Swahili
- **Question Answering**: General knowledge responses in Swahili
## 💻 Usage Examples
### Basic Translation
```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer
model = AutoModelForCausalLM.from_pretrained("CraneAILabs/swahili-gemma-1b")
tokenizer = AutoTokenizer.from_pretrained("CraneAILabs/swahili-gemma-1b")
# English to Swahili translation
prompt = "Translate to Swahili: Good morning, how did you sleep?"
inputs = tokenizer(prompt, return_tensors="pt")
with torch.no_grad():
    outputs = model.generate(
        **inputs,  # passes both input_ids and attention_mask
        max_length=128,
        temperature=0.3,
        top_p=0.95,
        top_k=64,
        repetition_penalty=1.1,
        do_sample=True,
        pad_token_id=tokenizer.eos_token_id,
    )
response = tokenizer.decode(outputs[0], skip_special_tokens=True)
print(response)
```
### Swahili Conversation
```python
# Direct Swahili conversation (reuses `model` and `tokenizer` loaded above)
prompt = "Hujambo! Je, unaweza kunisaidia leo?"
inputs = tokenizer(prompt, return_tensors="pt")
outputs = model.generate(**inputs, max_length=100, temperature=0.3)
response = tokenizer.decode(outputs[0], skip_special_tokens=True)
print(response)
```
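
The raw-prompt examples above send plain text to the model. The repository also ships `chat_template.jinja`, which wraps each turn in Gemma's `<start_of_turn>`/`<end_of_turn>` markers. In transformers, `tokenizer.apply_chat_template(messages, add_generation_prompt=True, return_tensors="pt")` renders it. The pure-Python sketch below shows what that template produces for text-only messages; image content is not handled. This README does not say whether the fine-tune expects templated input, so treat this as an option to compare against the raw prompts.

```python
def build_gemma_prompt(messages, add_generation_prompt=True, bos_token="<bos>"):
    """Render text-only chat messages the way chat_template.jinja does."""
    prefix = ""
    if messages and messages[0]["role"] == "system":
        # The system prompt gets no turn of its own: it is prepended to the first user turn.
        prefix = messages[0]["content"] + "\n"
        messages = messages[1:]
    parts = [bos_token]
    for i, message in enumerate(messages):
        # Roles must alternate user/assistant, starting with user.
        if (message["role"] == "user") != (i % 2 == 0):
            raise ValueError("Conversation roles must alternate user/assistant/user/assistant/...")
        role = "model" if message["role"] == "assistant" else message["role"]
        parts.append(f"<start_of_turn>{role}\n" + (prefix if i == 0 else ""))
        parts.append(message["content"].strip())  # Jinja's `trim` filter
        parts.append("<end_of_turn>\n")
    if add_generation_prompt:
        # Open a model turn so generation continues as the assistant.
        parts.append("<start_of_turn>model\n")
    return "".join(parts)

print(build_gemma_prompt([{"role": "user", "content": "Hujambo! Je, unaweza kunisaidia leo?"}]))
```

If you tokenize this string yourself, pass `add_special_tokens=False` so the tokenizer does not add a second `<bos>`.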
### Using the Pipeline
```python
import torch
from transformers import pipeline
# Create a text generation pipeline
generator = pipeline(
"text-generation",
model="CraneAILabs/swahili-gemma-1b",
tokenizer="CraneAILabs/swahili-gemma-1b",
device=0 if torch.cuda.is_available() else -1
)
# Generate Swahili text
result = generator(
"Translate to Swahili: Welcome to our school",
max_length=100,
temperature=0.3,
do_sample=True
)
print(result[0]['generated_text'])
```
### Ollama Usage
```bash
# Run the recommended Q4_K_M quantization
ollama run crane-ai-labs/swahili-gemma-1b:q4-k-m
# Try different quantizations based on your needs
ollama run crane-ai-labs/swahili-gemma-1b:q8-0 # Higher quality
ollama run crane-ai-labs/swahili-gemma-1b:q4-k-s # Smaller size
ollama run crane-ai-labs/swahili-gemma-1b:f16 # Original quality
```
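
Once a model tag has been pulled, Ollama can also be called from code. This is a minimal sketch assuming an Ollama server on its default local port (11434). It posts to Ollama's REST `/api/generate` endpoint using only the Python standard library, and reuses this README's recommended sampling settings under Ollama's option names. The helper names `build_request` and `generate` are illustrative.

```python
import json
import urllib.request

OLLAMA_URL = "http://localhost:11434/api/generate"  # default local Ollama endpoint (assumed)

def build_request(prompt, model="crane-ai-labs/swahili-gemma-1b:q4-k-m"):
    """Build a non-streaming /api/generate payload with the recommended sampling settings."""
    return {
        "model": model,
        "prompt": prompt,
        "stream": False,  # return one JSON object instead of a token stream
        "options": {
            "temperature": 0.3,
            "top_p": 0.95,
            "top_k": 64,
            "repeat_penalty": 1.1,  # Ollama's name for repetition_penalty
        },
    }

def generate(prompt, model="crane-ai-labs/swahili-gemma-1b:q4-k-m", url=OLLAMA_URL):
    """Send the request to a running Ollama server and return the generated text."""
    body = json.dumps(build_request(prompt, model)).encode("utf-8")
    request = urllib.request.Request(url, data=body, headers={"Content-Type": "application/json"})
    with urllib.request.urlopen(request) as response:
        return json.loads(response.read())["response"]
```

With the server running, `print(generate("Translate to Swahili: Welcome to our school"))` returns the model's reply. Ollama applies the model's own prompt template to `prompt` by default.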
### Available Quantizations
| Quantization | Size | Quality | Use Case |
|-------------|------|---------|----------|
| `f16` | ~1.9GB | Highest | Maximum quality inference |
| `f32` | ~3.8GB | Highest | Research & benchmarking |
| `q8-0` | ~1.0GB | Very High | Production with ample resources |
| `q5-k-m` | ~812MB | High | Balanced quality/size |
| `q4-k-m` | ~769MB | Good | **Recommended** for most users |
| `q4-k-s` | ~745MB | Good | Resource-constrained environments |
| `q3-k-m` | ~689MB | Fair | Mobile/edge deployment |
| `q2-k` | ~658MB | Lower | Minimal resource usage |
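
As a rough aid for choosing among these builds, this sketch picks the largest listed quantization whose approximate file size fits a given budget. The sizes are the table's approximate figures, treating 1 GB as 1000 MB. `f32` is excluded because the table lists it for research rather than inference. Runtime memory use is higher than the file size (the context cache also needs memory), so leave headroom.

```python
# Approximate file sizes in MB from the table above (1 GB taken as 1000 MB).
QUANT_SIZES_MB = {
    "f16": 1900,
    "q8-0": 1000,
    "q5-k-m": 812,
    "q4-k-m": 769,
    "q4-k-s": 745,
    "q3-k-m": 689,
    "q2-k": 658,
}

def largest_fitting_quant(budget_mb):
    """Return the largest quantization tag whose file fits in budget_mb, or None."""
    fitting = [(size, tag) for tag, size in QUANT_SIZES_MB.items() if size <= budget_mb]
    return max(fitting)[1] if fitting else None

tag = largest_fitting_quant(800)
print(f"ollama run crane-ai-labs/swahili-gemma-1b:{tag}")
```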
## 💡 Generation Parameters
Recommended settings for optimal results:
```python
generation_config = {
"temperature": 0.3, # Focused, coherent responses
"top_p": 0.95, # Nucleus sampling
"top_k": 64, # Top-k sampling
"max_length": 128, # Response length limit
"repetition_penalty": 1.1, # Reduces repetition
"do_sample": True,
"pad_token_id": tokenizer.eos_token_id
}
```
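
The temperature, top_k and top_p values above act as successive filters on the model's next-token distribution. The pure-Python sketch below applies them in the order temperature, top-k, top-p to a toy list of logits. It is an illustration, not the transformers implementation, and it leaves out `repetition_penalty`.

```python
import math

def filter_next_token_distribution(logits, temperature=0.3, top_k=64, top_p=0.95):
    """Return {token_index: probability} for the tokens still eligible for sampling."""
    # Temperature: dividing logits by T < 1 sharpens the distribution toward the top token.
    scaled = [x / temperature for x in logits]
    # Top-k: keep the k highest-scoring tokens.
    candidates = sorted(range(len(scaled)), key=lambda i: scaled[i], reverse=True)[:top_k]
    # Softmax over the surviving candidates (subtract the max for numerical stability).
    peak = scaled[candidates[0]]
    weights = {i: math.exp(scaled[i] - peak) for i in candidates}
    total = sum(weights.values())
    probs = {i: w / total for i, w in weights.items()}
    # Top-p (nucleus): keep the smallest most-likely prefix whose mass reaches top_p.
    kept, cumulative = [], 0.0
    for i in candidates:
        kept.append(i)
        cumulative += probs[i]
        if cumulative >= top_p:
            break
    # Renormalize so the remaining probabilities sum to 1.
    mass = sum(probs[i] for i in kept)
    return {i: probs[i] / mass for i in kept}
```

For example, with logits `[2.0, 1.0, 0.0, -1.0]` the recommended defaults leave only token 0 eligible. With `temperature=1.0, top_k=3, top_p=0.9`, tokens 0 and 1 remain. This is why a low temperature gives the focused responses described above.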
## 🔗 Related Models
- **GGUF Quantizations**: [CraneAILabs/swahili-gemma-1b-GGUF](https://huggingface.co/CraneAILabs/swahili-gemma-1b-GGUF) - Optimized for llama.cpp/Ollama
- **LiteRT Mobile**: [CraneAILabs/swahili-gemma-1b-litert](https://huggingface.co/CraneAILabs/swahili-gemma-1b-litert) - Mobile deployment
- **Ollama**: [crane-ai-labs/swahili-gemma-1b](https://ollama.com/crane-ai-labs/swahili-gemma-1b) - Ready-to-run with Ollama
## 🎨 Use Cases
- **Language Learning**: Practice English-Swahili translation
- **Cultural Preservation**: Create and document Swahili content
- **Educational Tools**: Swahili learning assistants
- **Content Localization**: Translate materials to Swahili
- **Conversational Practice**: Improve Swahili dialogue skills
- **Text Summarization**: Summarize content in Swahili
## ⚠️ Limitations
- **Language Output**: Responds only in Swahili
- **Factual Knowledge**: General knowledge only, not trained on specific factual datasets
- **No Coding/Math**: Not designed for programming or mathematical tasks
- **Context Length**: Best results within 4,096 tokens (the configuration allows up to 32,768 positions)
- **Specialized Domains**: May require domain-specific fine-tuning
## 📄 License
This model is released under the [Gemma Terms of Use](https://ai.google.dev/gemma/terms). Please review the terms before use.
## 🙏 Acknowledgments
- **Google**: For the Gemma 3 base model, support and guidance.
- **Community**: For Swahili language resources and datasets
- **Gilbert Korir (Msingi AI, Nairobi, Kenya)**
- **Alfred Malengo Kondoro (Hanyang University, Seoul, South Korea)**
## Citation
If you use this model in your research or applications, please cite:
```bibtex
@misc{crane_ai_labs_2025,
author = {Bakunga Bronson and Kato Steven Mubiru and Lwanga Caleb and Gimei Alex and Kavuma Lameck and Roland Ganafa and Sibomana Glorry and Atuhaire Collins and JohnRoy Nangeso and Tukamushaba Catherine},
title = {Swahili Gemma: A Fine-tuned Gemma 3 1B Model for Swahili conversational AI},
year = {2025},
url = {https://huggingface.co/CraneAILabs/swahili-gemma-1b},
organization = {Crane AI Labs}
}
```
---
**Built with ❤️ by Crane AI Labs**
*Swahili Gemma - Your helpful Swahili AI companion*

3
added_tokens.json Normal file

@@ -0,0 +1,3 @@
{
"<image_soft_token>": 262144
}

47
chat_template.jinja Normal file

@@ -0,0 +1,47 @@
{{ bos_token }}
{%- if messages[0]['role'] == 'system' -%}
{%- if messages[0]['content'] is string -%}
{%- set first_user_prefix = messages[0]['content'] + '
' -%}
{%- else -%}
{%- set first_user_prefix = messages[0]['content'][0]['text'] + '
' -%}
{%- endif -%}
{%- set loop_messages = messages[1:] -%}
{%- else -%}
{%- set first_user_prefix = "" -%}
{%- set loop_messages = messages -%}
{%- endif -%}
{%- for message in loop_messages -%}
{%- if (message['role'] == 'user') != (loop.index0 % 2 == 0) -%}
{{ raise_exception("Conversation roles must alternate user/assistant/user/assistant/...") }}
{%- endif -%}
{%- if (message['role'] == 'assistant') -%}
{%- set role = "model" -%}
{%- else -%}
{%- set role = message['role'] -%}
{%- endif -%}
{{ '<start_of_turn>' + role + '
' + (first_user_prefix if loop.first else "") }}
{%- if message['content'] is string -%}
{{ message['content'] | trim }}
{%- elif message['content'] is iterable -%}
{%- for item in message['content'] -%}
{%- if item['type'] == 'image' -%}
{{ '<start_of_image>' }}
{%- elif item['type'] == 'text' -%}
{{ item['text'] | trim }}
{%- endif -%}
{%- endfor -%}
{%- else -%}
{{ raise_exception("Invalid content type") }}
{%- endif -%}
{{ '<end_of_turn>
' }}
{%- endfor -%}
{%- if add_generation_prompt -%}
{{ '<start_of_turn>model
' }}
{%- endif -%}

64
config.json Normal file

@@ -0,0 +1,64 @@
{
"architectures": [
"Gemma3ForCausalLM"
],
"attention_bias": false,
"attention_dropout": 0.0,
"attn_logit_softcapping": null,
"bos_token_id": 2,
"cache_implementation": "hybrid",
"eos_token_id": 106,
"final_logit_softcapping": null,
"head_dim": 256,
"hidden_activation": "gelu_pytorch_tanh",
"hidden_size": 1152,
"initializer_range": 0.02,
"intermediate_size": 6912,
"layer_types": [
"sliding_attention",
"sliding_attention",
"sliding_attention",
"sliding_attention",
"sliding_attention",
"full_attention",
"sliding_attention",
"sliding_attention",
"sliding_attention",
"sliding_attention",
"sliding_attention",
"full_attention",
"sliding_attention",
"sliding_attention",
"sliding_attention",
"sliding_attention",
"sliding_attention",
"full_attention",
"sliding_attention",
"sliding_attention",
"sliding_attention",
"sliding_attention",
"sliding_attention",
"full_attention",
"sliding_attention",
"sliding_attention"
],
"max_position_embeddings": 32768,
"model_type": "gemma3_text",
"num_attention_heads": 4,
"num_hidden_layers": 26,
"num_key_value_heads": 1,
"pad_token_id": 0,
"query_pre_attn_scalar": 256,
"rms_norm_eps": 1e-06,
"rope_local_base_freq": 10000,
"rope_scaling": null,
"rope_theta": 1000000,
"sliding_window": 512,
"sliding_window_pattern": 6,
"torch_dtype": "bfloat16",
"transformers_version": "4.53.0",
"unsloth_fixed": true,
"unsloth_version": "2025.6.7",
"use_cache": true,
"vocab_size": 262144
}

14
generation_config.json Normal file

@@ -0,0 +1,14 @@
{
"bos_token_id": 2,
"cache_implementation": "hybrid",
"do_sample": true,
"eos_token_id": [
1,
106
],
"max_length": 32768,
"pad_token_id": 0,
"top_k": 64,
"top_p": 0.95,
"transformers_version": "4.53.0"
}

3
model.safetensors Normal file

@@ -0,0 +1,3 @@
version https://git-lfs.github.com/spec/v1
oid sha256:166758c7092a132fd651fd10c5fd25a0c0f7c0f9e58ec9c9b96e132b30ed5c0f
size 1999811208

33
special_tokens_map.json Normal file

@@ -0,0 +1,33 @@
{
"boi_token": "<start_of_image>",
"bos_token": {
"content": "<bos>",
"lstrip": false,
"normalized": false,
"rstrip": false,
"single_word": false
},
"eoi_token": "<end_of_image>",
"eos_token": {
"content": "<end_of_turn>",
"lstrip": false,
"normalized": false,
"rstrip": false,
"single_word": false
},
"image_token": "<image_soft_token>",
"pad_token": {
"content": "<pad>",
"lstrip": false,
"normalized": false,
"rstrip": false,
"single_word": false
},
"unk_token": {
"content": "<unk>",
"lstrip": false,
"normalized": false,
"rstrip": false,
"single_word": false
}
}


@@ -0,0 +1,3 @@
version https://git-lfs.github.com/spec/v1
oid sha256:c01cebb779ea777195000cd758063ddf62015acc011c2b0eec545f08396967a9
size 434267


@@ -0,0 +1,3 @@
version https://git-lfs.github.com/spec/v1
oid sha256:3a1fc1c47716da27e29c4ae052acc3032a204ac9d93a8b0d4c33b6460c255d05
size 151471

17
swahili_results.csv Normal file

@@ -0,0 +1,17 @@
Rank,Model,Type,BLEU,chrF++,Our Model
1,Google Translate,Commercial Service,39.94,66.71,FALSE
2,Gemini 2.0 Flash 001,Google,35.57,64.62,FALSE
3,Claude Sonnet 4,Anthropic,34.93,64.27,FALSE
4,Chatgpt 4o Latest,OpenAI,34.57,64.18,FALSE
5,Gemini 2.5 Flash,Google,34.16,63.51,FALSE
6,Gpt 5 Mini,OpenAI,31.75,62.41,FALSE
7,Llama 4 Scout,Meta,31.46,61.77,FALSE
8,Llama 4 Maverick,Meta,31.35,61.71,FALSE
9,Gpt Oss 120B,OpenAI,30.64,60.47,FALSE
10,Gpt 5 Nano,OpenAI,30.19,61.35,FALSE
11,Gemma 3 27B,Google,29.38,60.02,FALSE
12,Swahili Gemma 1B (Our Model),Specialized Fine-tuned,27.59,56.84,TRUE
13,Gpt Oss 20B,OpenAI,20.13,50.9,FALSE
14,Gemma 3 4B,Google,10.91,44.1,FALSE
15,Gemma 3N E4B,Google,6.27,42.47,FALSE
16,Gemma 3 1B (vLLM Baseline),General Model,2.78,24.79,FALSE

3
tokenizer.json Normal file

@@ -0,0 +1,3 @@
version https://git-lfs.github.com/spec/v1
oid sha256:4667f2089529e8e7657cfb6d1c19910ae71ff5f28aa7ab2ff2763330affad795
size 33384568

3
tokenizer.model Normal file

@@ -0,0 +1,3 @@
version https://git-lfs.github.com/spec/v1
oid sha256:1299c11d7cf632ef3b4e11937501358ada021bbdf7c47638d13c0ee982f2e79c
size 4689074

51346
tokenizer_config.json Normal file

File diff suppressed because it is too large