Initialize project; model provided by the ModelHub XC community
Model: CraneAILabs/swahili-gemma-1b Source: Original Platform
38
.gitattributes
vendored
Normal file
@@ -0,0 +1,38 @@
*.7z filter=lfs diff=lfs merge=lfs -text
*.arrow filter=lfs diff=lfs merge=lfs -text
*.bin filter=lfs diff=lfs merge=lfs -text
*.bz2 filter=lfs diff=lfs merge=lfs -text
*.ckpt filter=lfs diff=lfs merge=lfs -text
*.ftz filter=lfs diff=lfs merge=lfs -text
*.gz filter=lfs diff=lfs merge=lfs -text
*.h5 filter=lfs diff=lfs merge=lfs -text
*.joblib filter=lfs diff=lfs merge=lfs -text
*.lfs.* filter=lfs diff=lfs merge=lfs -text
*.mlmodel filter=lfs diff=lfs merge=lfs -text
*.model filter=lfs diff=lfs merge=lfs -text
*.msgpack filter=lfs diff=lfs merge=lfs -text
*.npy filter=lfs diff=lfs merge=lfs -text
*.npz filter=lfs diff=lfs merge=lfs -text
*.onnx filter=lfs diff=lfs merge=lfs -text
*.ot filter=lfs diff=lfs merge=lfs -text
*.parquet filter=lfs diff=lfs merge=lfs -text
*.pb filter=lfs diff=lfs merge=lfs -text
*.pickle filter=lfs diff=lfs merge=lfs -text
*.pkl filter=lfs diff=lfs merge=lfs -text
*.pt filter=lfs diff=lfs merge=lfs -text
*.pth filter=lfs diff=lfs merge=lfs -text
*.rar filter=lfs diff=lfs merge=lfs -text
*.safetensors filter=lfs diff=lfs merge=lfs -text
saved_model/**/* filter=lfs diff=lfs merge=lfs -text
*.tar.* filter=lfs diff=lfs merge=lfs -text
*.tar filter=lfs diff=lfs merge=lfs -text
*.tflite filter=lfs diff=lfs merge=lfs -text
*.tgz filter=lfs diff=lfs merge=lfs -text
*.wasm filter=lfs diff=lfs merge=lfs -text
*.xz filter=lfs diff=lfs merge=lfs -text
*.zip filter=lfs diff=lfs merge=lfs -text
*.zst filter=lfs diff=lfs merge=lfs -text
*tfevents* filter=lfs diff=lfs merge=lfs -text
tokenizer.json filter=lfs diff=lfs merge=lfs -text
swahili_gemma_ascending_chart.png filter=lfs diff=lfs merge=lfs -text
swahili_comprehensive_chart.png filter=lfs diff=lfs merge=lfs -text
77
EVALUATION.md
Normal file
@@ -0,0 +1,77 @@
# Comprehensive FLORES Translation Evaluation Results

## Overview
This package contains comprehensive evaluation results for English→Luganda and English→Swahili translation on the FLORES+ dataset. The evaluation covers specialized fine-tuned models, commercial services, and baseline models.

## Contents

### 📊 Charts (`/charts/`)
- `luganda_comprehensive_chart.png` - Complete Luganda translation performance comparison (17 models)
- `swahili_comprehensive_chart.png` - Complete Swahili translation performance comparison (16 models)

### 📈 Data (`/data/`)
- `luganda_results.csv` - Detailed Luganda evaluation results with rankings
- `swahili_results.csv` - Detailed Swahili evaluation results with rankings
- `summary.csv` - Executive summary of our models' performance

## Key Results

### 🏆 Our Models' Performance

| Language | Model | Rank | BLEU | chrF++ | Percentile | Efficiency (BLEU/B) |
|----------|-------|------|------|--------|------------|---------------------|
| **Luganda** | Ganda Gemma 1B | 5/17 | 6.99 | 40.32 | 76.5% | 6.99 |
| **Swahili** | Swahili Gemma 1B | 12/16 | 27.59 | 56.84 | 31.2% | 27.59 |

### 🎯 Key Insights

**Language Resource Impact:**
- **Swahili** scores far higher than **Luganda** (27.59 vs 6.99 BLEU)
- The gap reflects the difference in resource availability between the two languages
- It also demonstrates the challenge of low-resource language translation

**Competitive Standing:**
- **Luganda**: Ranks 5th out of 17 models (76.5th percentile)
- **Swahili**: Ranks 12th out of 16 models (31.2nd percentile)
- Both models reach these scores with only 1B parameters each

**Baseline Comparison:**
- Our specialized models vastly outperform the general Gemma-3-1B baseline
- **Luganda**: 6.99 vs 0.51 BLEU (13.8x improvement)
- **Swahili**: 27.59 vs 2.78 BLEU (9.9x improvement)

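The rank, percentile, and improvement figures above can be cross-checked with a few lines of Python. This is a minimal sketch: the percentile formula `(N - rank + 1) / N` is an assumption inferred from the reported numbers, not something the evaluation states, and the helper names are hypothetical.

```python
def percentile(rank: int, total: int) -> float:
    """Share of models ranked at or below this one, in percent (assumed formula)."""
    return 100.0 * (total - rank + 1) / total


def improvement(ours: float, baseline: float) -> float:
    """How many times higher our BLEU is than the baseline BLEU."""
    return ours / baseline


# Values copied from the tables above.
print(f"Luganda percentile: {percentile(5, 17):.1f}")
print(f"Swahili percentile: {percentile(12, 16):.2f}")
print(f"Luganda vs baseline: {improvement(6.99, 0.51):.1f}x")
print(f"Swahili vs baseline: {improvement(27.59, 2.78):.1f}x")
```

Computed from the rounded scores shown here, the Luganda ratio comes out near 13.7x, slightly below the quoted 13.8x; the quoted figure was presumably derived from unrounded scores.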
## Methodology

- **Dataset:** FLORES+ devtest split (1,012 sentence pairs per language)
- **Metrics:** BLEU and chrF++ scores
- **Evaluation:** Comparison across 17 models/services for Luganda and 16 for Swahili
- **Baseline:** vLLM-served Gemma-3-1B-IT for fair comparison

## Models Evaluated

### Commercial Services
- Google Translate (top performer in both languages)

### Specialized Models (Ours)
- Ganda Gemma 1B (fine-tuned for Luganda)
- Swahili Gemma 1B (fine-tuned for Swahili)

### General Models
- Claude Sonnet 4, GPT variants, Gemini models, Llama models
- Gemma-3-1B baseline (vLLM)

## Files Description

### Data Files
- **CSV Structure**: Rank, Model, Type, Parameters (B), BLEU, chrF++, BLEU per Billion Params, Our Model
- **Rankings**: Sorted by BLEU score (descending)
- **Efficiency**: BLEU score per billion parameters, so models of different sizes can be compared

### Charts
- **Visual comparison** of all models with our models highlighted
- **Color coding**: Red (BLEU), Black (chrF++)
- **Special marking**: Diagonal stripes for our models

---

*Evaluation Framework: FLORES+ English→African Languages*
241
README.md
Normal file
@@ -0,0 +1,241 @@
---
base_model: google/gemma-3-1b-it
language:
- en
- sw
library_name: transformers
license: gemma
tags:
- swahili
- translation
- conversational
- gemma
- gemma3
- fine-tuned
pipeline_tag: text-generation
---

# Swahili Gemma 1B

A fine-tuned Gemma 3 1B instruction model specialized for **English-to-Swahili translation and Swahili conversational AI**. The model accepts input in both English and Swahili but outputs responses exclusively in Swahili.

## 📊 Translation Performance



### Model Comparison

| Model | Parameters | BLEU | chrF++ | Efficiency* |
|-------|------------|------|--------|-------------|
| Gemma 3 4B | 4B | 10.9 | 44.1 | 2.7 |
| **Swahili Gemma 1B** | **1B** | **27.6** | **56.8** | **27.6** |
| Gemma 3 27B | 27B | 29.4 | 60.0 | 1.1 |
| GPT-5 Mini | ~8B | 31.8 | 62.4 | 4.0 |
| Gemini 2.0 Flash | Large | 35.6 | 64.6 | N/A |

*Efficiency = BLEU Score / Parameters (in billions)

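As a quick check of the efficiency column, the ratio can be recomputed from the table. The sketch below is illustrative only: the numbers are copied from the table above, the `~8B` figure for GPT-5 Mini is treated as exactly 8, and Gemini 2.0 Flash is left out because no parameter count is given.

```python
# (model, parameters in billions, BLEU) copied from the comparison table above.
rows = [
    ("Gemma 3 4B", 4.0, 10.9),
    ("Swahili Gemma 1B", 1.0, 27.6),
    ("Gemma 3 27B", 27.0, 29.4),
    ("GPT-5 Mini", 8.0, 31.8),  # "~8B" in the table, an approximate figure
]

# Efficiency = BLEU divided by parameter count in billions.
efficiency = {name: bleu / params for name, params, bleu in rows}
best = max(efficiency, key=efficiency.get)

for name, value in efficiency.items():
    print(f"{name}: {value:.1f} BLEU per billion parameters")
print("Highest ratio:", best)
```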
### Key Performance Insights

- 🎯 **Efficiency Leader**: Achieves the highest BLEU-to-parameter ratio among the models compared above (27.6 BLEU per billion parameters)
- 🚀 **Size Advantage**: Outperforms Gemma 3 4B (4x larger) by 153% on BLEU score
- 💎 **Competitive Quality**: Achieves 94% of Gemma 3 27B performance with 27x fewer parameters
- ⚡ **Practical Deployment**: At 1B parameters (about 2 GB in bfloat16), it is small enough to run on consumer hardware

### Evaluation Details

- **Dataset**: FLORES-200 English→Swahili (1,012 translation pairs)
- **Metrics**: BLEU (bilingual evaluation understudy) and chrF++ (character n-gram F-score extended with word n-grams)
- **Evaluation**: Zero-shot translation performance

## 🚀 Quick Start

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

# Load model and tokenizer
model = AutoModelForCausalLM.from_pretrained("CraneAILabs/swahili-gemma-1b")
tokenizer = AutoTokenizer.from_pretrained("CraneAILabs/swahili-gemma-1b")

# Translate to Swahili
prompt = "Translate to Swahili: Hello, how are you today?"
inputs = tokenizer(prompt, return_tensors="pt")
# max_length counts the prompt tokens plus the generated tokens
outputs = model.generate(**inputs, max_length=100, temperature=0.3)
response = tokenizer.decode(outputs[0], skip_special_tokens=True)
print(response)
```

## 🌍 Language Capabilities

- **Input Languages**: English + Swahili
- **Output Language**: Swahili only
- **Primary Focus**: English-to-Swahili translation and Swahili conversation

## 🎯 Capabilities

- **Translation**: English-to-Swahili translation
- **Conversational AI**: Natural dialogue in Swahili
- **Summarization**: Text summarization in Swahili
- **Writing**: Creative and informational writing in Swahili
- **Question Answering**: General knowledge responses in Swahili

## 💻 Usage Examples

### Basic Translation

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model = AutoModelForCausalLM.from_pretrained("CraneAILabs/swahili-gemma-1b")
tokenizer = AutoTokenizer.from_pretrained("CraneAILabs/swahili-gemma-1b")

# English to Swahili translation
prompt = "Translate to Swahili: Good morning, how did you sleep?"
inputs = tokenizer(prompt, return_tensors="pt")

# Inference only, so gradient tracking is disabled
with torch.no_grad():
    outputs = model.generate(
        inputs.input_ids,
        max_length=128,
        temperature=0.3,
        top_p=0.95,
        top_k=64,
        repetition_penalty=1.1,
        do_sample=True,
        pad_token_id=tokenizer.eos_token_id
    )

response = tokenizer.decode(outputs[0], skip_special_tokens=True)
print(response)
```

### Swahili Conversation

```python
# Direct Swahili conversation (reuses `model` and `tokenizer` loaded above)
prompt = "Hujambo! Je, unaweza kunisaidia leo?"
inputs = tokenizer(prompt, return_tensors="pt")
outputs = model.generate(**inputs, max_length=100, temperature=0.3)
response = tokenizer.decode(outputs[0], skip_special_tokens=True)
print(response)
```

### Using the Pipeline

```python
import torch
from transformers import pipeline

# Create a text generation pipeline (GPU 0 if available, otherwise CPU)
generator = pipeline(
    "text-generation",
    model="CraneAILabs/swahili-gemma-1b",
    tokenizer="CraneAILabs/swahili-gemma-1b",
    device=0 if torch.cuda.is_available() else -1
)

# Generate Swahili text
result = generator(
    "Translate to Swahili: Welcome to our school",
    max_length=100,
    temperature=0.3,
    do_sample=True
)
print(result[0]['generated_text'])
```

### Ollama Usage

```bash
# Run the recommended Q4_K_M quantization
ollama run crane-ai-labs/swahili-gemma-1b:q4-k-m

# Try different quantizations based on your needs
ollama run crane-ai-labs/swahili-gemma-1b:q8-0    # Higher quality
ollama run crane-ai-labs/swahili-gemma-1b:q4-k-s  # Smaller size
ollama run crane-ai-labs/swahili-gemma-1b:f16     # Original quality
```

### Available Quantizations

| Quantization | Size | Quality | Use Case |
|-------------|------|---------|----------|
| `f16` | ~1.9GB | Highest | Maximum quality inference |
| `f32` | ~3.8GB | Highest | Research & benchmarking |
| `q8-0` | ~1.0GB | Very High | Production with ample resources |
| `q5-k-m` | ~812MB | High | Balanced quality/size |
| `q4-k-m` | ~769MB | Good | **Recommended** for most users |
| `q4-k-s` | ~745MB | Good | Resource-constrained environments |
| `q3-k-m` | ~689MB | Fair | Mobile/edge deployment |
| `q2-k` | ~658MB | Lower | Minimal resource usage |

## 💡 Generation Parameters

Recommended settings for optimal results:

```python
generation_config = {
    "temperature": 0.3,         # Focused, coherent responses
    "top_p": 0.95,              # Nucleus sampling
    "top_k": 64,                # Top-k sampling
    "max_length": 128,          # Response length limit
    "repetition_penalty": 1.1,  # Reduces repetition
    "do_sample": True,
    "pad_token_id": tokenizer.eos_token_id
}
```

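To make the sampling settings above concrete, here is a minimal pure-Python sketch of how temperature, top-k, and top-p narrow a next-token distribution. It is an illustration under simplifying assumptions, not the `transformers` implementation: the toy logits and the `filter_distribution` helper are hypothetical, and repetition penalty is left out.

```python
import math


def filter_distribution(logits, temperature=0.3, top_k=64, top_p=0.95):
    """Return {token_index: probability} after temperature, top-k and top-p."""
    # Temperature: dividing logits by a value below 1 sharpens the distribution.
    scaled = [x / temperature for x in logits]
    # Softmax, subtracting the maximum for numerical stability.
    peak = max(scaled)
    exps = [math.exp(x - peak) for x in scaled]
    total = sum(exps)
    probs = sorted(((e / total, i) for i, e in enumerate(exps)), reverse=True)
    # Top-k: keep only the k most likely tokens.
    probs = probs[:top_k]
    # Top-p: keep the smallest prefix whose cumulative probability reaches top_p.
    kept, cumulative = [], 0.0
    for p, i in probs:
        kept.append((p, i))
        cumulative += p
        if cumulative >= top_p:
            break
    # Renormalize so the surviving probabilities sum to 1.
    norm = sum(p for p, _ in kept)
    return {i: p / norm for p, i in kept}


toy_logits = [2.0, 1.0, 0.5, -1.0]  # hypothetical scores for four tokens
print(filter_distribution(toy_logits))
print(filter_distribution(toy_logits, temperature=1.0))
```

With a temperature of 0.3 the toy distribution collapses onto its most likely token, while a temperature of 1.0 leaves several candidates in play; this is why the low setting is described as giving focused responses.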
## 🔗 Related Models

- **GGUF Quantizations**: [CraneAILabs/swahili-gemma-1b-GGUF](https://huggingface.co/CraneAILabs/swahili-gemma-1b-GGUF) - Optimized for llama.cpp/Ollama
- **LiteRT Mobile**: [CraneAILabs/swahili-gemma-1b-litert](https://huggingface.co/CraneAILabs/swahili-gemma-1b-litert) - Mobile deployment
- **Ollama**: [crane-ai-labs/swahili-gemma-1b](https://ollama.com/crane-ai-labs/swahili-gemma-1b) - Ready-to-run with Ollama

## 🎨 Use Cases

- **Language Learning**: Practice English-Swahili translation
- **Cultural Preservation**: Create and document Swahili content
- **Educational Tools**: Swahili learning assistants
- **Content Localization**: Translate materials to Swahili
- **Conversational Practice**: Improve Swahili dialogue skills
- **Text Summarization**: Summarize content in Swahili

## ⚠️ Limitations

- **Language Output**: Responds only in Swahili
- **Factual Knowledge**: General knowledge only, not trained on specific factual datasets
- **No Coding/Math**: Not designed for programming or mathematical tasks
- **Context Length**: Limited to 4,096 tokens for optimal performance
- **Specialized Domains**: May require domain-specific fine-tuning

## 📄 License

This model is released under the [Gemma Terms of Use](https://ai.google.dev/gemma/terms). Please review the terms before use.

## 🙏 Acknowledgments

- **Google**: For the Gemma 3 base model, support and guidance.
- **Community**: For Swahili language resources and datasets
- **Gilbert Korir (Msingi AI, Nairobi, Kenya)**
- **Alfred Malengo Kondoro (Hanyang University, Seoul, South Korea)**

## Citation

If you use this model in your research or applications, please cite:

```bibtex
@misc{crane_ai_labs_2025,
    author = {Bakunga Bronson and Kato Steven Mubiru and Lwanga Caleb and Gimei Alex and Kavuma Lameck and Roland Ganafa and Sibomana Glorry and Atuhaire Collins and JohnRoy Nangeso and Tukamushaba Catherine},
    title = {Swahili Gemma: A Fine-tuned Gemma 3 1B Model for Swahili conversational AI},
    year = {2025},
    url = {https://huggingface.co/CraneAILabs/swahili-gemma-1b},
    organization = {Crane AI Labs}
}
```


---

**Built with ❤️ by Crane AI Labs**

*Swahili Gemma - Your helpful Swahili AI companion*
3
added_tokens.json
Normal file
@@ -0,0 +1,3 @@
{
  "<image_soft_token>": 262144
}
47
chat_template.jinja
Normal file
@@ -0,0 +1,47 @@
{{ bos_token }}
{%- if messages[0]['role'] == 'system' -%}
    {%- if messages[0]['content'] is string -%}
        {%- set first_user_prefix = messages[0]['content'] + '

' -%}
    {%- else -%}
        {%- set first_user_prefix = messages[0]['content'][0]['text'] + '

' -%}
    {%- endif -%}
    {%- set loop_messages = messages[1:] -%}
{%- else -%}
    {%- set first_user_prefix = "" -%}
    {%- set loop_messages = messages -%}
{%- endif -%}
{%- for message in loop_messages -%}
    {%- if (message['role'] == 'user') != (loop.index0 % 2 == 0) -%}
        {{ raise_exception("Conversation roles must alternate user/assistant/user/assistant/...") }}
    {%- endif -%}
    {%- if (message['role'] == 'assistant') -%}
        {%- set role = "model" -%}
    {%- else -%}
        {%- set role = message['role'] -%}
    {%- endif -%}
    {{ '<start_of_turn>' + role + '
' + (first_user_prefix if loop.first else "") }}
    {%- if message['content'] is string -%}
        {{ message['content'] | trim }}
    {%- elif message['content'] is iterable -%}
        {%- for item in message['content'] -%}
            {%- if item['type'] == 'image' -%}
                {{ '<start_of_image>' }}
            {%- elif item['type'] == 'text' -%}
                {{ item['text'] | trim }}
            {%- endif -%}
        {%- endfor -%}
    {%- else -%}
        {{ raise_exception("Invalid content type") }}
    {%- endif -%}
    {{ '<end_of_turn>
' }}
{%- endfor -%}
{%- if add_generation_prompt -%}
    {{ '<start_of_turn>model
' }}
{%- endif -%}
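The prompt layout that this template produces can be approximated in plain Python, which is useful for seeing what the model actually receives. The sketch below is a text-only approximation that handles string content only (no image items); `render_gemma_prompt` is a hypothetical helper, and in practice `tokenizer.apply_chat_template` renders the real template.

```python
def render_gemma_prompt(messages, bos_token="<bos>", add_generation_prompt=True):
    """Text-only approximation of the chat template above (string content only)."""
    prefix = ""
    if messages and messages[0]["role"] == "system":
        # The system message is folded into the first user turn.
        prefix = messages[0]["content"] + "\n\n"
        messages = messages[1:]
    parts = [bos_token]
    for index, message in enumerate(messages):
        # Turns must alternate, starting with the user.
        if (message["role"] == "user") != (index % 2 == 0):
            raise ValueError("Conversation roles must alternate user/assistant/...")
        # The template renames the assistant role to "model".
        role = "model" if message["role"] == "assistant" else message["role"]
        parts.append(f"<start_of_turn>{role}\n")
        if index == 0:
            parts.append(prefix)
        parts.append(message["content"].strip())
        parts.append("<end_of_turn>\n")
    if add_generation_prompt:
        parts.append("<start_of_turn>model\n")
    return "".join(parts)


prompt = render_gemma_prompt(
    [{"role": "user", "content": "Translate to Swahili: Good morning"}]
)
print(prompt)
```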
64
config.json
Normal file
@@ -0,0 +1,64 @@
{
  "architectures": [
    "Gemma3ForCausalLM"
  ],
  "attention_bias": false,
  "attention_dropout": 0.0,
  "attn_logit_softcapping": null,
  "bos_token_id": 2,
  "cache_implementation": "hybrid",
  "eos_token_id": 106,
  "final_logit_softcapping": null,
  "head_dim": 256,
  "hidden_activation": "gelu_pytorch_tanh",
  "hidden_size": 1152,
  "initializer_range": 0.02,
  "intermediate_size": 6912,
  "layer_types": [
    "sliding_attention",
    "sliding_attention",
    "sliding_attention",
    "sliding_attention",
    "sliding_attention",
    "full_attention",
    "sliding_attention",
    "sliding_attention",
    "sliding_attention",
    "sliding_attention",
    "sliding_attention",
    "full_attention",
    "sliding_attention",
    "sliding_attention",
    "sliding_attention",
    "sliding_attention",
    "sliding_attention",
    "full_attention",
    "sliding_attention",
    "sliding_attention",
    "sliding_attention",
    "sliding_attention",
    "sliding_attention",
    "full_attention",
    "sliding_attention",
    "sliding_attention"
  ],
  "max_position_embeddings": 32768,
  "model_type": "gemma3_text",
  "num_attention_heads": 4,
  "num_hidden_layers": 26,
  "num_key_value_heads": 1,
  "pad_token_id": 0,
  "query_pre_attn_scalar": 256,
  "rms_norm_eps": 1e-06,
  "rope_local_base_freq": 10000,
  "rope_scaling": null,
  "rope_theta": 1000000,
  "sliding_window": 512,
  "sliding_window_pattern": 6,
  "torch_dtype": "bfloat16",
  "transformers_version": "4.53.0",
  "unsloth_fixed": true,
  "unsloth_version": "2025.6.7",
  "use_cache": true,
  "vocab_size": 262144
}
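The `layer_types` list in this config follows a regular pattern that ties in with `sliding_window_pattern`. The sketch below regenerates the list from the two config values; the rule that every sixth layer (counting from 1) uses full attention is an assumption read off the list itself, not a statement of how the library builds it.

```python
num_hidden_layers = 26      # from config.json
sliding_window_pattern = 6  # from config.json

# Assumption: every `sliding_window_pattern`-th layer (1-based) uses full attention.
layer_types = [
    "full_attention" if (i + 1) % sliding_window_pattern == 0 else "sliding_attention"
    for i in range(num_hidden_layers)
]

# Zero-based indices of the full-attention layers.
full_layers = [i for i, kind in enumerate(layer_types) if kind == "full_attention"]
print(full_layers)
print(layer_types.count("sliding_attention"), "sliding layers")
```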
14
generation_config.json
Normal file
@@ -0,0 +1,14 @@
{
  "bos_token_id": 2,
  "cache_implementation": "hybrid",
  "do_sample": true,
  "eos_token_id": [
    1,
    106
  ],
  "max_length": 32768,
  "pad_token_id": 0,
  "top_k": 64,
  "top_p": 0.95,
  "transformers_version": "4.53.0"
}
3
model.safetensors
Normal file
@@ -0,0 +1,3 @@
version https://git-lfs.github.com/spec/v1
oid sha256:166758c7092a132fd651fd10c5fd25a0c0f7c0f9e58ec9c9b96e132b30ed5c0f
size 1999811208
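The three lines above are a Git LFS pointer rather than the weights themselves. As a rough illustration, the sketch below parses the pointer and estimates the parameter count from the file size; `parse_lfs_pointer` is a hypothetical helper, and the estimate assumes 2 bytes per parameter (bfloat16, as stated in `config.json`) while ignoring the safetensors header.

```python
pointer_text = """\
version https://git-lfs.github.com/spec/v1
oid sha256:166758c7092a132fd651fd10c5fd25a0c0f7c0f9e58ec9c9b96e132b30ed5c0f
size 1999811208
"""


def parse_lfs_pointer(text):
    """Split each 'key value' line of a Git LFS pointer into a dict."""
    fields = dict(line.split(" ", 1) for line in text.splitlines() if line)
    algorithm, digest = fields["oid"].split(":", 1)
    return {
        "version": fields["version"],
        "algorithm": algorithm,
        "digest": digest,
        "size": int(fields["size"]),
    }


pointer = parse_lfs_pointer(pointer_text)
size_gib = pointer["size"] / 2**30
# Rough estimate: bfloat16 stores 2 bytes per parameter; file metadata is ignored.
approx_params = pointer["size"] / 2
print(f"{size_gib:.2f} GiB, roughly {approx_params / 1e9:.2f}B parameters")
```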
33
special_tokens_map.json
Normal file
@@ -0,0 +1,33 @@
{
  "boi_token": "<start_of_image>",
  "bos_token": {
    "content": "<bos>",
    "lstrip": false,
    "normalized": false,
    "rstrip": false,
    "single_word": false
  },
  "eoi_token": "<end_of_image>",
  "eos_token": {
    "content": "<end_of_turn>",
    "lstrip": false,
    "normalized": false,
    "rstrip": false,
    "single_word": false
  },
  "image_token": "<image_soft_token>",
  "pad_token": {
    "content": "<pad>",
    "lstrip": false,
    "normalized": false,
    "rstrip": false,
    "single_word": false
  },
  "unk_token": {
    "content": "<unk>",
    "lstrip": false,
    "normalized": false,
    "rstrip": false,
    "single_word": false
  }
}
3
swahili_comprehensive_chart.png
Normal file
@@ -0,0 +1,3 @@
version https://git-lfs.github.com/spec/v1
oid sha256:c01cebb779ea777195000cd758063ddf62015acc011c2b0eec545f08396967a9
size 434267
3
swahili_gemma_ascending_chart.png
Normal file
@@ -0,0 +1,3 @@
version https://git-lfs.github.com/spec/v1
oid sha256:3a1fc1c47716da27e29c4ae052acc3032a204ac9d93a8b0d4c33b6460c255d05
size 151471
17
swahili_results.csv
Normal file
@@ -0,0 +1,17 @@
Rank,Model,Type,BLEU,chrF++,Our Model
1,Google Translate,Commercial Service,39.94,66.71,FALSE
2,Gemini 2.0 Flash 001,Google,35.57,64.62,FALSE
3,Claude Sonnet 4,Anthropic,34.93,64.27,FALSE
4,Chatgpt 4o Latest,OpenAI,34.57,64.18,FALSE
5,Gemini 2.5 Flash,Google,34.16,63.51,FALSE
6,Gpt 5 Mini,OpenAI,31.75,62.41,FALSE
7,Llama 4 Scout,Meta,31.46,61.77,FALSE
8,Llama 4 Maverick,Meta,31.35,61.71,FALSE
9,Gpt Oss 120B,OpenAI,30.64,60.47,FALSE
10,Gpt 5 Nano,OpenAI,30.19,61.35,FALSE
11,Gemma 3 27B,Google,29.38,60.02,FALSE
12,Swahili Gemma 1B (Our Model),Specialized Fine-tuned,27.59,56.84,TRUE
13,Gpt Oss 20B,OpenAI,20.13,50.9,FALSE
14,Gemma 3 4B,Google,10.91,44.1,FALSE
15,Gemma 3N E4B,Google,6.27,42.47,FALSE
16,Gemma 3 1B (vLLM Baseline),General Model,2.78,24.79,FALSE
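A results file in this layout can be read with Python's standard `csv` module. The sketch below is a minimal example that embeds a few rows copied from the file above rather than reading it from disk, so the sample is an assumption about how one might load it, not part of the repository.

```python
import csv
import io

# A few rows copied from swahili_results.csv above; the full file has 16 models.
csv_text = """\
Rank,Model,Type,BLEU,chrF++,Our Model
1,Google Translate,Commercial Service,39.94,66.71,FALSE
11,Gemma 3 27B,Google,29.38,60.02,FALSE
12,Swahili Gemma 1B (Our Model),Specialized Fine-tuned,27.59,56.84,TRUE
16,Gemma 3 1B (vLLM Baseline),General Model,2.78,24.79,FALSE
"""

rows = list(csv.DictReader(io.StringIO(csv_text)))
# The "Our Model" column flags the fine-tuned entry.
ours = next(row for row in rows if row["Our Model"] == "TRUE")
best = max(rows, key=lambda row: float(row["BLEU"]))

gap = float(best["BLEU"]) - float(ours["BLEU"])
print(f"{ours['Model']} is ranked {ours['Rank']}")
print(f"BLEU gap to {best['Model']}: {gap:.2f}")
```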
3
tokenizer.json
Normal file
@@ -0,0 +1,3 @@
version https://git-lfs.github.com/spec/v1
oid sha256:4667f2089529e8e7657cfb6d1c19910ae71ff5f28aa7ab2ff2763330affad795
size 33384568
3
tokenizer.model
Normal file
@@ -0,0 +1,3 @@
version https://git-lfs.github.com/spec/v1
oid sha256:1299c11d7cf632ef3b4e11937501358ada021bbdf7c47638d13c0ee982f2e79c
size 4689074
51346
tokenizer_config.json
Normal file
File diff suppressed because it is too large