Initialize project; model provided by the ModelHub XC community

Model: CraneAILabs/swahili-gemma-1b Source: Original Platform
2026-05-17 22:37:29 +08:00
commit 354ce0f3a5
15 changed files with 51895 additions and 0 deletions
--- a/.gitattributes
+++ b/.gitattributes
@@ -0,0 +1,38 @@
+*.7z filter=lfs diff=lfs merge=lfs -text
+*.arrow filter=lfs diff=lfs merge=lfs -text
+*.bin filter=lfs diff=lfs merge=lfs -text
+*.bz2 filter=lfs diff=lfs merge=lfs -text
+*.ckpt filter=lfs diff=lfs merge=lfs -text
+*.ftz filter=lfs diff=lfs merge=lfs -text
+*.gz filter=lfs diff=lfs merge=lfs -text
+*.h5 filter=lfs diff=lfs merge=lfs -text
+*.joblib filter=lfs diff=lfs merge=lfs -text
+*.lfs.* filter=lfs diff=lfs merge=lfs -text
+*.mlmodel filter=lfs diff=lfs merge=lfs -text
+*.model filter=lfs diff=lfs merge=lfs -text
+*.msgpack filter=lfs diff=lfs merge=lfs -text
+*.npy filter=lfs diff=lfs merge=lfs -text
+*.npz filter=lfs diff=lfs merge=lfs -text
+*.onnx filter=lfs diff=lfs merge=lfs -text
+*.ot filter=lfs diff=lfs merge=lfs -text
+*.parquet filter=lfs diff=lfs merge=lfs -text
+*.pb filter=lfs diff=lfs merge=lfs -text
+*.pickle filter=lfs diff=lfs merge=lfs -text
+*.pkl filter=lfs diff=lfs merge=lfs -text
+*.pt filter=lfs diff=lfs merge=lfs -text
+*.pth filter=lfs diff=lfs merge=lfs -text
+*.rar filter=lfs diff=lfs merge=lfs -text
+*.safetensors filter=lfs diff=lfs merge=lfs -text
+saved_model/**/* filter=lfs diff=lfs merge=lfs -text
+*.tar.* filter=lfs diff=lfs merge=lfs -text
+*.tar filter=lfs diff=lfs merge=lfs -text
+*.tflite filter=lfs diff=lfs merge=lfs -text
+*.tgz filter=lfs diff=lfs merge=lfs -text
+*.wasm filter=lfs diff=lfs merge=lfs -text
+*.xz filter=lfs diff=lfs merge=lfs -text
+*.zip filter=lfs diff=lfs merge=lfs -text
+*.zst filter=lfs diff=lfs merge=lfs -text
+*tfevents* filter=lfs diff=lfs merge=lfs -text
+tokenizer.json filter=lfs diff=lfs merge=lfs -text
+swahili_gemma_ascending_chart.png filter=lfs diff=lfs merge=lfs -text
+swahili_comprehensive_chart.png filter=lfs diff=lfs merge=lfs -text
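The `.gitattributes` patterns above decide which files are stored as Git LFS pointers rather than as regular blobs. A rough way to check which files in this commit are covered is sketched below. It approximates gitattributes matching with Python's `fnmatchcase` on the file's basename (the real rules differ, for example for patterns containing `/`), and it uses only a hand-picked subset of the patterns. The helper name `is_lfs_tracked` is illustrative, not part of any tool.

```python
from fnmatch import fnmatchcase
from pathlib import PurePosixPath

# Subset of the slash-free patterns listed in .gitattributes above.
LFS_PATTERNS = [
    "*.safetensors",
    "*.model",
    "*.bin",
    "*tfevents*",
    "tokenizer.json",
    "swahili_comprehensive_chart.png",
]


def is_lfs_tracked(path: str) -> bool:
    """Approximate check: does the basename match any LFS pattern?"""
    name = PurePosixPath(path).name
    return any(fnmatchcase(name, pattern) for pattern in LFS_PATTERNS)


for candidate in ["model.safetensors", "tokenizer.model", "tokenizer.json",
                  "config.json", "swahili_results.csv"]:
    print(candidate, is_lfs_tracked(candidate))
```

This agrees with the rest of the commit, where `model.safetensors`, `tokenizer.json` and `tokenizer.model` appear as three-line LFS pointers while `config.json` is stored inline.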
--- a/EVALUATION.md
+++ b/EVALUATION.md
@@ -0,0 +1,77 @@
+# Comprehensive FLORES Translation Evaluation Results
+
+## Overview
+This package contains comprehensive evaluation results for English→Luganda and English→Swahili translation using the FLORES+ dataset. The evaluation includes specialized fine-tuned models, commercial services, and baseline models.
+
+## Contents
+
+### 📊 Charts (`/charts/`)
+- `luganda_comprehensive_chart.png` - Complete Luganda translation performance comparison (17 models)
+- `swahili_comprehensive_chart.png` - Complete Swahili translation performance comparison (16 models)
+
+### 📈 Data (`/data/`)
+- `luganda_results.csv` - Detailed Luganda evaluation results with rankings
+- `swahili_results.csv` - Detailed Swahili evaluation results with rankings
+- `summary.csv` - Executive summary of our models' performance
+
+## Key Results
+
+### 🏆 Our Models Performance
+
+| Language | Model | Rank | BLEU | chrF++ | Percentile | Efficiency (BLEU/B) |
+|----------|-------|------|------|--------|------------|---------------------|
+| **Luganda** | Ganda Gemma 1B | 5/17 | 6.99 | 40.32 | 76.5% | 6.99 |
+| **Swahili** | Swahili Gemma 1B | 12/16 | 27.59 | 56.84 | 31.2% | 27.59 |
+
+### 🎯 Key Insights
+
+**Language Resource Impact:**
+- The **Swahili** model scores far higher than the **Luganda** model (27.59 vs 6.99 BLEU)
+- The gap reflects the difference in available resources between the two languages
+- It illustrates the difficulty of low-resource language translation
+
+**Competitive Standing:**
+- **Luganda**: Ranks 5th out of 17 models (76.5th percentile)
+- **Swahili**: Ranks 12th out of 16 models (31.2nd percentile)
+- Both are 1B-parameter models, so BLEU per billion parameters equals the BLEU score itself (6.99 and 27.59)
+
+**Baseline Comparison:**
+- Our specialized models vastly outperform the general Gemma-3-1B baseline
+- **Luganda**: 6.99 vs 0.51 BLEU (13.8x improvement)
+- **Swahili**: 27.59 vs 2.78 BLEU (9.9x improvement)
+
+## Methodology
+
+**Dataset:** FLORES+ devtest split (1,012 sentence pairs per language)  
+**Metrics:** BLEU and chrF++ scores  
+**Evaluation:** Comparison across 17 models/services for Luganda and 16 for Swahili  
+**Baseline:** vLLM-served Gemma-3-1B-IT for fair comparison
+
+## Models Evaluated
+
+### Commercial Services
+- Google Translate (top performer in both languages)
+
+### Specialized Models (Ours)
+- Ganda Gemma 1B (fine-tuned for Luganda)
+- Swahili Gemma 1B (fine-tuned for Swahili)
+
+### General Models
+- Claude Sonnet 4, GPT variants, Gemini models, Llama models
+- Gemma-3-1B baseline (vLLM)
+
+## Files Description
+
+### Data Files
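+- **CSV Structure**: Rank, Model, Type, Parameters (B), BLEU, chrF++, BLEU per Billion Params, Our Model (the `swahili_results.csv` shipped in this repository carries only Rank, Model, Type, BLEU, chrF++, Our Model)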
+- **Rankings**: Sorted by BLEU score (descending)
+- **Efficiency**: BLEU score per billion parameters for fair comparison
+
+### Charts
+- **Visual comparison** of all models with our models highlighted
+- **Color coding**: Red (BLEU), Black (chrF++)
+- **Special marking**: Diagonal stripes for our models
+
+---
+
+*Evaluation Framework: FLORES+ English→African Languages*
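The percentile and baseline figures quoted in EVALUATION.md can be reproduced from the ranks and BLEU scores it reports. The sketch below is one reading of how they were derived: the percentile formula (share of models ranked at or below the given one) is an assumption chosen because it matches the reported 76.5% and 31.2%, and the function names are illustrative.

```python
def percentile(rank: int, total: int) -> float:
    """Assumed formula: percentage of models ranked at or below this one."""
    return 100.0 * (total - rank + 1) / total


def improvement(ours: float, baseline: float) -> float:
    """Ratio of the fine-tuned model's BLEU to the baseline's BLEU."""
    return ours / baseline


# Ranks and scores as reported in the Key Results section.
print(f"Luganda percentile: {percentile(5, 17):.1f}")
print(f"Swahili percentile: {percentile(12, 16):.2f}")
print(f"Swahili vs Gemma-3-1B baseline: {improvement(27.59, 2.78):.1f}x")
```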
--- a/README.md
+++ b/README.md
@@ -0,0 +1,241 @@
+---
+base_model: google/gemma-3-1b-it
+language:
+- en
+- sw
+library_name: transformers
+license: gemma
+tags:
+- swahili
+- translation
+- conversational
+- gemma
+- gemma3
+- fine-tuned
+pipeline_tag: text-generation
+---
+
+# Swahili Gemma 1B
+
+A fine-tuned Gemma 3 1B instruction model specialized for **English-to-Swahili translation and Swahili conversational AI**. The model accepts input in both English and Swahili but outputs responses exclusively in Swahili.
+
+## 📊 Translation Performance
+
+![Translation Performance Comparison](swahili_gemma_ascending_chart.png)
+
+### Model Comparison
+
+| Model | Parameters | BLEU | chrF++ | Efficiency* |
+|-------|------------|------|--------|-------------|
+| Gemma 3 4B | 4B | 10.9 | 44.1 | 2.7 |
+| **Swahili Gemma 1B** | **1B** | **27.6** | **56.8** | **27.6** |
+| Gemma 3 27B | 27B | 29.4 | 60.0 | 1.1 |
+| GPT-5 Mini | ~8B | 31.8 | 62.4 | 4.0 |
+| Gemini 2.0 Flash | Large | 35.6 | 64.6 | N/A |
+
+*Efficiency = BLEU Score / Parameters (in billions)
+
+### Key Performance Insights
+
+🎯 **Efficiency Leader**: Achieves the highest BLEU-to-parameter ratio (27.6 BLEU per billion parameters)  
+🚀 **Size Advantage**: Outperforms Gemma 3 4B (4x larger) by 153% on BLEU score  
+💎 **Competitive Quality**: Achieves 94% of Gemma 3 27B performance with 27x fewer parameters  
+⚡ **Practical Deployment**: Runs efficiently on consumer hardware while maintaining quality  
+
+### Evaluation Details
+
+- **Dataset**: FLORES-200 English→Swahili (1,012 translation pairs)
+- **Metrics**: BLEU (bilingual evaluation understudy) and chrF++ (character F-score)
+- **Evaluation**: Zero-shot translation performance
+
+## 🚀 Quick Start
+
+```python
+from transformers import AutoModelForCausalLM, AutoTokenizer
+
+# Load model and tokenizer
+model = AutoModelForCausalLM.from_pretrained("CraneAILabs/swahili-gemma-1b")
+tokenizer = AutoTokenizer.from_pretrained("CraneAILabs/swahili-gemma-1b")
+
+# Translate to Swahili
+prompt = "Translate to Swahili: Hello, how are you today?"
+inputs = tokenizer(prompt, return_tensors="pt")
+outputs = model.generate(**inputs, max_length=100, temperature=0.3)
+response = tokenizer.decode(outputs[0], skip_special_tokens=True)
+print(response)
+```
+
+## 🌍 Language Capabilities
+
+- **Input Languages**: English + Swahili
+- **Output Language**: Swahili only
+- **Primary Focus**: English-to-Swahili translation and Swahili conversation
+
+## 🎯 Capabilities
+
+- **Translation**: English-to-Swahili translation
+- **Conversational AI**: Natural dialogue in Swahili
+- **Summarization**: Text summarization in Swahili
+- **Writing**: Creative and informational writing in Swahili
+- **Question Answering**: General knowledge responses in Swahili
+
+## 💻 Usage Examples
+
+### Basic Translation
+
+```python
+import torch
+from transformers import AutoModelForCausalLM, AutoTokenizer
+
+model = AutoModelForCausalLM.from_pretrained("CraneAILabs/swahili-gemma-1b")
+tokenizer = AutoTokenizer.from_pretrained("CraneAILabs/swahili-gemma-1b")
+
+# English to Swahili translation
+prompt = "Translate to Swahili: Good morning, how did you sleep?"
+inputs = tokenizer(prompt, return_tensors="pt")
+
+with torch.no_grad():
+    outputs = model.generate(
+        **inputs,  # passes input_ids together with the attention mask
+        max_length=128,
+        temperature=0.3,
+        top_p=0.95,
+        top_k=64,
+        repetition_penalty=1.1,
+        do_sample=True,
+        pad_token_id=tokenizer.eos_token_id
+    )
+
+response = tokenizer.decode(outputs[0], skip_special_tokens=True)
+print(response)
+```
+
+### Swahili Conversation
+
+```python
+# Direct Swahili conversation (reuses `model` and `tokenizer` loaded above)
+prompt = "Hujambo! Je, unaweza kunisaidia leo?"
+inputs = tokenizer(prompt, return_tensors="pt")
+outputs = model.generate(**inputs, max_length=100, temperature=0.3)
+response = tokenizer.decode(outputs[0], skip_special_tokens=True)
+print(response)
+```
+
+### Using the Pipeline
+
+```python
+import torch
+from transformers import pipeline
+
+generator = pipeline(  # build a text-generation pipeline
+    "text-generation",
+    model="CraneAILabs/swahili-gemma-1b",
+    tokenizer="CraneAILabs/swahili-gemma-1b",
+    device=0 if torch.cuda.is_available() else -1
+)
+
+# Generate Swahili text
+result = generator(
+    "Translate to Swahili: Welcome to our school",
+    max_length=100,
+    temperature=0.3,
+    do_sample=True
+)
+print(result[0]['generated_text'])
+```
+
+### Ollama Usage
+
+```bash
+# Run the recommended Q4_K_M quantization
+ollama run crane-ai-labs/swahili-gemma-1b:q4-k-m
+
+# Try different quantizations based on your needs
+ollama run crane-ai-labs/swahili-gemma-1b:q8-0    # Higher quality
+ollama run crane-ai-labs/swahili-gemma-1b:q4-k-s  # Smaller size
+ollama run crane-ai-labs/swahili-gemma-1b:f16     # Original quality
+```
+
+### Available Quantizations
+
+| Quantization | Size | Quality | Use Case |
+|-------------|------|---------|----------|
+| `f16` | ~1.9GB | Highest | Maximum quality inference |
+| `f32` | ~3.8GB | Highest | Research & benchmarking |
+| `q8-0` | ~1.0GB | Very High | Production with ample resources |
+| `q5-k-m` | ~812MB | High | Balanced quality/size |
+| `q4-k-m` | ~769MB | Good | **Recommended** for most users |
+| `q4-k-s` | ~745MB | Good | Resource-constrained environments |
+| `q3-k-m` | ~689MB | Fair | Mobile/edge deployment |
+| `q2-k` | ~658MB | Lower | Minimal resource usage |
+
+## 💡 Generation Parameters
+
+Recommended settings for optimal results:
+
+```python
+generation_config = {
+    "temperature": 0.3,      # Focused, coherent responses
+    "top_p": 0.95,          # Nucleus sampling
+    "top_k": 64,            # Top-k sampling
+    "max_length": 128,      # Response length limit
+    "repetition_penalty": 1.1,  # Reduces repetition
+    "do_sample": True,
+    "pad_token_id": tokenizer.eos_token_id
+}
+```
+
+## 🔗 Related Models
+
+- **GGUF Quantizations**: [CraneAILabs/swahili-gemma-1b-GGUF](https://huggingface.co/CraneAILabs/swahili-gemma-1b-GGUF) - Optimized for llama.cpp/Ollama
+- **LiteRT Mobile**: [CraneAILabs/swahili-gemma-1b-litert](https://huggingface.co/CraneAILabs/swahili-gemma-1b-litert) - Mobile deployment
+- **Ollama**: [crane-ai-labs/swahili-gemma-1b](https://ollama.com/crane-ai-labs/swahili-gemma-1b) - Ready-to-run with Ollama
+
+## 🎨 Use Cases
+
+- **Language Learning**: Practice English-Swahili translation
+- **Cultural Preservation**: Create and document Swahili content
+- **Educational Tools**: Swahili learning assistants
+- **Content Localization**: Translate materials to Swahili
+- **Conversational Practice**: Improve Swahili dialogue skills
+- **Text Summarization**: Summarize content in Swahili
+
+## ⚠️ Limitations
+
+- **Language Output**: Responds only in Swahili
+- **Factual Knowledge**: General knowledge only, not trained on specific factual datasets
+- **No Coding/Math**: Not designed for programming or mathematical tasks
+- **Context Length**: Limited to 4,096 tokens for optimal performance
+- **Specialized Domains**: May require domain-specific fine-tuning
+
+## 📄 License
+
+This model is released under the [Gemma Terms of Use](https://ai.google.dev/gemma/terms). Please review the terms before use.
+
+## 🙏 Acknowledgments
+
+- **Google**: For the Gemma 3 base model, support and guidance.
+- **Community**: For Swahili language resources and datasets
+- **Gilbert Korir (Msingi AI, Nairobi, Kenya)**
+- **Alfred Malengo Kondoro (Hanyang University, Seoul, South Korea)**
+
+## Citation
+
+If you use this model in your research or applications, please cite:
+
+```bibtex
+@misc{crane_ai_labs_2025,
+    author    = {Bakunga Bronson and Kato Steven Mubiru and Lwanga Caleb and Gimei Alex and Kavuma Lameck and Roland Ganafa and Sibomana Glorry and Atuhaire Collins and JohnRoy Nangeso and Tukamushaba Catherine},
+    title     = {Swahili Gemma: A Fine-tuned Gemma 3 1B Model for Swahili conversational AI},
+    year      = {2025},
+    url       = {https://huggingface.co/CraneAILabs/swahili-gemma-1b},
+    organization = {Crane AI Labs}
+}
+```
+
+
+---
+
+**Built with ❤️ by Crane AI Labs**
+
+*Swahili Gemma - Your helpful Swahili AI companion*
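The Efficiency column and the percentage claims in README.md's Model Comparison section follow directly from the table's BLEU and parameter figures. A minimal sketch of that arithmetic is given below. The 8B size for GPT-5 Mini is the README's own approximation, Gemini 2.0 Flash is left out because no parameter count is given, and the helper names are illustrative.

```python
# (parameters in billions, BLEU) as listed in the Model Comparison table.
MODELS = {
    "Gemma 3 4B": (4, 10.9),
    "Swahili Gemma 1B": (1, 27.6),
    "Gemma 3 27B": (27, 29.4),
    "GPT-5 Mini": (8, 31.8),  # "~8B" in the README, an approximation
}


def efficiency(name: str) -> float:
    """BLEU per billion parameters."""
    params_b, bleu = MODELS[name]
    return bleu / params_b


def bleu_ratio(a: str, b: str) -> float:
    """BLEU of model a divided by BLEU of model b."""
    return MODELS[a][1] / MODELS[b][1]


for name in MODELS:
    print(f"{name}: {efficiency(name):.1f} BLEU per billion parameters")

gain = (bleu_ratio("Swahili Gemma 1B", "Gemma 3 4B") - 1) * 100
share = bleu_ratio("Swahili Gemma 1B", "Gemma 3 27B") * 100
print(f"Gain over Gemma 3 4B: {gain:.0f}%")
print(f"Share of Gemma 3 27B BLEU: {share:.0f}%")
```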
--- a/added_tokens.json
+++ b/added_tokens.json
@@ -0,0 +1,3 @@
+{
+  "<image_soft_token>": 262144
+}
--- a/chat_template.jinja
+++ b/chat_template.jinja
@@ -0,0 +1,47 @@
+{{ bos_token }}
+{%- if messages[0]['role'] == 'system' -%}
+    {%- if messages[0]['content'] is string -%}
+        {%- set first_user_prefix = messages[0]['content'] + '
+
+' -%}
+    {%- else -%}
+        {%- set first_user_prefix = messages[0]['content'][0]['text'] + '
+
+' -%}
+    {%- endif -%}
+    {%- set loop_messages = messages[1:] -%}
+{%- else -%}
+    {%- set first_user_prefix = "" -%}
+    {%- set loop_messages = messages -%}
+{%- endif -%}
+{%- for message in loop_messages -%}
+    {%- if (message['role'] == 'user') != (loop.index0 % 2 == 0) -%}
+        {{ raise_exception("Conversation roles must alternate user/assistant/user/assistant/...") }}
+    {%- endif -%}
+    {%- if (message['role'] == 'assistant') -%}
+        {%- set role = "model" -%}
+    {%- else -%}
+        {%- set role = message['role'] -%}
+    {%- endif -%}
+    {{ '<start_of_turn>' + role + '
+' + (first_user_prefix if loop.first else "") }}
+    {%- if message['content'] is string -%}
+        {{ message['content'] | trim }}
+    {%- elif message['content'] is iterable -%}
+        {%- for item in message['content'] -%}
+            {%- if item['type'] == 'image' -%}
+                {{ '<start_of_image>' }}
+            {%- elif item['type'] == 'text' -%}
+                {{ item['text'] | trim }}
+            {%- endif -%}
+        {%- endfor -%}
+    {%- else -%}
+        {{ raise_exception("Invalid content type") }}
+    {%- endif -%}
+    {{ '<end_of_turn>
+' }}
+{%- endfor -%}
+{%- if add_generation_prompt -%}
+    {{ '<start_of_turn>model
+' }}
+{%- endif -%}
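To make the prompt layout produced by `chat_template.jinja` easier to see, the following is a plain-Python re-implementation of the same logic. It is a sketch for reading purposes, not the code path the tokenizer uses; the function name `render_gemma_prompt` and the default `bos_token="<bos>"` (taken from `special_tokens_map.json`) are assumptions of this example.

```python
def render_gemma_prompt(messages, add_generation_prompt=True, bos_token="<bos>"):
    """Mirror the Jinja template: system text is folded into the first turn."""

    def text_of(content):
        # String content is trimmed; list content is flattened item by item.
        if isinstance(content, str):
            return content.strip()
        parts = []
        for item in content:
            if item["type"] == "image":
                parts.append("<start_of_image>")
            elif item["type"] == "text":
                parts.append(item["text"].strip())
        return "".join(parts)

    prefix = ""
    if messages and messages[0]["role"] == "system":
        first = messages[0]["content"]
        prefix = (first if isinstance(first, str) else first[0]["text"]) + "\n\n"
        messages = messages[1:]

    out = [bos_token]
    for index, message in enumerate(messages):
        # Turns must alternate user/assistant, starting with user.
        if (message["role"] == "user") != (index % 2 == 0):
            raise ValueError("Conversation roles must alternate user/assistant")
        role = "model" if message["role"] == "assistant" else message["role"]
        out.append(f"<start_of_turn>{role}\n")
        if index == 0:
            out.append(prefix)
        out.append(text_of(message["content"]))
        out.append("<end_of_turn>\n")
    if add_generation_prompt:
        out.append("<start_of_turn>model\n")
    return "".join(out)


prompt = render_gemma_prompt(
    [{"role": "user", "content": "Translate to Swahili: Hello, how are you today?"}]
)
print(prompt)
```

Note that the template has no separate system turn: a leading system message is prepended to the first user turn, followed by a blank line.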
--- a/config.json
+++ b/config.json
@@ -0,0 +1,64 @@
+{
+  "architectures": [
+    "Gemma3ForCausalLM"
+  ],
+  "attention_bias": false,
+  "attention_dropout": 0.0,
+  "attn_logit_softcapping": null,
+  "bos_token_id": 2,
+  "cache_implementation": "hybrid",
+  "eos_token_id": 106,
+  "final_logit_softcapping": null,
+  "head_dim": 256,
+  "hidden_activation": "gelu_pytorch_tanh",
+  "hidden_size": 1152,
+  "initializer_range": 0.02,
+  "intermediate_size": 6912,
+  "layer_types": [
+    "sliding_attention",
+    "sliding_attention",
+    "sliding_attention",
+    "sliding_attention",
+    "sliding_attention",
+    "full_attention",
+    "sliding_attention",
+    "sliding_attention",
+    "sliding_attention",
+    "sliding_attention",
+    "sliding_attention",
+    "full_attention",
+    "sliding_attention",
+    "sliding_attention",
+    "sliding_attention",
+    "sliding_attention",
+    "sliding_attention",
+    "full_attention",
+    "sliding_attention",
+    "sliding_attention",
+    "sliding_attention",
+    "sliding_attention",
+    "sliding_attention",
+    "full_attention",
+    "sliding_attention",
+    "sliding_attention"
+  ],
+  "max_position_embeddings": 32768,
+  "model_type": "gemma3_text",
+  "num_attention_heads": 4,
+  "num_hidden_layers": 26,
+  "num_key_value_heads": 1,
+  "pad_token_id": 0,
+  "query_pre_attn_scalar": 256,
+  "rms_norm_eps": 1e-06,
+  "rope_local_base_freq": 10000,
+  "rope_scaling": null,
+  "rope_theta": 1000000,
+  "sliding_window": 512,
+  "sliding_window_pattern": 6,
+  "torch_dtype": "bfloat16",
+  "transformers_version": "4.53.0",
+  "unsloth_fixed": true,
+  "unsloth_version": "2025.6.7",
+  "use_cache": true,
+  "vocab_size": 262144
+}
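The `layer_types` list in `config.json` is consistent with `sliding_window_pattern: 6`: every sixth layer uses full attention and the rest use sliding-window attention. The sketch below regenerates the list from `num_hidden_layers` and the pattern. It is one reading of the config values shown above, offered as an illustration rather than as the library's own derivation.

```python
NUM_HIDDEN_LAYERS = 26       # "num_hidden_layers" in config.json
SLIDING_WINDOW_PATTERN = 6   # "sliding_window_pattern" in config.json


def layer_types(num_layers: int, pattern: int) -> list:
    """Every `pattern`-th layer (1-indexed) is full attention."""
    return [
        "full_attention" if (index + 1) % pattern == 0 else "sliding_attention"
        for index in range(num_layers)
    ]


types = layer_types(NUM_HIDDEN_LAYERS, SLIDING_WINDOW_PATTERN)
full_layers = [i for i, kind in enumerate(types) if kind == "full_attention"]
print("full-attention layers (0-indexed):", full_layers)
print("sliding-attention layers:", types.count("sliding_attention"))
```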
--- a/generation_config.json
+++ b/generation_config.json
@@ -0,0 +1,14 @@
+{
+  "bos_token_id": 2,
+  "cache_implementation": "hybrid",
+  "do_sample": true,
+  "eos_token_id": [
+    1,
+    106
+  ],
+  "max_length": 32768,
+  "pad_token_id": 0,
+  "top_k": 64,
+  "top_p": 0.95,
+  "transformers_version": "4.53.0"
+}
--- a/model.safetensors
+++ b/model.safetensors
@@ -0,0 +1,3 @@
+version https://git-lfs.github.com/spec/v1
+oid sha256:166758c7092a132fd651fd10c5fd25a0c0f7c0f9e58ec9c9b96e132b30ed5c0f
+size 1999811208
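The three lines above are a Git LFS pointer, not the weights themselves. A small sketch of reading such a pointer and sanity-checking the size against the model name follows. The parameter estimate assumes 2 bytes per weight (the `bfloat16` dtype from `config.json`) and ignores the safetensors header, so it is only approximate; `parse_lfs_pointer` is an illustrative helper.

```python
POINTER = """version https://git-lfs.github.com/spec/v1
oid sha256:166758c7092a132fd651fd10c5fd25a0c0f7c0f9e58ec9c9b96e132b30ed5c0f
size 1999811208
"""


def parse_lfs_pointer(text: str) -> dict:
    """Split each 'key value' line of a Git LFS pointer file."""
    fields = dict(line.split(" ", 1) for line in text.splitlines() if line.strip())
    algo, digest = fields["oid"].split(":", 1)
    return {
        "version": fields["version"],
        "hash_algo": algo,
        "digest": digest,
        "size": int(fields["size"]),
    }


info = parse_lfs_pointer(POINTER)
approx_params = info["size"] / 2  # assumption: 2 bytes per bfloat16 weight
print(f"{info['size'] / 1e9:.2f} GB on disk, "
      f"roughly {approx_params / 1e9:.2f}B parameters")
```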
--- a/special_tokens_map.json
+++ b/special_tokens_map.json
@@ -0,0 +1,33 @@
+{
+  "boi_token": "<start_of_image>",
+  "bos_token": {
+    "content": "<bos>",
+    "lstrip": false,
+    "normalized": false,
+    "rstrip": false,
+    "single_word": false
+  },
+  "eoi_token": "<end_of_image>",
+  "eos_token": {
+    "content": "<end_of_turn>",
+    "lstrip": false,
+    "normalized": false,
+    "rstrip": false,
+    "single_word": false
+  },
+  "image_token": "<image_soft_token>",
+  "pad_token": {
+    "content": "<pad>",
+    "lstrip": false,
+    "normalized": false,
+    "rstrip": false,
+    "single_word": false
+  },
+  "unk_token": {
+    "content": "<unk>",
+    "lstrip": false,
+    "normalized": false,
+    "rstrip": false,
+    "single_word": false
+  }
+}
--- a/swahili_comprehensive_chart.png
+++ b/swahili_comprehensive_chart.png
@@ -0,0 +1,3 @@
+version https://git-lfs.github.com/spec/v1
+oid sha256:c01cebb779ea777195000cd758063ddf62015acc011c2b0eec545f08396967a9
+size 434267
--- a/swahili_gemma_ascending_chart.png
+++ b/swahili_gemma_ascending_chart.png
@@ -0,0 +1,3 @@
+version https://git-lfs.github.com/spec/v1
+oid sha256:3a1fc1c47716da27e29c4ae052acc3032a204ac9d93a8b0d4c33b6460c255d05
+size 151471
--- a/swahili_results.csv
+++ b/swahili_results.csv
@@ -0,0 +1,17 @@
+Rank,Model,Type,BLEU,chrF++,Our Model
+1,Google Translate,Commercial Service,39.94,66.71,FALSE
+2,Gemini 2.0 Flash 001,Google,35.57,64.62,FALSE
+3,Claude Sonnet 4,Anthropic,34.93,64.27,FALSE
+4,Chatgpt 4o Latest,OpenAI,34.57,64.18,FALSE
+5,Gemini 2.5 Flash,Google,34.16,63.51,FALSE
+6,Gpt 5 Mini,OpenAI,31.75,62.41,FALSE
+7,Llama 4 Scout,Meta,31.46,61.77,FALSE
+8,Llama 4 Maverick,Meta,31.35,61.71,FALSE
+9,Gpt Oss 120B,OpenAI,30.64,60.47,FALSE
+10,Gpt 5 Nano,OpenAI,30.19,61.35,FALSE
+11,Gemma 3 27B,Google,29.38,60.02,FALSE
+12,Swahili Gemma 1B (Our Model),Specialized Fine-tuned,27.59,56.84,TRUE
+13,Gpt Oss 20B,OpenAI,20.13,50.9,FALSE
+14,Gemma 3 4B,Google,10.91,44.1,FALSE
+15,Gemma 3N E4B,Google,6.27,42.47,FALSE
+16,Gemma 3 1B (vLLM Baseline),General Model,2.78,24.79,FALSE
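For readers who want to work with `swahili_results.csv` programmatically, a minimal sketch using only the standard library is shown below. To stay self-contained it embeds four of the rows above as a string instead of reading the file, and the variable names are illustrative.

```python
import csv
import io

# Header plus a few rows copied from swahili_results.csv.
CSV_TEXT = """Rank,Model,Type,BLEU,chrF++,Our Model
1,Google Translate,Commercial Service,39.94,66.71,FALSE
11,Gemma 3 27B,Google,29.38,60.02,FALSE
12,Swahili Gemma 1B (Our Model),Specialized Fine-tuned,27.59,56.84,TRUE
16,Gemma 3 1B (vLLM Baseline),General Model,2.78,24.79,FALSE
"""

rows = list(csv.DictReader(io.StringIO(CSV_TEXT)))

# The "Our Model" column flags the fine-tuned model with the string TRUE.
ours = next(row for row in rows if row["Our Model"] == "TRUE")
best = max(rows, key=lambda row: float(row["BLEU"]))
gap = float(best["BLEU"]) - float(ours["BLEU"])

print(f"{ours['Model']} is ranked {ours['Rank']}")
print(f"BLEU gap to {best['Model']}: {gap:.2f}")
```

With the full file, replacing `io.StringIO(CSV_TEXT)` with an opened file handle would give the same structure for all 16 rows.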
--- a/tokenizer.json
+++ b/tokenizer.json
@@ -0,0 +1,3 @@
+version https://git-lfs.github.com/spec/v1
+oid sha256:4667f2089529e8e7657cfb6d1c19910ae71ff5f28aa7ab2ff2763330affad795
+size 33384568
--- a/tokenizer.model
+++ b/tokenizer.model
@@ -0,0 +1,3 @@
+version https://git-lfs.github.com/spec/v1
+oid sha256:1299c11d7cf632ef3b4e11937501358ada021bbdf7c47638d13c0ee982f2e79c
+size 4689074
--- a/tokenizer_config.json
+++ b/tokenizer_config.json