Initialize project; model provided by the ModelHub XC community

Model: CraneAILabs/swahili-gemma-1b Source: Original Platform
2026-05-17 22:37:29 +08:00
commit 354ce0f3a5
15 changed files with 51895 additions and 0 deletions
--- a/EVALUATION.md
+++ b/EVALUATION.md
@@ -0,0 +1,77 @@
+# Comprehensive FLORES Translation Evaluation Results
+
+## Overview
+This package contains comprehensive evaluation results for English→Luganda and English→Swahili translation using the FLORES+ dataset. The evaluation includes specialized fine-tuned models, commercial services, and baseline models.
+
+## Contents
+
+### 📊 Charts (`/charts/`)
+- `luganda_comprehensive_chart.png` - Complete Luganda translation performance comparison (17 models)
+- `swahili_comprehensive_chart.png` - Complete Swahili translation performance comparison (16 models)
+
+### 📈 Data (`/data/`)
+- `luganda_results.csv` - Detailed Luganda evaluation results with rankings
+- `swahili_results.csv` - Detailed Swahili evaluation results with rankings
+- `summary.csv` - Executive summary of our models' performance
+
+## Key Results
+
+### 🏆 Our Models Performance
+
+| Language | Model | Rank | BLEU | chrF++ | Percentile | Efficiency (BLEU/B) |
+|----------|-------|------|------|--------|------------|---------------------|
+| **Luganda** | Ganda Gemma 1B | 5/17 | 6.99 | 40.32 | 76.5% | 6.99 |
+| **Swahili** | Swahili Gemma 1B | 12/16 | 27.59 | 56.84 | 31.2% | 27.59 |
+
+### 🎯 Key Insights
+
+**Language Resource Impact:**
+- **Swahili** scores far higher than **Luganda** (27.59 vs 6.99 BLEU, 56.84 vs 40.32 chrF++)
+- Reflects the resource availability gap between the two languages
+- Demonstrates the challenge of low-resource language translation
+
+**Competitive Standing:**
+- **Luganda**: Ranks 5th out of 17 models (76.5th percentile)
+- **Swahili**: Ranks 12th out of 16 models (31.2nd percentile)
+- Both are 1B-parameter models, so BLEU per billion parameters equals the raw BLEU score (6.99 and 27.59)
+
+**Baseline Comparison:**
+- Our specialized models vastly outperform the general Gemma-3-1B baseline
+- **Luganda**: 6.99 vs 0.51 BLEU (13.8x improvement)
+- **Swahili**: 27.59 vs 2.78 BLEU (9.9x improvement)
+
+## Methodology
+
+- **Dataset:** FLORES+ devtest split (1,012 sentence pairs per language)
+- **Metrics:** BLEU and chrF++ scores
+- **Evaluation:** Comparison across 17 models/services for Luganda and 16 for Swahili
+- **Baseline:** Gemma-3-1B-IT served with vLLM, for a fair comparison
+
+## Models Evaluated
+
+### Commercial Services
+- Google Translate (top performer in both languages)
+
+### Specialized Models (Ours)
+- Ganda Gemma 1B (fine-tuned for Luganda)
+- Swahili Gemma 1B (fine-tuned for Swahili)
+
+### General Models
+- Claude Sonnet 4, GPT variants, Gemini models, Llama models
+- Gemma-3-1B baseline (vLLM)
+
+## Files Description
+
+### Data Files
+- **CSV Structure**: Rank, Model, Type, Parameters (B), BLEU, chrF++, BLEU per Billion Params, Our Model
+- **Rankings**: Sorted by BLEU score (descending)
+- **Efficiency**: BLEU score per billion parameters for fair comparison
+
+### Charts
+- **Visual comparison** of all models with our models highlighted
+- **Color coding**: Red (BLEU), Black (chrF++)
+- **Special marking**: Diagonal stripes for our models
+
+---
+
+*Evaluation Framework: FLORES+ English→African Languages*
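
The derived columns in `EVALUATION.md` (Percentile, Efficiency, and the improvement factors over the baseline) can be sanity-checked from the reported ranks and BLEU scores. The sketch below is not the authors' evaluation code. The function names are hypothetical, and the percentile formula `(total - rank + 1) / total` is an assumption, chosen because it reproduces the reported 76.5% and 31.2%.

```python
# Sanity check of the derived columns, using only numbers stated in EVALUATION.md.
# The percentile formula is an assumption that happens to match the reported values.

def percentile_from_rank(rank: int, total: int) -> float:
    """Percent of evaluated systems ranked at or below this one (assumed formula)."""
    return 100.0 * (total - rank + 1) / total


def bleu_per_billion(bleu: float, params_billion: float) -> float:
    """Efficiency column: BLEU divided by parameter count in billions."""
    return bleu / params_billion


def improvement_factor(model_bleu: float, baseline_bleu: float) -> float:
    """How many times higher the fine-tuned BLEU is than the baseline BLEU."""
    return model_bleu / baseline_bleu


# (language, rank, total systems, model BLEU, Gemma-3-1B baseline BLEU)
REPORTED = [
    ("Luganda", 5, 17, 6.99, 0.51),
    ("Swahili", 12, 16, 27.59, 2.78),
]

if __name__ == "__main__":
    for language, rank, total, bleu, baseline in REPORTED:
        print(
            f"{language}: percentile={percentile_from_rank(rank, total):.1f}%, "
            f"BLEU/B={bleu_per_billion(bleu, 1.0):.2f}, "
            f"improvement={improvement_factor(bleu, baseline):.1f}x"
        )
```

With the rounded scores from the table, the Swahili factor comes out at about 9.9x, matching the report, and the Luganda factor at about 13.7x. The reported 13.8x for Luganda was presumably computed from unrounded scores.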