Initialize project, with the model provided by the ModelHub XC community
Model: CraneAILabs/swahili-gemma-1b Source: Original Platform
EVALUATION.md
# Comprehensive FLORES Translation Evaluation Results

## Overview

This package contains comprehensive evaluation results for English→Luganda and English→Swahili translation using the FLORES+ dataset. The evaluation includes specialized fine-tuned models, commercial services, and baseline models.

## Contents

### 📊 Charts (`/charts/`)
- `luganda_comprehensive_chart.png` - Complete Luganda translation performance comparison (17 models)
- `swahili_comprehensive_chart.png` - Complete Swahili translation performance comparison (16 models)

### 📈 Data (`/data/`)
- `luganda_results.csv` - Detailed Luganda evaluation results with rankings
- `swahili_results.csv` - Detailed Swahili evaluation results with rankings
- `summary.csv` - Executive summary of our models' performance

## Key Results

### 🏆 Our Models' Performance

| Language | Model | Rank | BLEU | chrF++ | Percentile | Efficiency (BLEU per billion params) |
|----------|-------|------|------|--------|------------|--------------------------------------|
| **Luganda** | Ganda Gemma 1B | 5/17 | 6.99 | 40.32 | 76.5% | 6.99 |
| **Swahili** | Swahili Gemma 1B | 12/16 | 27.59 | 56.84 | 31.2% | 27.59 |

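The Percentile and Efficiency columns in the table above can be reproduced from rank and BLEU alone. The sketch below assumes the percentile is the share of models ranked at or below a given model, `(total - rank + 1) / total`, which is consistent with the reported 76.5% and 31.2%. That formula, the function names, and the 1.0 parameter count (read off the "1B" model names) are assumptions for illustration, not taken from the evaluation code.

```python
def rank_percentile(rank: int, total: int) -> float:
    """Percentage of models ranked at or below `rank` (assumed formula)."""
    return (total - rank + 1) / total * 100


def bleu_per_billion(bleu: float, params_billions: float) -> float:
    """BLEU divided by the parameter count in billions."""
    return bleu / params_billions


# Rank, model count, and BLEU come from the results table; 1.0 is the
# parameter count in billions implied by the "1B" model names.
for name, rank, total, bleu in [
    ("Ganda Gemma 1B", 5, 17, 6.99),
    ("Swahili Gemma 1B", 12, 16, 27.59),
]:
    pct = rank_percentile(rank, total)
    eff = bleu_per_billion(bleu, 1.0)
    print(f"{name}: {pct:.2f}th percentile, {eff:.2f} BLEU per billion params")
```

Because both models are assumed to have exactly 1B parameters, the efficiency value is numerically identical to the raw BLEU score, as in the table.
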
### 🎯 Key Insights

**Language Resource Impact:**
- The **Swahili** model scores far higher than the **Luganda** model (27.59 vs 6.99 BLEU)
- This gap reflects the difference in resource availability between the two languages
- It illustrates the challenge of low-resource language translation

**Competitive Standing:**
- **Luganda**: Ranks 5th out of 17 models (76.5th percentile)
- **Swahili**: Ranks 12th out of 16 models (31.2nd percentile)
- Both are 1B-parameter models, so their efficiency (BLEU per billion parameters) equals their raw BLEU: 6.99 and 27.59

**Baseline Comparison:**
- Our specialized models far outperform the general Gemma-3-1B baseline
- **Luganda**: 6.99 vs 0.51 BLEU (roughly 13.7x improvement)
- **Swahili**: 27.59 vs 2.78 BLEU (roughly 9.9x improvement)

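The improvement factors in the baseline comparison are simple ratios of BLEU scores. A minimal sketch, using the rounded scores listed above (the function name is hypothetical, and ratios computed from unrounded scores may differ in the last digit):

```python
def improvement_factor(model_bleu: float, baseline_bleu: float) -> float:
    """Ratio of the fine-tuned model's BLEU to the baseline's BLEU."""
    if baseline_bleu <= 0:
        raise ValueError("baseline BLEU must be positive")
    return model_bleu / baseline_bleu


# (fine-tuned BLEU, Gemma-3-1B baseline BLEU), as reported in this document.
scores = {"Luganda": (6.99, 0.51), "Swahili": (27.59, 2.78)}

for language, (ours, baseline) in scores.items():
    factor = improvement_factor(ours, baseline)
    print(f"{language}: {factor:.1f}x over baseline")
```
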
## Methodology

- **Dataset:** FLORES+ devtest split (1,012 sentence pairs per language)
- **Metrics:** BLEU and chrF++ scores
- **Evaluation:** Comparison across 17 models/services for Luganda and 16 for Swahili
- **Baseline:** Gemma-3-1B-IT served with vLLM, for a fair comparison

## Models Evaluated

### Commercial Services
- Google Translate (top performer in both languages)

### Specialized Models (Ours)
- Ganda Gemma 1B (fine-tuned for Luganda)
- Swahili Gemma 1B (fine-tuned for Swahili)

### General Models
- Claude Sonnet 4, GPT variants, Gemini models, Llama models
- Gemma-3-1B baseline (vLLM)

## Files Description

### Data Files
- **CSV Structure**: Rank, Model, Type, Parameters (B), BLEU, chrF++, BLEU per Billion Params, Our Model
- **Rankings**: Sorted by BLEU score (descending)
- **Efficiency**: BLEU score per billion parameters for fair comparison

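As a rough illustration of working with these files, the sketch below loads a results CSV using the column names listed above, checks the descending BLEU ordering, and recomputes the efficiency value. The helper names are hypothetical, and the handling of empty `Parameters (B)` cells (for example, for commercial services with no published size) is an assumption about how the files are encoded.

```python
import csv
from typing import Dict, List, Optional

NUMERIC_COLUMNS = ("BLEU", "chrF++", "Parameters (B)")


def _to_float(value: Optional[str]) -> Optional[float]:
    """Parse a numeric cell; return None for empty or non-numeric cells."""
    try:
        return float(value)
    except (TypeError, ValueError):
        return None


def load_results(path: str) -> List[Dict]:
    """Read a results CSV and convert the numeric columns to floats."""
    with open(path, newline="", encoding="utf-8") as handle:
        rows = list(csv.DictReader(handle))
    for row in rows:
        for column in NUMERIC_COLUMNS:
            row[column] = _to_float(row.get(column))
    return rows


def is_sorted_by_bleu(rows: List[Dict]) -> bool:
    """True if rows are in descending BLEU order (assumes BLEU is never missing)."""
    scores = [row["BLEU"] for row in rows]
    return all(a >= b for a, b in zip(scores, scores[1:]))


def efficiency(row: Dict) -> Optional[float]:
    """BLEU per billion parameters, or None when the size is unknown."""
    params = row["Parameters (B)"]
    if not params:
        return None
    return row["BLEU"] / params
```

A typical call would be `rows = load_results("data/luganda_results.csv")`, followed by `is_sorted_by_bleu(rows)` to confirm the ranking order before comparing `efficiency(row)` against the stored `BLEU per Billion Params` column.
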
### Charts
- **Visual comparison** of all models with our models highlighted
- **Color coding**: Red (BLEU), Black (chrF++)
- **Special marking**: Diagonal stripes for our models

---

*Evaluation Framework: FLORES+ English→African Languages*