---
license: apache-2.0
language: fr
library_name: transformers
thumbnail: null
tags:
- automatic-speech-recognition
- hf-asr-leaderboard
- whisper-event
datasets:
- mozilla-foundation/common_voice_11_0
metrics:
model-index:
- name: Fine-tuned whisper-small model for ASR in French
  results:
  - task:
      name: Automatic Speech Recognition
      type: automatic-speech-recognition
    dataset:
      name: Common Voice 11.0
      type: mozilla-foundation/common_voice_11_0
      config: fr
      split: test
      args: fr
    metrics:
    - name: WER (Greedy)
      type: wer
      value: 11.76
    - name: WER (Beam 5)
      type: wer
      value: 10.99
  - task:
      name: Automatic Speech Recognition
      type: automatic-speech-recognition
    dataset:
      name: Multilingual LibriSpeech (MLS)
      type: facebook/multilingual_librispeech
      config: french
      split: test
      args: french
    metrics:
    - name: WER (Greedy)
      type: wer
      value: 9.65
    - name: WER (Beam 5)
      type: wer
      value: 8.91
  - task:
      name: Automatic Speech Recognition
      type: automatic-speech-recognition
    dataset:
      name: VoxPopuli
      type: facebook/voxpopuli
      config: fr
      split: test
      args: fr
    metrics:
    - name: WER (Greedy)
      type: wer
      value: 14.45
    - name: WER (Beam 5)
      type: wer
      value: 13.66
  - task:
      name: Automatic Speech Recognition
      type: automatic-speech-recognition
    dataset:
      name: Fleurs
      type: google/fleurs
      config: fr_fr
      split: test
      args: fr_fr
    metrics:
    - name: WER (Greedy)
      type: wer
      value: 10.76
    - name: WER (Beam 5)
      type: wer
      value: 9.83
  - task:
      name: Automatic Speech Recognition
      type: automatic-speech-recognition
    dataset:
      name: African Accented French
      type: gigant/african_accented_french
      config: fr
      split: test
      args: fr
    metrics:
    - name: WER (Greedy)
      type: wer
      value: 10.81
    - name: WER (Beam 5)
      type: wer
      value: 9.26
---
Fine-tuned whisper-small model for ASR in French
This model is a fine-tuned version of openai/whisper-small, trained on the French (fr) subset of the mozilla-foundation/common_voice_11_0 dataset. When using the model, make sure that your speech input is sampled at 16 kHz. The model also predicts casing and punctuation.
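Since the model expects 16 kHz input, it can help to check a recording's sample rate before transcribing it. The following is a minimal sketch that uses only the Python standard library and assumes uncompressed PCM WAV files; the helper names `wav_sample_rate` and `needs_resampling` are illustrative and not part of any library.

```python
import wave

EXPECTED_SAMPLE_RATE = 16_000  # Hz, the rate this model card asks for


def wav_sample_rate(path: str) -> int:
    """Return the sample rate (in Hz) stored in a PCM WAV file's header."""
    with wave.open(path, "rb") as wav_file:
        return wav_file.getframerate()


def needs_resampling(path: str, expected: int = EXPECTED_SAMPLE_RATE) -> bool:
    """Return True when the file's sample rate differs from the expected rate."""
    return wav_sample_rate(path) != expected
```

If `needs_resampling` returns True, resample the audio to 16 kHz with the audio tool of your choice before passing it to the model.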
Performance
Below are the WERs of the pre-trained models on Common Voice 9.0, Multilingual LibriSpeech, VoxPopuli, and Fleurs. These results are reported in the original paper.
Below are the WERs of the fine-tuned models on Common Voice 11.0, Multilingual LibriSpeech, VoxPopuli, and Fleurs. Note that these evaluation datasets have been filtered and preprocessed to contain only French alphabet characters, with all punctuation removed except apostrophes. The results in the table are reported as WER (greedy search) / WER (beam search with beam width 5).
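To make the evaluation setup above more concrete, here is a minimal sketch of that kind of text normalisation followed by a word error rate computation. It is not the evaluation script behind the reported numbers. The exact character set, the lowercasing step, and the choice to replace removed characters with spaces are all assumptions, and the function names are illustrative.

```python
import re

# Assumed character set: lowercase French letters, apostrophe, and space.
_NOT_ALLOWED = re.compile(r"[^a-zàâæçéèêëîïôœùûüÿ' ]")


def normalize(text: str) -> str:
    """Lowercase (an assumption), keep French letters and apostrophes only."""
    text = text.lower().replace("’", "'")  # unify curly and straight apostrophes
    text = _NOT_ALLOWED.sub(" ", text)     # removed characters become spaces
    return " ".join(text.split())          # collapse repeated whitespace


def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word-level edit distance divided by the number of reference words."""
    ref = normalize(reference).split()
    hyp = normalize(hypothesis).split()
    if not ref:
        raise ValueError("reference is empty after normalisation")
    # Dynamic programming over one row of the edit-distance table at a time.
    previous = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        current = [i]
        for j, hyp_word in enumerate(hyp, start=1):
            substitution = previous[j - 1] + (ref_word != hyp_word)
            current.append(min(previous[j] + 1, current[j - 1] + 1, substitution))
        previous = current
    return previous[-1] / len(ref)
```

The function returns a fraction for a single sentence pair; multiply by 100 to compare with percentage-style WER figures. Corpus-level WER is usually computed by summing edit distances and reference word counts over all utterances instead of averaging per-sentence values.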
Usage
Inference with 🤗 Pipeline
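A minimal sketch of pipeline-based inference follows. `MODEL_ID` is a hypothetical placeholder, so replace it with this model's actual repository ID. The helper names are illustrative. Passing `language` and `task` through `generate_kwargs` assumes a reasonably recent Transformers release and a checkpoint whose generation config lists the language tokens; older setups may need a different way to force French transcription. Reading audio from a file path also assumes ffmpeg is installed.

```python
MODEL_ID = "your-namespace/whisper-small-french"  # hypothetical placeholder


def generation_options(beam_width: int = 1) -> dict:
    """Build generation options; beam_width=1 corresponds to greedy search."""
    if beam_width < 1:
        raise ValueError("beam_width must be at least 1")
    return {"language": "fr", "task": "transcribe", "num_beams": beam_width}


def transcribe_file(audio_path: str, model_id: str = MODEL_ID, beam_width: int = 1) -> str:
    """Transcribe one audio file with the automatic-speech-recognition pipeline."""
    # Imported lazily so the helper above can be used without Transformers installed.
    from transformers import pipeline

    asr = pipeline("automatic-speech-recognition", model=model_id)
    result = asr(audio_path, generate_kwargs=generation_options(beam_width))
    return result["text"]


if __name__ == "__main__":
    # "sample.wav" is a placeholder path to a French speech recording.
    print(transcribe_file("sample.wav", beam_width=5))
```

Setting `beam_width=5` mirrors the "Beam 5" setting used for the reported WERs, at the cost of slower decoding than greedy search.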
Inference with 🤗 low-level APIs
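A minimal sketch of the same task with the lower-level Whisper classes follows. `MODEL_ID` is again a hypothetical placeholder for this model's repository ID, and the helper names are illustrative. The waveform is assumed to be a mono float array already sampled at 16 kHz. As with the pipeline, passing `language` and `task` to `generate` assumes a recent Transformers release and a compatible generation config.

```python
SAMPLE_RATE = 16_000  # Hz, the input rate the model expects
MODEL_ID = "your-namespace/whisper-small-french"  # hypothetical placeholder


def require_16khz(sampling_rate: int) -> None:
    """Raise ValueError if the audio is not sampled at 16 kHz."""
    if sampling_rate != SAMPLE_RATE:
        raise ValueError(
            f"expected {SAMPLE_RATE} Hz audio, got {sampling_rate} Hz; resample first"
        )


def transcribe_waveform(waveform, sampling_rate: int,
                        model_id: str = MODEL_ID, beam_width: int = 5) -> str:
    """Transcribe a mono waveform (1-D float array) with the Whisper classes."""
    require_16khz(sampling_rate)
    # Imported lazily so the rate check works without torch/Transformers installed.
    import torch
    from transformers import WhisperForConditionalGeneration, WhisperProcessor

    processor = WhisperProcessor.from_pretrained(model_id)
    model = WhisperForConditionalGeneration.from_pretrained(model_id)
    model.eval()

    # Convert the raw waveform into log-mel input features.
    inputs = processor(waveform, sampling_rate=sampling_rate, return_tensors="pt")
    with torch.no_grad():
        generated_ids = model.generate(
            inputs.input_features,
            language="fr",
            task="transcribe",
            num_beams=beam_width,
        )
    # Decode token IDs to text, dropping special tokens such as language tags.
    return processor.batch_decode(generated_ids, skip_special_tokens=True)[0]
```

Loading the processor and model inside the function keeps the sketch short; when transcribing many files, load them once and reuse them.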