
ModelHub XC ddba1294a7 Initialize project; model provided by the ModelHub XC community

Model: bofenghuang/asr-wav2vec2-ctc-french
Source: Original Platform

2026-05-21 11:36:18 +08:00


license: apache-2.0
language:
library_name: transformers
thumbnail: null
tags: automatic-speech-recognition, hf-asr-leaderboard, robust-speech-event, CTC, Wav2vec2
datasets: common_voice, mozilla-foundation/common_voice_11_0, facebook/multilingual_librispeech, facebook/voxpopuli, gigant/african_accented_french
metrics: wer
model-index name: Fine-tuned wav2vec2-FR-7K-large model for ASR in French

results (task: Automatic Speech Recognition, type: automatic-speech-recognition; metric type: wer)

Dataset	Type	Config (args)	Test WER	Test WER (+LM)
Common Voice 11.0	mozilla-foundation/common_voice_11_0	fr	11.44	9.66
Multilingual LibriSpeech (MLS)	facebook/multilingual_librispeech	french	5.93	5.13
VoxPopuli	facebook/voxpopuli	fr	9.33	8.51
African Accented French	gigant/african_accented_french	fr	16.22	15.39
Robust Speech Event - Dev Data	speech-recognition-community-v2/dev_data	fr	16.56	12.96
Fleurs	google/fleurs	fr_fr	10.10	8.84
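
The relative gain from the language model can be read off the WER figures above. The following is a minimal sketch that uses only the values reported in this card; the dictionary layout and the function name `relative_reduction` are illustrative choices, not part of any evaluation script.

```python
# Test WER (%) without and with the language model, copied from the results above.
RESULTS = {
    "Common Voice 11.0": (11.44, 9.66),
    "Multilingual LibriSpeech (MLS)": (5.93, 5.13),
    "VoxPopuli": (9.33, 8.51),
    "African Accented French": (16.22, 15.39),
    "Robust Speech Event - Dev Data": (16.56, 12.96),
    "Fleurs": (10.10, 8.84),
}


def relative_reduction(wer_greedy: float, wer_lm: float) -> float:
    """Relative WER reduction, in percent, obtained by adding the LM."""
    return 100.0 * (wer_greedy - wer_lm) / wer_greedy


for name, (wer_greedy, wer_lm) in RESULTS.items():
    print(f"{name}: {relative_reduction(wer_greedy, wer_lm):.1f}% relative WER reduction")
```

On these numbers the language model helps most on the Robust Speech Event dev data (roughly 22% relative) and least on African Accented French (roughly 5% relative).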

Fine-tuned wav2vec2-FR-7K-large model for ASR in French

This model is a fine-tuned version of LeBenchmark/wav2vec2-FR-7K-large, trained on a composite dataset of over 2,200 hours of French speech, built from the train and validation splits of Common Voice 11.0, Multilingual LibriSpeech, VoxPopuli, Multilingual TEDx, MediaSpeech, and African Accented French. When using the model, make sure that your speech input is also sampled at 16 kHz.
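
A quick way to check a file's sampling rate before transcription is sketched below. It uses Python's standard `wave` module, so it only handles PCM WAV files; the function name `needs_resampling` and the constant `TARGET_SAMPLE_RATE` are illustrative and not part of the model's API.

```python
import wave

TARGET_SAMPLE_RATE = 16_000  # the model expects 16 kHz input


def needs_resampling(wav_path: str, target_rate: int = TARGET_SAMPLE_RATE) -> bool:
    """Return True if the WAV file's sampling rate differs from target_rate."""
    with wave.open(wav_path, "rb") as wav_file:
        return wav_file.getframerate() != target_rate
```

If this returns True, the audio should be resampled to 16 kHz before it is passed to the processor.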

Usage

To use on a local audio file with the language model

import torch
import torchaudio

from transformers import AutoModelForCTC, Wav2Vec2ProcessorWithLM

device = torch.device("cuda:0" if torch.cuda.is_available() else "cpu")

model = AutoModelForCTC.from_pretrained("bhuang/asr-wav2vec2-french").to(device)
processor_with_lm = Wav2Vec2ProcessorWithLM.from_pretrained("bhuang/asr-wav2vec2-french")
model_sample_rate = processor_with_lm.feature_extractor.sampling_rate

wav_path = "example.wav"  # path to your audio file
waveform, sample_rate = torchaudio.load(wav_path)
waveform = waveform.mean(dim=0)  # downmix to mono (also drops the channel dimension)

# resample
if sample_rate != model_sample_rate:
    resampler = torchaudio.transforms.Resample(sample_rate, model_sample_rate)
    waveform = resampler(waveform)

# normalize
input_dict = processor_with_lm(waveform, sampling_rate=model_sample_rate, return_tensors="pt")

with torch.inference_mode():
    logits = model(input_dict.input_values.to(device)).logits

# decode with beam search and the language model
predicted_sentence = processor_with_lm.batch_decode(logits.cpu().numpy()).text[0]

To use on a local audio file without the language model

import torch
import torchaudio

from transformers import AutoModelForCTC, Wav2Vec2Processor

device = torch.device("cuda:0" if torch.cuda.is_available() else "cpu")

model = AutoModelForCTC.from_pretrained("bhuang/asr-wav2vec2-french").to(device)
processor = Wav2Vec2Processor.from_pretrained("bhuang/asr-wav2vec2-french")
model_sample_rate = processor.feature_extractor.sampling_rate

wav_path = "example.wav"  # path to your audio file
waveform, sample_rate = torchaudio.load(wav_path)
waveform = waveform.mean(dim=0)  # downmix to mono (also drops the channel dimension)

# resample
if sample_rate != model_sample_rate:
    resampler = torchaudio.transforms.Resample(sample_rate, model_sample_rate)
    waveform = resampler(waveform)

# normalize
input_dict = processor(waveform, sampling_rate=model_sample_rate, return_tensors="pt")

with torch.inference_mode():
    logits = model(input_dict.input_values.to(device)).logits

# decode
predicted_ids = torch.argmax(logits, dim=-1)
predicted_sentence = processor.batch_decode(predicted_ids)[0]
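
The argmax-then-decode step above is greedy CTC decoding: take the most likely token for each frame, merge consecutive repeats, then drop the blank tokens. A minimal pure-Python sketch of that idea follows; the toy vocabulary and the blank index are made up for illustration and do not match the model's real tokenizer.

```python
from itertools import groupby

BLANK_ID = 0  # hypothetical blank index for this toy vocabulary
TOY_VOCAB = {1: "s", 2: "a", 3: "l", 4: "u", 5: "t", 6: " "}  # illustrative only


def greedy_ctc_decode(frame_ids: list[int], vocab: dict[int, str], blank_id: int = BLANK_ID) -> str:
    """Collapse repeated frame predictions, then remove blanks."""
    collapsed = [token_id for token_id, _ in groupby(frame_ids)]
    return "".join(vocab[token_id] for token_id in collapsed if token_id != blank_id)


# Per-frame argmax ids for a short toy utterance.
frames = [1, 1, 0, 2, 2, 2, 0, 3, 0, 4, 4, 5, 0]
print(greedy_ctc_decode(frames, TOY_VOCAB))  # salut
```

Because repeats are merged before blanks are removed, a genuinely doubled letter only survives when a blank separates its two occurrences.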

Evaluation

To evaluate on mozilla-foundation/common_voice_11_0

python eval.py \
  --model_id "bhuang/asr-wav2vec2-french" \
  --dataset "mozilla-foundation/common_voice_11_0" \
  --config "fr" \
  --split "test" \
  --log_outputs \
  --outdir "outputs/results_mozilla-foundation_common_voice_11_0_with_lm"
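
The WER that an evaluation like this reports is the word-level edit distance between reference and hypothesis, divided by the number of reference words. Below is a minimal sketch, assuming whitespace tokenization and no text normalization (eval.py may normalize text differently); the function name `word_error_rate` is illustrative.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """WER = (substitutions + deletions + insertions) / number of reference words."""
    ref_words = reference.split()
    hyp_words = hypothesis.split()
    if not ref_words:
        raise ValueError("reference must contain at least one word")

    # previous_row[j]: edit distance between the reference prefix seen so far
    # and the first j hypothesis words.
    previous_row = list(range(len(hyp_words) + 1))
    for i, ref_word in enumerate(ref_words, start=1):
        current_row = [i]
        for j, hyp_word in enumerate(hyp_words, start=1):
            substitution = previous_row[j - 1] + (ref_word != hyp_word)
            deletion = previous_row[j] + 1
            insertion = current_row[j - 1] + 1
            current_row.append(min(substitution, deletion, insertion))
        previous_row = current_row
    return previous_row[-1] / len(ref_words)


reference = "bonjour tout le monde"
hypothesis = "bonjour tous le monde"
# One substitution over four reference words.
print(f"WER: {100 * word_error_rate(reference, hypothesis):.2f}%")  # WER: 25.00%
```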

To evaluate on speech-recognition-community-v2/dev_data

python eval.py \
  --model_id "bhuang/asr-wav2vec2-french" \
  --dataset "speech-recognition-community-v2/dev_data" \
  --config "fr" \
  --split "validation" \
  --chunk_length_s 30.0 \
  --stride_length_s 5.0 \
  --log_outputs \
  --outdir "outputs/results_speech-recognition-community-v2_dev_data_with_lm"
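
The `--chunk_length_s` and `--stride_length_s` flags suggest that long recordings are transcribed in overlapping windows. The sketch below shows how such windows could be laid out, assuming the stride applies on both sides of each chunk so that consecutive chunks advance by the chunk length minus twice the stride. That behavior is an assumption about how eval.py uses the flags, and the function name `chunk_windows` is hypothetical.

```python
def chunk_windows(duration_s: float, chunk_length_s: float = 30.0, stride_length_s: float = 5.0):
    """Return (start, end) times in seconds of overlapping chunks covering the audio."""
    # Assumed layout: each chunk shares 2 * stride_length_s with its neighbor.
    step = chunk_length_s - 2 * stride_length_s
    if step <= 0:
        raise ValueError("chunk_length_s must be larger than twice stride_length_s")
    windows = []
    start = 0.0
    while True:
        end = min(start + chunk_length_s, duration_s)
        windows.append((start, end))
        if end >= duration_s:
            break
        start += step
    return windows


print(chunk_windows(70.0))  # [(0.0, 30.0), (20.0, 50.0), (40.0, 70.0)]
```

With the values from the command above, a 70-second recording is covered by three 30-second windows, each overlapping its neighbor by 10 seconds.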
