language: ca
datasets: projecte-aina/3catparla_asr
tags: audio, automatic-speech-recognition, catalan, whisper-large-v3, projecte-aina, barcelona-supercomputing-center, bsc
license: apache-2.0
library_name: transformers
model-index: whisper-large-v3-ca-3catparla

Results (task: Automatic Speech Recognition, type automatic-speech-recognition; language: ca; metric: WER, type wer)

Dataset name                                      Dataset type                                         Split                WER
3CatParla (Test)                                  projecte-aina/3catparla_asr                          test                 0.96
3CatParla (Dev)                                   projecte-aina/3catparla_asr                          dev                  0.92
Mozilla Common Voice 17.0 (Test)                  mozilla-foundation/common_voice_17_0                 test                 10.32
Mozilla Common Voice 17.0 (Dev)                   mozilla-foundation/common_voice_17_0                 validation           9.26
CV Benchmark Catalan Accents (Balearic fem)       projecte-aina/commonvoice_benchmark_catalan_accents  Balearic female      12.25
CV Benchmark Catalan Accents (Balearic male)      projecte-aina/commonvoice_benchmark_catalan_accents  Balearic male        12.18
CV Benchmark Catalan Accents (Central fem)        projecte-aina/commonvoice_benchmark_catalan_accents  Central female       8.51
CV Benchmark Catalan Accents (Central male)       projecte-aina/commonvoice_benchmark_catalan_accents  Central male         8.73
CV Benchmark Catalan Accents (Northern fem)       projecte-aina/commonvoice_benchmark_catalan_accents  Northern female      8.09
CV Benchmark Catalan Accents (Northern male)      projecte-aina/commonvoice_benchmark_catalan_accents  Northern male        8.28
CV Benchmark Catalan Accents (Northwestern fem)   projecte-aina/commonvoice_benchmark_catalan_accents  Northwestern female  7.88
CV Benchmark Catalan Accents (Northwestern male)  projecte-aina/commonvoice_benchmark_catalan_accents  Northwestern male    8.44
CV Benchmark Catalan Accents (Valencian fem)      projecte-aina/commonvoice_benchmark_catalan_accents  Valencian female     9.58
CV Benchmark Catalan Accents (Valencian male)     projecte-aina/commonvoice_benchmark_catalan_accents  Valencian male       9.1

whisper-large-v3-ca-3catparla

Paper

PDF: 3CatParla: A New Open-Source Corpus of Broadcast TV in Catalan for Automatic Speech Recognition

Model Description

"whisper-large-v3-ca-3catparla" is an acoustic model for Automatic Speech Recognition in Catalan. It is the result of fine-tuning "openai/whisper-large-v3" on 710 hours of Catalan data released by Projecte AINA (Barcelona, Spain).

Intended Uses and Limitations

This model can be used for Automatic Speech Recognition (ASR) in Catalan. The model is intended to transcribe audio files in Catalan to plain text without punctuation.
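
Because the model produces text without punctuation, a punctuated reference transcript has to be brought to the same form before it can be compared with the model output. The sketch below is a simplified illustration of that idea, not the normalization used by the authors (the evaluation example later in this card relies on the Whisper tokenizer's normalizer). The function name `strip_punctuation` and the exact set of characters removed are assumptions made for this example; apostrophes, hyphens and the Catalan middle dot are kept because they occur inside words.

```python
import re

# Sentence-level punctuation to drop. This character set is an assumption
# made for illustration; word-internal marks such as ' - and · are kept.
_PUNCTUATION = re.compile(r'[.,;:!?¿¡"()\[\]«»…]')


def strip_punctuation(text):
    """Remove sentence punctuation and collapse runs of whitespace."""
    without_marks = _PUNCTUATION.sub(" ", text)
    return " ".join(without_marks.split())


if __name__ == "__main__":
    # Illustrative sentence, not taken from the 3CatParla corpus.
    print(strip_punctuation("Bon dia, com estàs? Vaig al col·legi."))
```

Replacing each mark with a space before collapsing whitespace avoids gluing two words together when the source text has no space after a punctuation mark.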

How to Get Started with the Model

For an updated, working version of this code, please see our Notebook.

Installation

To use this model, install datasets and transformers:

Create a virtual environment:

python -m venv /path/to/venv

Activate the environment:

source /path/to/venv/bin/activate

Install the modules:

pip install datasets transformers 
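
After installing, it can be useful to confirm that the packages are importable from the active environment before loading the model. The following is a minimal sketch using only the standard library; the helper name `missing_packages` is hypothetical and not part of any of the libraries mentioned here.

```python
from importlib.util import find_spec


def missing_packages(names):
    """Return the top-level package names that cannot be imported here."""
    return [name for name in names if find_spec(name) is None]


if __name__ == "__main__":
    # Packages named in the installation step above.
    missing = missing_packages(["datasets", "transformers"])
    if missing:
        print("Missing packages:", ", ".join(missing))
    else:
        print("All required packages are importable.")
```

`find_spec` only locates a package without importing it, so the check stays fast even for large libraries.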

For Inference

The following example transcribes the test split of 3CatParla with this model and computes the WER over the result:

# Install prerequisites (run these in a shell, not in Python):
#   pip install torch
#   pip install datasets
#   pip install 'transformers[torch]'
#   pip install evaluate
#   pip install jiwer
# This code requires a CUDA-capable GPU.

# Note (November 2024): load_metric is no longer part of datasets;
# use evaluate's load instead, as done below.

import torch
from datasets import Audio, load_dataset
from evaluate import load
from transformers import WhisperForConditionalGeneration, WhisperProcessor

# Load the processor and model.
MODEL_NAME = "projecte-aina/whisper-large-v3-ca-3catparla"
processor = WhisperProcessor.from_pretrained(MODEL_NAME)
model = WhisperForConditionalGeneration.from_pretrained(MODEL_NAME).to("cuda")

# Load the test split of the dataset.
ds = load_dataset("projecte-aina/3catparla_asr", split="test")

# Resample the audio to 16 kHz.
ds = ds.cast_column("audio", Audio(sampling_rate=16_000))

# Transcribe one example and normalize both the reference and the prediction.
def map_to_pred(batch):
    audio = batch["audio"]
    input_features = processor(
        audio["array"], sampling_rate=audio["sampling_rate"], return_tensors="pt"
    ).input_features
    batch["reference"] = processor.tokenizer._normalize(batch["normalized_text"])

    with torch.no_grad():
        predicted_ids = model.generate(input_features.to("cuda"))[0]

    transcription = processor.decode(predicted_ids)
    batch["prediction"] = processor.tokenizer._normalize(transcription)

    return batch

# Run the evaluation over the whole split.
result = ds.map(map_to_pred)

# Compute the overall WER as a percentage.
wer = load("wer")
WER = 100 * wer.compute(references=result["reference"], predictions=result["prediction"])
print(WER)

Test Result: 0.96

Training Details

Training data

The specific dataset used to create the model is called "3CatParla".

Training procedure

This model is the result of finetuning the model "openai/whisper-large-v3" by following this tutorial provided by Hugging Face.

Training Hyperparameters

  • language: catalan
  • hours of training audio: 710
  • learning rate: 1.95e-07
  • sample rate: 16000
  • train batch size: 32 (x4 GPUs)
    • gradient accumulation steps: 1
  • eval batch size: 32
  • save total limit: 3
  • max steps: 19842
  • warmup steps: 1984
  • eval steps: 3307
  • save steps: 3307
  • shuffle buffer size: 480
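
Some quantities follow from the hyperparameters above by simple arithmetic. The sketch below derives them under one stated assumption: that "32 (x4 GPUs)" means a batch of 32 on each of 4 GPUs. The class name `TrainingConfig` and its field names are hypothetical and do not correspond to the configuration objects of the training script.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class TrainingConfig:
    # Values copied from the hyperparameter list; the per-device reading of
    # the batch size is an assumption.
    per_device_batch_size: int = 32
    num_gpus: int = 4
    gradient_accumulation_steps: int = 1
    max_steps: int = 19842
    warmup_steps: int = 1984
    eval_steps: int = 3307

    @property
    def effective_batch_size(self):
        """Examples consumed per optimizer step across all GPUs."""
        return (
            self.per_device_batch_size
            * self.num_gpus
            * self.gradient_accumulation_steps
        )

    @property
    def examples_seen(self):
        """Total examples processed during training, counting repeats."""
        return self.effective_batch_size * self.max_steps

    @property
    def warmup_fraction(self):
        """Share of the training steps spent in warmup."""
        return self.warmup_steps / self.max_steps

    @property
    def num_evaluations(self):
        """Number of evaluation passes triggered during training."""
        return self.max_steps // self.eval_steps


if __name__ == "__main__":
    cfg = TrainingConfig()
    print("effective batch size:", cfg.effective_batch_size)
    print("examples seen:", cfg.examples_seen)
    print("warmup fraction:", round(cfg.warmup_fraction, 3))
    print("evaluations:", cfg.num_evaluations)
```

Under that assumption the effective batch size is 128, training processes 2,539,776 examples in total, warmup covers roughly 10% of the steps, and the evaluation and save interval of 3307 steps divides the 19842 steps into exactly 6 checkpoints.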

Citation

If this model contributes to your research, please cite the work:

@inproceedings{hernandez20243catparla,
  title={3CatParla: A New Open-Source Corpus of Broadcast TV in Catalan for Automatic Speech Recognition},
  author={Hern{\'a}ndez Mena, Carlos Daniel and Armentano Oller, Carme and Solito, Sarah and K{\"u}lebi, Baybars},
  booktitle={Proc. IberSPEECH 2024},
  pages={176--180},
  year={2024}
}

Additional Information

Author

The fine-tuning process was performed in July 2024 at the Language Technologies Unit of the Barcelona Supercomputing Center by Carlos Daniel Hernández Mena.

Contact

For further information, please send an email to langtech@bsc.es.

Copyright (c) 2024 by the Language Technologies Unit, Barcelona Supercomputing Center.

License

Apache-2.0

Funding

This work has been promoted and financed by the Generalitat de Catalunya through the Aina project.

The training of the model was made possible by the compute time provided by the Barcelona Supercomputing Center through MareNostrum 5.