ModelHub XC b00d6d2293 Initialize project; model provided by the ModelHub XC community

Model: selimc/whisper-large-v3-turbo-turkish
Source: Original Platform

2026-05-14 02:27:35 +08:00

Whisper Large v3 Turbo TR - Selim Çavaş

This model is a fine-tuned version of openai/whisper-large-v3-turbo on the Common Voice 17.0 dataset. It achieves the following results on the evaluation set:

Loss: 0.3123
WER: 18.9229
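
For readers unfamiliar with the metric, here is a minimal sketch of how a word error rate can be computed, assuming the reported figure is a percentage (word-level edit distance divided by the number of reference words, times 100). The function name `word_error_rate` and the Turkish example sentences are made up for illustration, and the card does not say which text normalization was applied before scoring, so this is not necessarily the exact evaluation code.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Return WER in percent: word-level edit distance / reference length."""
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # prev[j] holds the edit distance between the reference words seen so far
    # (before the current one) and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i] + [0] * len(hyp)
        for j, hyp_word in enumerate(hyp, start=1):
            substitution = prev[j - 1] + (ref_word != hyp_word)
            deletion = prev[j] + 1        # reference word missing from hypothesis
            insertion = curr[j - 1] + 1   # extra word in hypothesis
            curr[j] = min(substitution, deletion, insertion)
        prev = curr
    return 100.0 * prev[-1] / len(ref)


# Hypothetical example: one of four reference words is dropped.
print(word_error_rate("bugün hava çok güzel", "bugün hava güzel"))
```

Under this reading, a WER of about 19 would mean roughly one word-level error for every five reference words.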

Intended uses & limitations

This model can be used in a range of applications, including:

Transcription of Turkish speech
Voice commands
Automatic subtitling for Turkish videos

How To Use

import torch
from transformers import AutoModelForSpeechSeq2Seq, AutoProcessor, pipeline

# Use the first CUDA GPU with half precision if available, otherwise CPU with float32.
device = "cuda:0" if torch.cuda.is_available() else "cpu"
torch_dtype = torch.float16 if torch.cuda.is_available() else torch.float32

model_id = "selimc/whisper-large-v3-turbo-turkish"

# Load the fine-tuned checkpoint and move it to the selected device.
model = AutoModelForSpeechSeq2Seq.from_pretrained(
    model_id, torch_dtype=torch_dtype, low_cpu_mem_usage=True, use_safetensors=True
)
model.to(device)

# The processor bundles the tokenizer and the audio feature extractor.
processor = AutoProcessor.from_pretrained(model_id)

# Long audio is split into 30-second chunks that are transcribed in batches.
pipe = pipeline(
    "automatic-speech-recognition",
    model=model,
    tokenizer=processor.tokenizer,
    feature_extractor=processor.feature_extractor,
    chunk_length_s=30,
    batch_size=16,
    return_timestamps=True,
    torch_dtype=torch_dtype,
    device=device,
)

# "test.mp3" is a placeholder; pass the path to your own audio file.
result = pipe("test.mp3")
print(result["text"])
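
Since the pipeline above is created with `return_timestamps=True`, the result should also contain a `"chunks"` list whose items carry a `"text"` string and a `"timestamp"` pair of start and end seconds. For the subtitling use case, a rough sketch of turning those chunks into SRT text could look like this. The helper names `to_srt_time` and `chunks_to_srt` and the `fallback_duration` parameter are my own, not part of Transformers, and the sample chunks are invented.

```python
def to_srt_time(seconds: float) -> str:
    """Format seconds as an SRT timestamp, HH:MM:SS,mmm."""
    ms = int(round(seconds * 1000))
    hours, ms = divmod(ms, 3_600_000)
    minutes, ms = divmod(ms, 60_000)
    secs, ms = divmod(ms, 1000)
    return f"{hours:02d}:{minutes:02d}:{secs:02d},{ms:03d}"


def chunks_to_srt(chunks, fallback_duration: float = 2.0) -> str:
    """Build SRT text from a list of {"text", "timestamp"} dicts."""
    entries = []
    for index, chunk in enumerate(chunks, start=1):
        start, end = chunk["timestamp"]
        # The end time can be missing (None), e.g. on a final chunk;
        # fallback_duration is an arbitrary assumption for that case.
        if end is None:
            end = start + fallback_duration
        entries.append(
            f"{index}\n{to_srt_time(start)} --> {to_srt_time(end)}\n"
            f"{chunk['text'].strip()}\n"
        )
    return "\n".join(entries)


# Invented chunks in the same shape as result["chunks"].
sample_chunks = [
    {"timestamp": (0.0, 2.5), "text": " Merhaba dünya."},
    {"timestamp": (2.5, None), "text": " Nasılsınız?"},
]
print(chunks_to_srt(sample_chunks))
```

With a real transcription you would pass `result["chunks"]` instead of `sample_chunks` and write the returned string to a `.srt` file.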

Training

Due to Colab GPU constraints, I was able to train on only 25% of the Turkish data available in the Common Voice 17.0 dataset. 😔

Got a GPU to spare? Let's collaborate and take this model to the next level! 🚀

Training hyperparameters

The following hyperparameters were used during training:

learning_rate: 1e-05
train_batch_size: 16
eval_batch_size: 8
seed: 42
optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
lr_scheduler_type: linear
lr_scheduler_warmup_steps: 500
training_steps: 4000
mixed_precision_training: Native AMP
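
To make the schedule concrete, here is a small sketch of what the learning rate would look like over training, assuming the usual meaning of a linear schedule with warmup: a linear ramp from 0 to `learning_rate` over the warmup steps, then a linear decay to 0 at the final step. The function `linear_schedule_lr` is a hypothetical stand-in written for illustration, not the code used in training.

```python
def linear_schedule_lr(
    step: int,
    base_lr: float = 1e-5,      # learning_rate from the list above
    warmup_steps: int = 500,    # lr_scheduler_warmup_steps
    total_steps: int = 4000,    # training_steps
) -> float:
    """Learning rate at a given optimizer step under linear warmup and decay."""
    if step < warmup_steps:
        # Warmup: ramp linearly from 0 up to base_lr.
        return base_lr * step / warmup_steps
    # Decay: fall linearly from base_lr at the end of warmup to 0 at total_steps.
    remaining = max(0, total_steps - step)
    return base_lr * remaining / (total_steps - warmup_steps)


# Learning rate at the evaluation steps reported in the results table.
for step in (0, 250, 500, 1000, 2000, 3000, 4000):
    print(step, linear_schedule_lr(step))
```

Under these assumptions the peak rate of 1e-05 is reached at step 500, and the rate is back to 0 by step 4000.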

Training results

Training Loss	Epoch	Step	Validation Loss	WER
0.1223	1.6	1000	0.3187	24.4415
0.0501	3.2	2000	0.3123	20.9720
0.0226	4.8	3000	0.3010	19.6183
0.001	6.4	4000	0.3123	18.9229
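
A couple of numbers can be read off this table with simple arithmetic. The sketch below estimates the steps per epoch and the relative WER improvement between the first and last checkpoints. The examples-per-epoch figure is only an estimate: it assumes a single device and no gradient accumulation (neither is stated in the card), and the epoch values in the table are rounded.

```python
# (epoch, step, wer) taken from the table above.
rows = [
    (1.6, 1000, 24.4415),
    (3.2, 2000, 20.9720),
    (4.8, 3000, 19.6183),
    (6.4, 4000, 18.9229),
]
train_batch_size = 16  # from the hyperparameter list

# Steps per epoch implied by the first row (epochs are rounded in the table).
first_epoch, first_step, first_wer = rows[0]
steps_per_epoch = round(first_step / first_epoch)

# Assumption: one device, no gradient accumulation.
approx_examples_per_epoch = steps_per_epoch * train_batch_size

# Relative WER reduction from the first to the last checkpoint.
last_wer = rows[-1][2]
relative_wer_reduction = (first_wer - last_wer) / first_wer

print(steps_per_epoch, approx_examples_per_epoch)
print(f"{relative_wer_reduction:.1%}")
```

Under those assumptions an epoch covers on the order of 10,000 training examples, and WER drops by roughly 22.6% relative between step 1000 and step 4000.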

Framework versions

Transformers 4.45.2
PyTorch 2.4.1+cu121
Datasets 3.0.1
Tokenizers 0.20.1
