Files
whisper-large-v3-turbo-turkish/README.md
ModelHub XC b00d6d2293 初始化项目,由ModelHub XC社区提供模型
Model: selimc/whisper-large-v3-turbo-turkish
Source: Original Platform
2026-05-14 02:27:35 +08:00

3.1 KiB

library_name, language, license, base_model, tags, datasets, metrics, model-index
library_name language license base_model tags datasets metrics model-index
transformers
tr
mit openai/whisper-large-v3-turbo
generated_from_trainer
mozilla-foundation/common_voice_17_0
wer
name results
Whisper Large v3 Turbo TR - Selim Çavaş
task dataset metrics
name type
Automatic Speech Recognition automatic-speech-recognition
name type config split args
Common Voice 17.0 mozilla-foundation/common_voice_17_0 tr test config: tr, split: test
name type value
Wer wer 18.92291759135967

Whisper Large v3 Turbo TR - Selim Çavaş

This model is a fine-tuned version of openai/whisper-large-v3-turbo on the Common Voice 17.0 dataset. It achieves the following results on the evaluation set:

  • Loss: 0.3123
  • Wer: 18.9229

Intended uses & limitations

This model can be used in various application areas, including

  • Transcription of Turkish language
  • Voice commands
  • Automatic subtitling for Turkish videos

How To Use

import torch
from transformers import AutoModelForSpeechSeq2Seq, AutoProcessor, pipeline

device = "cuda:0" if torch.cuda.is_available() else "cpu"
torch_dtype = torch.float16 if torch.cuda.is_available() else torch.float32

model_id = "selimc/whisper-large-v3-turbo-turkish"

model = AutoModelForSpeechSeq2Seq.from_pretrained(
    model_id, torch_dtype=torch_dtype, low_cpu_mem_usage=True, use_safetensors=True
)
model.to(device)

processor = AutoProcessor.from_pretrained(model_id)

pipe = pipeline(
    "automatic-speech-recognition",
    model=model,
    tokenizer=processor.tokenizer,
    feature_extractor=processor.feature_extractor,
    chunk_length_s=30,
    batch_size=16,
    return_timestamps=True,
    torch_dtype=torch_dtype,
    device=device,
)

result = pipe("test.mp3")
print(result["text"])

Training

Due to colab GPU constraints I was able to train using only the 25% of the Turkish data available in the Common Voice 17.0 dataset. 😔

Got a GPU to spare? Let's collaborate and take this model to the next level! 🚀

Training hyperparameters

The following hyperparameters were used during training:

  • learning_rate: 1e-05
  • train_batch_size: 16
  • eval_batch_size: 8
  • seed: 42
  • optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
  • lr_scheduler_type: linear
  • lr_scheduler_warmup_steps: 500
  • training_steps: 4000
  • mixed_precision_training: Native AMP

Training results

Training Loss Epoch Step Validation Loss Wer
0.1223 1.6 1000 0.3187 24.4415
0.0501 3.2 2000 0.3123 20.9720
0.0226 4.8 3000 0.3010 19.6183
0.001 6.4 4000 0.3123 18.9229

Framework versions

  • Transformers 4.45.2
  • Pytorch 2.4.1+cu121
  • Datasets 3.0.1
  • Tokenizers 0.20.1