whisper-large-v3-finetuned-for-ATC/README.md at 5cee1ad35c4fe9e2edef391787a9daf1479e0bff

jacktol/whisper-large-v3-finetuned-for-ATC


Model: jacktol/whisper-large-v3-finetuned-for-ATC
Source: Original Platform

2026-05-13 18:57:31 +08:00


library_name: transformers
license: mit
datasets: jacktol/ATC-ASR-Dataset
language:
metrics: wer
base_model: openai/whisper-large-v3
pipeline_tag: automatic-speech-recognition
model-index:
  name: Whisper Large v3 Fine-Tuned for Air Traffic Control (ATC)
  results:
    task:
      type: automatic-speech-recognition
    dataset:
      name: ATC ASR Dataset
      type: jacktol/ATC-ASR-Dataset
    metrics:
      name: Word Error Rate (WER)
      type: wer
      value: 6.5

Model Overview

This model is a fine-tuned version of OpenAI's Whisper Large v3, trained on Air Traffic Control (ATC) communication data. Fine-tuning improves transcription accuracy on domain-specific aviation communications, reaching a Word Error Rate (WER) of 6.5% on the test set. The model is particularly effective at handling the accent variations and ambiguous phrasing often encountered in ATC communications.

Base Model: OpenAI Whisper Large v3 (openai/whisper-large-v3)
Fine-tuned Model WER: 6.5%
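
For readers unfamiliar with the metric, the sketch below shows how a WER figure is typically computed: the word-level edit distance between reference and hypothesis, divided by the number of reference words. It is a minimal illustration, not the evaluation script used for this model; the two transcripts are made up, and any text normalization applied before scoring the reported 6.5% is not stated in this card.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Return WER = (substitutions + deletions + insertions) / reference word count."""
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # Word-level Levenshtein distance, keeping only the previous row in memory.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i] + [0] * len(hyp)
        for j, hyp_word in enumerate(hyp, start=1):
            cost = 0 if ref_word == hyp_word else 1
            curr[j] = min(
                prev[j] + 1,         # deletion: reference word missing from hypothesis
                curr[j - 1] + 1,     # insertion: extra word in hypothesis
                prev[j - 1] + cost,  # match or substitution
            )
        prev = curr
    return prev[-1] / len(ref)


# Hypothetical transcripts, invented for illustration only.
reference = "delta one two three descend flight level eight zero"
hypothesis = "delta one two three descend flight level eighty"
print(f"WER: {word_error_rate(reference, hypothesis):.1%}")  # WER: 22.2%
```

Here one substitution ("eight" to "eighty") and one deletion ("zero") over nine reference words give 2/9, or about 22.2%. A corpus-level WER is usually computed by summing errors and reference words over all utterances rather than averaging per-utterance rates.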

Model Description

This fine-tuned model is optimized to handle short, distinct transmissions between pilots and air traffic controllers. It was fine-tuned on the ATC ASR Dataset (jacktol/ATC-ASR-Dataset).

The fine-tuned model demonstrates enhanced performance in interpreting various accents, recognizing non-standard phraseology, and processing noisy or distorted communications. It is highly suitable for aviation-related transcription tasks.

Intended Use

The fine-tuned Whisper model is designed for:

Transcribing aviation communication: Providing accurate transcriptions of ATC communications, including accented speech and variations in English phrasing.
Air Traffic Control Systems: Assisting in real-time transcription of pilot-ATC conversations, helping improve situational awareness.
Research and training: Useful for researchers, developers, or aviation professionals studying ATC communication or developing new tools for aviation safety.
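
For the transcription use cases above, a minimal sketch of loading the checkpoint through the Hugging Face `transformers` pipeline might look like the following. It assumes `transformers` and `torch` are installed and that `ffmpeg` is available for decoding audio files; the file name `atc_clip.wav` is hypothetical, and the generation settings are a reasonable default rather than the author's documented configuration.

```python
MODEL_ID = "jacktol/whisper-large-v3-finetuned-for-ATC"


def transcribe(audio_path: str) -> str:
    """Transcribe a single audio file with the fine-tuned checkpoint."""
    # Imported lazily so this module can be imported without the heavy dependencies.
    import torch
    from transformers import pipeline

    asr = pipeline(
        "automatic-speech-recognition",
        model=MODEL_ID,
        device=0 if torch.cuda.is_available() else -1,  # first GPU if present, else CPU
    )
    # Whisper is multilingual; these settings pin decoding to English transcription.
    result = asr(
        audio_path,
        generate_kwargs={"language": "english", "task": "transcribe"},
    )
    return result["text"].strip()


# Example call (hypothetical file name; downloads the model on first use):
# print(transcribe("atc_clip.wav"))
```

Building the pipeline once and reusing it across many clips avoids reloading the model for each call, which matters for a checkpoint of this size.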

Training Procedure

Hardware: Fine-tuning was conducted on two H100 SXM5 GPUs with 80GB VRAM.
Epochs: 3.25
Learning Rate: 1e-5
Batch Size: 10 with no gradient accumulation
Augmentation: Offline data augmentation techniques (Gaussian noise, pitch shifting, etc.) were applied to the training set.
Evaluation Metric: Word Error Rate (WER)
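
To illustrate one of the augmentations named above, the sketch below adds white Gaussian noise to a waveform at a chosen signal-to-noise ratio. This is an assumption-laden example, not the author's augmentation code: the 15 dB target, the synthetic 440 Hz tone standing in for an ATC clip, and the function name `add_gaussian_noise` are all hypothetical. It uses only the standard library to stay self-contained; real pipelines usually operate on arrays.

```python
import math
import random


def add_gaussian_noise(samples, snr_db, rng):
    """Return a copy of `samples` with white Gaussian noise at the given SNR (dB)."""
    if not samples:
        raise ValueError("samples must not be empty")
    signal_power = sum(s * s for s in samples) / len(samples)
    # SNR_dB = 10 * log10(signal_power / noise_power), solved for noise_power.
    noise_power = signal_power / (10 ** (snr_db / 10))
    sigma = math.sqrt(noise_power)
    return [s + rng.gauss(0.0, sigma) for s in samples]


# One second of a 440 Hz tone at 16 kHz (Whisper's input rate) stands in for a real clip.
sample_rate = 16_000
clean = [0.5 * math.sin(2 * math.pi * 440 * n / sample_rate) for n in range(sample_rate)]
noisy = add_gaussian_noise(clean, snr_db=15.0, rng=random.Random(0))

# Sanity check: recover the SNR from the noise that was actually added.
noise = [n - c for n, c in zip(noisy, clean)]
measured_snr = 10 * math.log10(sum(c * c for c in clean) / sum(e * e for e in noise))
print(f"target SNR: 15.0 dB, measured: {measured_snr:.1f} dB")
```

"Offline" augmentation means perturbed copies like `noisy` are generated and stored before training starts, rather than being sampled on the fly in the data loader.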

Limitations

While the fine-tuned model performs well on ATC-specific communications, it may not generalize as effectively to other domains of speech. As with most speech-to-text models, transcription accuracy can also degrade on extremely poor-quality audio or on heavily accented speech that was absent from, or underrepresented in, the training data.

References

ATC ASR Dataset
