---
library_name: transformers
license: mit
datasets:
- jacktol/ATC-ASR-Dataset
language:
- en
metrics:
- wer
base_model:
- openai/whisper-large-v3
pipeline_tag: automatic-speech-recognition
model-index:
  - name: Whisper Large v3 Fine-Tuned for Air Traffic Control (ATC)
    results:
      - task:
          type: automatic-speech-recognition
        dataset:
          name: ATC ASR Dataset
          type: jacktol/ATC-ASR-Dataset
        metrics:
          - name: Word Error Rate (WER)
            type: wer
            value: 6.5

---

## Model Overview

This model is a fine-tuned version of OpenAI's Whisper Large v3, trained on **Air Traffic Control (ATC)** communication data. Fine-tuning improves transcription accuracy on domain-specific aviation communications, and the model reaches a Word Error Rate (WER) of 6.5% on the test set. It is particularly effective at handling the accent variations and ambiguous phrasing often encountered in ATC communications.

- **Base Model**: OpenAI Whisper Large v3 (`openai/whisper-large-v3`)
- **Fine-tuned Model WER**: 6.5%
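
For readers unfamiliar with the metric, WER is the word-level edit distance (substitutions, deletions, and insertions) between a reference transcript and the model output, divided by the number of reference words. The sketch below is a minimal illustration, not the evaluation script used for this model; the lowercasing step and the sample transmission are assumptions made for the example.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word-level Levenshtein distance divided by the reference length."""
    # Assumption: case-insensitive comparison on whitespace-separated words.
    ref = reference.lower().split()
    hyp = hypothesis.lower().split()
    if not ref:
        raise ValueError("reference transcript must contain at least one word")

    # prev[j] = edit distance between the reference prefix seen so far
    # and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i]
        for j, hyp_word in enumerate(hyp, start=1):
            substitution = prev[j - 1] + (0 if ref_word == hyp_word else 1)
            deletion = prev[j] + 1       # reference word missing from hypothesis
            insertion = curr[j - 1] + 1  # extra word in hypothesis
            curr.append(min(substitution, deletion, insertion))
        prev = curr
    return prev[-1] / len(ref)


# Hypothetical transmission: one wrong word out of ten reference words.
reference = "delta four five six descend flight level one two zero"
hypothesis = "delta four five six descend flight level one too zero"
wer = word_error_rate(reference, hypothesis)
```

A WER of 6.5% therefore corresponds to roughly 6 or 7 word-level errors per 100 reference words.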

## Model Description

This fine-tuned model is optimized to handle short, distinct transmissions between pilots and air traffic controllers. It is fine-tuned using data from:
- **[ATC ASR Dataset](https://huggingface.co/datasets/jacktol/ATC-ASR-Dataset)**

Compared with the base model, the fine-tuned model is better at interpreting varied accents, recognizing non-standard phraseology, and processing noisy or distorted transmissions. It is well suited to aviation-related transcription tasks.

## Intended Use

The fine-tuned Whisper model is designed for:
- **Transcribing aviation communication**: Providing accurate transcriptions for ATC communications, including accents and variations in English phrasing.
- **Air Traffic Control Systems**: Assisting in real-time transcription of pilot-ATC conversations, helping improve situational awareness.
- **Research and training**: Useful for researchers, developers, or aviation professionals studying ATC communication or developing new tools for aviation safety.
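
A minimal transcription sketch using the `transformers` ASR pipeline is shown below. It assumes the model is published on the Hub as `jacktol/whisper-large-v3-finetuned-for-ATC`; the helper names and the `atc_sample.wav` path are hypothetical, and decoding an audio file by path requires `ffmpeg` to be installed.

```python
MODEL_ID = "jacktol/whisper-large-v3-finetuned-for-ATC"  # assumed Hub ID


def build_transcriber(model_id: str = MODEL_ID, device: str = "cpu"):
    """Create an ASR pipeline (hypothetical helper, not part of this repo)."""
    # Imported lazily so this module loads even without transformers installed.
    from transformers import pipeline

    return pipeline(
        "automatic-speech-recognition",
        model=model_id,
        device=device,  # e.g. "cuda:0" when a GPU is available
    )


def transcribe(audio_path: str, transcriber=None) -> str:
    """Return the transcript for one audio file."""
    asr = transcriber if transcriber is not None else build_transcriber()
    result = asr(audio_path)  # the ASR pipeline returns a dict with a "text" key
    return result["text"].strip()


# Example call (downloads the model on first use):
#   print(transcribe("atc_sample.wav"))
```

Building the pipeline once and passing it to `transcribe` avoids reloading the model for every transmission.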

## Training Procedure

- **Hardware**: Fine-tuning was conducted on two NVIDIA H100 SXM5 GPUs with 80 GB of VRAM each.
- **Epochs**: 3.25
- **Learning Rate**: 1e-5
- **Batch Size**: 10 with no gradient accumulation
- **Augmentation**: Offline data augmentation (Gaussian noise, pitch shifting, etc.) was applied to the training set.
- **Evaluation Metric**: Word Error Rate (WER)
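
As an illustration of the Gaussian-noise augmentation listed above, the following sketch adds white noise to a waveform at a chosen signal-to-noise ratio. It is not the augmentation code used for this model: the function name, the `snr_db` parameter, and the 20 dB value are assumptions, since the card does not state the noise levels that were used.

```python
import numpy as np


def add_gaussian_noise(waveform: np.ndarray, snr_db: float, rng=None) -> np.ndarray:
    """Return a copy of a float waveform with white Gaussian noise at `snr_db`."""
    if rng is None:
        rng = np.random.default_rng()
    signal_power = float(np.mean(waveform.astype(np.float64) ** 2))
    if signal_power == 0.0:
        return waveform.copy()  # silent clip: no meaningful SNR to target
    # SNR(dB) = 10 * log10(signal_power / noise_power)
    noise_power = signal_power / (10.0 ** (snr_db / 10.0))
    noise = rng.normal(0.0, np.sqrt(noise_power), size=waveform.shape)
    return (waveform + noise).astype(waveform.dtype)


# Hypothetical input: one second of a 440 Hz tone sampled at 16 kHz.
sample_rate = 16_000
t = np.arange(sample_rate) / sample_rate
clean = np.sin(2 * np.pi * 440 * t).astype(np.float32)
noisy = add_gaussian_noise(clean, snr_db=20.0, rng=np.random.default_rng(0))
```

Because the augmentation is offline, a function like this would be run once over the training audio and the noisy copies stored alongside the originals before fine-tuning.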

## Limitations

While the fine-tuned model performs well on ATC-specific communications, it may not generalize as effectively to other speech domains. Additionally, like most speech-to-text models, it can lose accuracy on extremely poor-quality audio or on heavily accented speech that was absent or under-represented in the training data.

## References
- [**ATC ASR Dataset**](https://huggingface.co/datasets/jacktol/ATC-ASR-Dataset)

Initial project commit; model provided by the ModelHub XC community. Model: jacktol/whisper-large-v3-finetuned-for-ATC. Source: Original Platform. 2026-05-13 18:57:31 +08:00