---
library_name: transformers
license: mit
datasets:
- jacktol/ATC-ASR-Dataset
language:
- en
metrics:
- wer
base_model:
- openai/whisper-large-v3
pipeline_tag: automatic-speech-recognition
model-index:
- name: Whisper Large v3 Fine-Tuned for Air Traffic Control (ATC)
  results:
  - task:
      type: automatic-speech-recognition
    dataset:
      name: ATC ASR Dataset
      type: jacktol/ATC-ASR-Dataset
    metrics:
    - name: Word Error Rate (WER)
      type: wer
      value: 6.5
---

## Model Overview

This model is a fine-tuned version of OpenAI's Whisper Large v3, trained specifically on **Air Traffic Control (ATC)** communication data. Fine-tuning improves transcription accuracy on domain-specific aviation communications, reaching a Word Error Rate (WER) of 6.5% on the test set. The model is particularly effective at handling the accent variations and ambiguous phrasing often encountered in ATC communications.

- **Base Model**: OpenAI Whisper Large v3
- **Fine-tuned Model WER**: 6.5%

## Model Description

This fine-tuned model is optimized for the short, distinct transmissions exchanged between pilots and air traffic controllers. It was fine-tuned on data from:

- **[ATC ASR Dataset](https://huggingface.co/datasets/jacktol/ATC-ASR-Dataset)**

The fine-tuned model shows enhanced performance in interpreting various accents, recognizing non-standard phraseology, and processing noisy or distorted communications. It is well suited to aviation-related transcription tasks.

## Intended Use

The fine-tuned Whisper model is designed for:

- **Transcribing aviation communication**: Providing accurate transcriptions of ATC communications, including accented speech and variations in English phrasing.
- **Air Traffic Control Systems**: Assisting in real-time transcription of pilot-ATC conversations to help improve situational awareness.
- **Research and training**: Supporting researchers, developers, and aviation professionals who study ATC communication or develop new tools for aviation safety.
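
For the transcription use cases above, the following is a minimal sketch of how a Whisper checkpoint like this one could be loaded through the `transformers` automatic-speech-recognition pipeline. The repository id in `MODEL_ID` is a hypothetical placeholder (this card does not state the model's Hub id), and the helper names `build_transcriber` and `transcribe` are illustrative, not part of any library.

```python
from typing import Any, Callable, Dict, Optional

# Hypothetical placeholder: replace with the actual Hub id of this model.
MODEL_ID = "<namespace>/<model-name>"


def build_transcriber(model_id: str = MODEL_ID, device: Optional[str] = None):
    """Create an ASR pipeline for the given checkpoint.

    transformers is imported lazily so this module can be imported
    even where the library is not installed.
    """
    from transformers import pipeline

    return pipeline(
        "automatic-speech-recognition",
        model=model_id,
        device=device,  # e.g. "cuda:0"; None lets the pipeline choose
    )


def transcribe(audio_path: str, transcriber: Callable[[str], Dict[str, Any]]) -> str:
    """Run the transcriber on one audio file and return the cleaned text."""
    result = transcriber(audio_path)  # the ASR pipeline returns {"text": ...}
    return result["text"].strip()
```

A typical call would be `transcribe("transmission.wav", build_transcriber())`, where `transmission.wav` is an assumed local file. Passing a file path to the pipeline relies on `ffmpeg` being available for audio decoding.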

## Training Procedure

- **Hardware**: Fine-tuning was conducted on two H100 SXM5 GPUs (80 GB VRAM each).
- **Epochs**: 3.25
- **Learning Rate**: 1e-5
- **Batch Size**: 10, with no gradient accumulation
- **Augmentation**: Offline data augmentation (Gaussian noise, pitch shifting, etc.) was applied to the training set.
- **Evaluation Metric**: Word Error Rate (WER)

## Limitations

While the fine-tuned model performs well on ATC-specific communications, it may not generalize as effectively to other speech domains. Like most speech-to-text models, its transcription accuracy can also degrade on extremely poor-quality audio or on heavily accented speech that was absent or under-represented in the training data.

## References

- [**ATC ASR Dataset**](https://huggingface.co/datasets/jacktol/ATC-ASR-Dataset)
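
## Example: Computing WER

WER, the evaluation metric reported in this card, is the word-level edit distance (substitutions, deletions, insertions) between a reference transcript and the model output, divided by the number of reference words. The sketch below computes it for a single utterance in plain Python. The lowercasing and whitespace tokenization are assumptions, since this card does not describe the text normalization used for the reported score, and the sample sentences are invented for illustration.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word-level Levenshtein distance divided by the reference length."""
    ref = reference.lower().split()  # assumed normalization: lowercase + whitespace split
    hyp = hypothesis.lower().split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # prev[j] holds the edit distance between the reference prefix
    # processed so far and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i] + [0] * len(hyp)
        for j, hyp_word in enumerate(hyp, start=1):
            substitution = prev[j - 1] + (0 if ref_word == hyp_word else 1)
            deletion = prev[j] + 1       # reference word missing from hypothesis
            insertion = curr[j - 1] + 1  # extra word in hypothesis
            curr[j] = min(substitution, deletion, insertion)
        prev = curr
    return prev[-1] / len(ref)


# Invented example: one substituted word in a ten-word reference.
ref = "delta four five six descend flight level two four zero"
hyp = "delta four five six descend flight level two five zero"
print(word_error_rate(ref, hyp))  # 0.1
```

A test-set WER is usually obtained by summing edit counts and reference word counts over all utterances before dividing, rather than averaging per-utterance values as computed here.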