Initialize project; model provided by the ModelHub XC community

Model: jacktol/whisper-large-v3-finetuned-for-ATC Source: Original Platform
2026-05-13 18:57:31 +08:00
commit 5cee1ad35c
15 changed files with 333044 additions and 0 deletions
--- a/README.md
+++ b/README.md
@@ -0,0 +1,63 @@
+---
+library_name: transformers
+license: mit
+datasets:
+- jacktol/ATC-ASR-Dataset
+language:
+- en
+metrics:
+- wer
+base_model:
+- openai/whisper-large-v3
+pipeline_tag: automatic-speech-recognition
+model-index:
+  - name: Whisper Large v3 Fine-Tuned for Air Traffic Control (ATC)
+    results:
+      - task:
+          type: automatic-speech-recognition
+        dataset:
+          name: ATC ASR Dataset
+          type: jacktol/ATC-ASR-Dataset
+        metrics:
+          - name: Word Error Rate (WER)
+            type: wer
+            value: 6.5
+
+---
+
+## Model Overview
+
+This model is OpenAI's Whisper Large v3 fine-tuned on **Air Traffic Control (ATC)** communications. Fine-tuning improves transcription accuracy on domain-specific aviation speech, reaching a Word Error Rate (WER) of 6.5% on the test set. The model is particularly effective at handling the accent variations and ambiguous phrasing often encountered in ATC communications.
+
+- **Base Model**: OpenAI Whisper Large v3
+- **Fine-tuned Model WER**: 6.5%
+
+## Model Description
+
+This fine-tuned model is optimized to handle short, distinct transmissions between pilots and air traffic controllers. It is fine-tuned using data from:
+- **[ATC ASR Dataset](https://huggingface.co/datasets/jacktol/ATC-ASR-Dataset)**
+
+The fine-tuned model demonstrates enhanced performance in interpreting various accents, recognizing non-standard phraseology, and processing noisy or distorted communications. It is highly suitable for aviation-related transcription tasks.
+
+## Intended Use
+
+The fine-tuned Whisper model is designed for:
+- **Transcribing aviation communication**: Providing accurate transcriptions of ATC communications, including accented speech and variations in English phrasing.
+- **Air Traffic Control Systems**: Assisting in real-time transcription of pilot-ATC conversations, helping improve situational awareness.
+- **Research and training**: Useful for researchers, developers, or aviation professionals studying ATC communication or developing new tools for aviation safety.
+
+## Training Procedure
+
+- **Hardware**: Fine-tuning was conducted on two H100 SXM5 GPUs (80 GB VRAM each).
+- **Epochs**: 3.25
+- **Learning Rate**: 1e-5
+- **Batch Size**: 10 with no gradient accumulation
+- **Augmentation**: Offline data augmentation (Gaussian noise, pitch shifting, etc.) was applied to the training set.
+- **Evaluation Metric**: Word Error Rate (WER)
+
+## Limitations
+
+While the fine-tuned model performs well on ATC-specific communications, it may not generalize as effectively to other speech domains. Additionally, as with most speech-to-text models, transcription accuracy can degrade on extremely poor-quality audio or on heavily accented speech that was absent or underrepresented in the training data.
+
+## References
+- [**ATC ASR Dataset**](https://huggingface.co/datasets/jacktol/ATC-ASR-Dataset)
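
Note on the reported metric: the README gives a WER of 6.5% but does not say how transcripts were normalized before scoring or how the score was aggregated. The sketch below shows one common way to compute WER, as word-level edit distance divided by the number of reference words. It is a minimal illustration, not the author's evaluation script: the lowercasing and whitespace tokenization are assumptions, and `word_error_rate` and `transcribe` are hypothetical helper names. `transcribe` assumes the `transformers` library (with a backend such as PyTorch, plus ffmpeg for decoding audio files) is installed, and it uses the model ID from the commit header.

```python
"""Sketch: transcribe ATC audio and score it with word error rate (WER)."""

# Model ID as given in the commit header.
MODEL_ID = "jacktol/whisper-large-v3-finetuned-for-ATC"


def word_error_rate(reference: str, hypothesis: str) -> float:
    """Return (substitutions + deletions + insertions) / reference word count.

    Assumption: text is lowercased and split on whitespace. The model card
    does not state which normalization was used for the reported 6.5%.
    """
    ref = reference.lower().split()
    hyp = hypothesis.lower().split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # prev[j] holds the edit distance between the reference words seen so
    # far (before the current one) and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i]  # matching i reference words to nothing costs i deletions
        for j, hyp_word in enumerate(hyp, start=1):
            cost = 0 if ref_word == hyp_word else 1
            curr.append(
                min(
                    prev[j] + 1,         # deletion (reference word dropped)
                    curr[j - 1] + 1,     # insertion (extra hypothesis word)
                    prev[j - 1] + cost,  # match or substitution
                )
            )
        prev = curr
    return prev[-1] / len(ref)


def transcribe(audio_path: str) -> str:
    """Hypothetical helper: transcribe one audio file with the fine-tuned model.

    transformers is imported lazily so the WER helper above stays usable
    without it. The first call downloads the model weights.
    """
    from transformers import pipeline

    asr = pipeline("automatic-speech-recognition", model=MODEL_ID)
    return asr(audio_path)["text"]
```

For example, `word_error_rate("climb flight level three four zero", "climb flight level three five zero")` counts one substitution over six reference words. A corpus-level figure would typically sum errors and reference words over all utterances before dividing, rather than averaging per-utterance rates.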