Initialize project; model provided by the ModelHub XC community
Model: jacktol/whisper-large-v3-finetuned-for-ATC Source: Original Platform
63
README.md
Normal file
@@ -0,0 +1,63 @@
---
library_name: transformers
license: mit
datasets:
- jacktol/ATC-ASR-Dataset
language:
- en
metrics:
- wer
base_model:
- openai/whisper-large-v3
pipeline_tag: automatic-speech-recognition
model-index:
- name: Whisper Large v3 Fine-Tuned for Air Traffic Control (ATC)
  results:
  - task:
      type: automatic-speech-recognition
    dataset:
      name: ATC ASR Dataset
      type: jacktol/ATC-ASR-Dataset
    metrics:
    - name: Word Error Rate (WER)
      type: wer
      value: 6.5
---
## Model Overview

This model is a fine-tuned version of OpenAI's Whisper Large v3, trained specifically on **Air Traffic Control (ATC)** communications. Fine-tuning improves transcription accuracy on domain-specific aviation speech: the model reaches a Word Error Rate (WER) of 6.5% on the test set. It is particularly effective at handling the accent variations and ambiguous phrasing often encountered in ATC communications.

- **Base Model**: OpenAI Whisper Large v3 (`openai/whisper-large-v3`)
- **Fine-tuned Model WER**: 6.5%
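
For reference, WER is conventionally the word-level edit distance between the reference transcript and the model output (substitutions + deletions + insertions), divided by the number of reference words. The sketch below computes it in plain Python. Lowercasing and whitespace tokenization are assumptions made for illustration; the card does not describe the text normalization behind the reported figure, and the function name `word_error_rate` is hypothetical.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word-level Levenshtein distance divided by the reference length."""
    # Assumed normalization: lowercase and split on whitespace.
    ref = reference.lower().split()
    hyp = hypothesis.lower().split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # prev[j] holds the edit distance between the reference prefix
    # processed so far and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i]
        for j, hyp_word in enumerate(hyp, start=1):
            cost = 0 if ref_word == hyp_word else 1
            curr.append(min(
                prev[j] + 1,         # deletion (reference word missing)
                curr[j - 1] + 1,     # insertion (extra hypothesis word)
                prev[j - 1] + cost,  # substitution or exact match
            ))
        prev = curr
    return prev[-1] / len(ref)
```

The function returns a fraction, so a result of 0.065 corresponds to a WER of 6.5%.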
## Model Description

The model is optimized for the short, distinct transmissions exchanged between pilots and air traffic controllers. It was fine-tuned on:

- **[ATC ASR Dataset](https://huggingface.co/datasets/jacktol/ATC-ASR-Dataset)**

Fine-tuning improves the model's ability to interpret varied accents, recognize non-standard phraseology, and process noisy or distorted transmissions, making it well suited to aviation-related transcription tasks.
## Intended Use

The fine-tuned Whisper model is designed for:

- **Transcribing aviation communication**: Providing accurate transcriptions of ATC communications, including accented speech and variations in English phrasing.
- **Air Traffic Control systems**: Assisting with real-time transcription of pilot-ATC conversations to help improve situational awareness.
- **Research and training**: Supporting researchers, developers, and aviation professionals who study ATC communication or develop new tools for aviation safety.
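
A minimal transcription sketch, assuming the checkpoint is published on the Hugging Face Hub under the ID shown in the commit header and that the `transformers` library (named in the card metadata) is installed. The helper name `transcribe` is hypothetical, and this is one plausible way to load the model rather than a documented recipe from the author.

```python
# Model ID taken from the commit header; assumed to resolve on the Hub.
MODEL_ID = "jacktol/whisper-large-v3-finetuned-for-ATC"


def transcribe(audio_path: str) -> str:
    """Transcribe a single audio file with the fine-tuned checkpoint."""
    # Imported lazily so this module loads without transformers installed.
    from transformers import pipeline

    # The pipeline downloads the checkpoint on first use. Decoding audio
    # from a file path relies on ffmpeg being available on the system.
    asr = pipeline("automatic-speech-recognition", model=MODEL_ID)
    return asr(audio_path)["text"]
```

Calling `transcribe("atc_clip.wav")` on a local recording (the file name is a placeholder) should return the transcript as a string. For repeated calls it would be cheaper to build the pipeline once and reuse it.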
## Training Procedure

- **Hardware**: Fine-tuning was conducted on two 80 GB H100 SXM5 GPUs.
- **Epochs**: 3.25
- **Learning Rate**: 1e-5
- **Batch Size**: 10, with no gradient accumulation
- **Augmentation**: Offline data augmentation (Gaussian noise, pitch shifting, etc.) was applied to the training set.
- **Evaluation Metric**: Word Error Rate (WER)
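
To illustrate the Gaussian-noise augmentation mentioned above, here is a small standard-library sketch that mixes white noise into a waveform at a chosen signal-to-noise ratio. The function name, the `snr_db` parameter, and the SNR-based formulation are assumptions for illustration; the card does not specify the noise levels or the tooling used, and real pipelines would typically operate on arrays rather than Python lists.

```python
import math
import random


def add_gaussian_noise(samples, snr_db, rng=None):
    """Return a noisy copy of a mono waveform (sequence of floats).

    `snr_db` is a hypothetical target signal-to-noise ratio in decibels,
    not a value taken from the model card.
    """
    if rng is None:
        rng = random.Random()
    if not samples:
        return []
    # Mean power of the clean signal.
    signal_power = sum(s * s for s in samples) / len(samples)
    # Noise power that yields the requested SNR: SNR = 10 * log10(Ps / Pn).
    noise_power = signal_power / (10 ** (snr_db / 10))
    noise_std = math.sqrt(noise_power)
    # Add zero-mean white Gaussian noise to every sample.
    return [s + rng.gauss(0.0, noise_std) for s in samples]
```

Because the augmentation is described as offline, a step like this would be run once to write augmented copies of the training audio before fine-tuning, rather than on the fly during training.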
## Limitations

Although the fine-tuned model performs well on ATC communications, it may not generalize as effectively to other speech domains. As with most speech-to-text models, transcription accuracy can also degrade on extremely poor-quality audio or on heavily accented speech that was absent or under-represented in the training data.
## References
- [**ATC ASR Dataset**](https://huggingface.co/datasets/jacktol/ATC-ASR-Dataset)