---
library_name: transformers
license: cc-by-nc-4.0
language:
- en
- zh
metrics:
- cer
pipeline_tag: automatic-speech-recognition
base_model:
- openai/whisper-large-v3-turbo
---

[ [繁體中文 README.md](https://huggingface.co/NUTN-KWS/Whisper-Taiwanese-model-v0.5) ]

# 👳 Whisper-Taiwanese model V0.5 (Tv0.5)

This model is a fine-tuned version of OpenAI’s [openai/whisper-large-v3-turbo](https://huggingface.co/openai/whisper-large-v3-turbo) for Taiwanese speech recognition. It was developed by the National University of Tainan (NUTN), Taiwan, as part of an industry-academia collaboration project funded by the National Science and Technology Council (NSTC). The project, the Taiwanese-English Co-Learning Pilot Project, ran from September 2024 to June 2025 in collaboration with JEN-PIN ENTERPRISE CO., LTD. The model was trained on JEN-PIN educational materials generated through Student–Machine Co-Learning during the Fall 2024 semester. In addition, NUTN is collaborating with the National Center for High-performance Computing (NCHC) of the National Applied Research Laboratories (NARLabs), Taiwan. The collaboration provides computational and storage resources and aims to co-develop an AI learning model for elementary and high school students.

Demo: [https://kws.oaselab.org/taigitong/](https://kws.oaselab.org/taigitong/)

## 📝 Model Details
- **Base Model**: `openai/whisper-large-v3-turbo`
- **Fine-tuned for**: Taiwanese Hokkien Automatic Speech Recognition (ASR)
- **Fine-tuning Framework**: Hugging Face Transformers
- **Training Duration**: Approximately 180 hours on two V100 GPUs
- **Dataset**: Custom dataset, including the Dictionary of Frequently-Used Taiwanese Taigi released by the Ministry of Education, Taiwan, totaling approximately 90 hours of audio data
- **Input Format**: 16 kHz mono WAV
- **License**: CC BY-NC 4.0
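
The input format listed above (16 kHz mono WAV) can be checked before transcription. The following is a minimal sketch using only Python's standard `wave` module; the helper name `check_wav_format` and its defaults are hypothetical and not part of this model's tooling, and the check only covers uncompressed PCM WAV files.

```python
import wave


def check_wav_format(path, expected_rate=16000, expected_channels=1):
    """Return (ok, sample_rate, channels) for a PCM WAV file.

    Hypothetical helper: `ok` is True only when the file matches the
    16 kHz mono format listed in the model details.
    """
    with wave.open(path, "rb") as wav_file:
        rate = wav_file.getframerate()
        channels = wav_file.getnchannels()
    ok = rate == expected_rate and channels == expected_channels
    return ok, rate, channels


# Example call (assumes a file named audio.wav exists):
# ok, rate, channels = check_wav_format("audio.wav")
```

A file that fails the check would need to be resampled or downmixed with an audio tool of your choice before being saved as 16 kHz mono WAV.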

## 🚀 Usage
### Installing Packages:
```bash
pip install torch torchvision torchaudio transformers
```
### Example:
```python
from transformers import pipeline

# Load the fine-tuned checkpoint from a local directory; device=0 selects the first CUDA GPU.
pipe = pipeline("automatic-speech-recognition", model="./model/whisper-taiwanese", device=0)
# Transcribe the audio file, passing Whisper's "zh" language code and the transcribe task.
result = pipe("audio.wav", generate_kwargs={"language": "zh", "task": "transcribe"})
print(result["text"])
```
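
The metadata for this model lists CER (character error rate) as its metric. As a minimal sketch, and not the authors' evaluation script, CER can be computed as the character-level edit distance between a reference transcript and the model output, divided by the reference length. The function names and the choice to ignore spaces are assumptions made for illustration.

```python
def edit_distance(ref, hyp):
    """Levenshtein distance between two character sequences."""
    previous = list(range(len(hyp) + 1))
    for i, ref_char in enumerate(ref, start=1):
        current = [i]
        for j, hyp_char in enumerate(hyp, start=1):
            substitution = previous[j - 1] + (ref_char != hyp_char)
            deletion = previous[j] + 1
            insertion = current[j - 1] + 1
            current.append(min(substitution, deletion, insertion))
        previous = current
    return previous[-1]


def cer(reference, hypothesis):
    """Character error rate: edit distance divided by reference length.

    Assumption: spaces are removed before comparison.
    """
    ref = reference.replace(" ", "")
    hyp = hypothesis.replace(" ", "")
    if not ref:
        raise ValueError("reference must contain at least one character")
    return edit_distance(ref, hyp) / len(ref)


# Example call with the pipeline output from the example above:
# score = cer("reference transcript", result["text"])
```

A lower value is better; a value of 0.0 means the output matches the reference exactly once spaces are ignored.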

## 👨‍🎓 Citation

### BibTeX:
```bibtex
@misc{taiwanesewhisperasr2025,
  title={Taiwanese Whisper ASR},
  author={KWS Center, National University of Tainan, Taiwan},
  year={2025},
  url={https://huggingface.co/NUTN-KWS/Whisper-Taiwanese-model-v0.5}
}
```

### APA:
- C. S. Lee, M. H. Wang, C. C. Yue, G. Y. Teseng, and Y. Nojima, "Fuzzy Estimation Agent with Knowledge Graph and Quantum Fuzzy Inference Engine for Taiwanese-English Co-Learning," 2025 IFSA World Congress and NAFIPS Annual Meeting (IFSA/NAFIPS 2025), Banff, Alberta, Canada, Aug. 16-19, 2025.
- C. S. Lee, M. H. Wang, C. Y. Chen, S. C. Yang, M. Reformat, N. Kubota, and A. Pourabdollah, "Integrating quantum CI and generative AI for Taiwanese/English co-learning," Quantum Machine Intelligence, vol. 6, 64, pp. 1-19, 2024.
- C. S. Lee, M. H. Wang, C. Y. Chen, S. C. Yang, M. Reformat, N. Kubota, and A. Pourabdollah, "Quantum fuzzy inference engine with generative AI and TAIDE KG for Taiwanese/English co-learning," 2025 IEEE International Conference on Fuzzy Systems (FUZZ-IEEE 2025), Reims, France, Jul. 6-9, 2025.