ModelHub XC 28deb980b6 Initialize project; model provided by the ModelHub XC community

Model: NUTN-KWS/Whisper-Taiwanese-model-v0.5
Source: Original Platform

2026-05-13 02:11:35 +08:00

Files (each added in the commit "Initialize project; model provided by the ModelHub XC community", 2026-05-13 02:11:35 +08:00):

.gitattributes
added_tokens.json
config.json
generation_config.json
merges.txt
model.safetensors
normalizer.json
preprocessor_config.json
README_EN.md
README.md
special_tokens_map.json
tokenizer_config.json
training_args.bin
vocab.json

README_EN.md

library_name: transformers
license: cc-by-nc-4.0
language:
metrics: cer
pipeline_tag: automatic-speech-recognition
base_model: openai/whisper-large-v3-turbo

[ Traditional Chinese: README.md ]

👳 Whisper-Taiwanese model V0.5 (Tv0.5)

This model is a fine-tuned version of OpenAI's openai/whisper-large-v3-turbo. It was developed by the National University of Tainan (NUTN), Taiwan, as part of an industry-academia collaboration project funded by the National Science and Technology Council (NSTC). The Taiwanese-English Co-Learning Pilot Project ran from September 2024 to June 2025 in collaboration with JEN-PIN ENTERPRISE CO., LTD. The model is trained for Taiwanese speech recognition tasks using JEN-PIN educational materials generated through Student–Machine Co-Learning during the Fall 2024 semester. NUTN is also collaborating with the National Center for High-performance Computing (NCHC) of the National Applied Research Laboratories (NARLabs) in Taiwan to provide computational and storage resources and to co-develop an AI learning model for elementary and high school students.

Demo: https://kws.oaselab.org/taigitong/

📝 Model Details

Base Model: openai/whisper-large-v3-turbo
Fine-tuned for: Taiwanese Hokkien Automatic Speech Recognition (ASR)
Fine-tuning Framework: Hugging Face Transformers
Training Duration: Approximately 180 hours using two V100 GPUs
Dataset: Custom dataset, including the Dictionary of Frequently-Used Taiwanese Taigi released by the Ministry of Education, Taiwan, totaling approximately 90 hours of audio data.
Input Format: 16 kHz mono WAV
License: CC BY-NC 4.0
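Because the model card specifies 16 kHz mono WAV input, it can be useful to check a file before transcribing it. The following is a minimal sketch using only Python's standard `wave` module; the helper names `wav_format` and `is_16k_mono` are hypothetical and not part of this repository, and the check only inspects the WAV header (it does not resample or convert anything).

```python
import wave


def wav_format(path):
    """Return (sample_rate_hz, channel_count) read from a WAV file header."""
    with wave.open(path, "rb") as wav:
        return wav.getframerate(), wav.getnchannels()


def is_16k_mono(path):
    """True if the file matches the 16 kHz mono input format listed above."""
    return wav_format(path) == (16000, 1)
```

A file that fails this check would need to be resampled or downmixed with an audio tool of your choice before being passed to the model.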

🚀 Usage

Installing Packages:

pip install torch torchvision torchaudio transformers

Example:

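from transformers import pipeline

# Load the fine-tuned checkpoint from a local directory; device=0 selects the first GPU.
pipe = pipeline("automatic-speech-recognition", model="./model/whisper-taiwanese", device=0)
# Transcribe a 16 kHz mono WAV file.
result = pipe("audio.wav", generate_kwargs={"language": "zh", "task": "transcribe"})
print(result["text"])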
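The model card lists CER (character error rate) as its metric. As a rough illustration of how a transcription such as `result["text"]` could be scored against a reference, here is a minimal pure-Python sketch: CER is taken as the character-level edit distance divided by the reference length. The function names are hypothetical, and stripping whitespace before comparison is an assumption; the authors' actual evaluation script and text normalization are not described in the source.

```python
def edit_distance(ref, hyp):
    """Levenshtein distance between two strings, counted in characters."""
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i]
        for j, h in enumerate(hyp, start=1):
            cost = 0 if r == h else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution or match
        prev = curr
    return prev[-1]


def cer(reference, hypothesis):
    """Character error rate: edit distance divided by reference length."""
    # Assumption: whitespace is ignored when comparing transcripts.
    ref = "".join(reference.split())
    hyp = "".join(hypothesis.split())
    if not ref:
        raise ValueError("reference must contain at least one character")
    return edit_distance(ref, hyp) / len(ref)
```

For example, `cer(reference_text, result["text"])` would return 0.0 for an exact match (ignoring whitespace) and grows with each inserted, deleted, or substituted character.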

👨‍🎓 Citation

BibTeX:

@misc{taiwanesewhisperasr2025,
  title={Taiwanese Whisper ASR},
  author={KWS Center, National University of Tainan, Taiwan},
  year={2025},
  url={https://huggingface.co/NUTN-KWS/Whisper-Taiwanese-model-v0.5}
}

APA:

C. S. Lee, M. H. Wang, C. C. Yue, G. Y. Teseng, and Y. Nojima, "Fuzzy Estimation Agent with Knowledge Graph and Quantum Fuzzy Inference Engine for Taiwanese-English Co-Learning," 2025 IFSA World Congress and NAFIPS Annual Meeting (IFSA/NAFIPS 2025), Banff, Alberta, Canada, Aug. 16-19, 2025.
C. S. Lee, M. H. Wang, C. Y. Chen, S. C. Yang, M. Reformat, N. Kubota, and A. Pourabdollah, "Integrating quantum CI and generative AI for Taiwanese/English co-learning," Quantum Machine Intelligence, vol. 6, 64, pp. 1-19, 2024.
C. S. Lee, M. H. Wang, C. Y. Chen, S. C. Yang, M. Reformat, N. Kubota, and A. Pourabdollah, "Quantum fuzzy inference engine with generative AI and TAIDE KG for Taiwanese/English co-learning," 2025 IEEE International Conference on Fuzzy Systems (FUZZ-IEEE 2025), Reims, France, Jul. 6-9, 2025.
