Initialize project; model provided by the ModelHub XC community
Model: scottykwok/wav2vec2-large-xlsr-cantonese Source: Original Platform
53
README.md
Normal file
@@ -0,0 +1,53 @@
---
language: zh
tags:
- automatic-speech-recognition
license: cc-by-sa-4.0
datasets:
- common_voice
metrics:
- cer
---

# Wav2vec2-large-xlsr-cantonese
This model is based on [wav2vec2-large-xlsr-53](https://huggingface.co/facebook/wav2vec2-large-xlsr-53), fine-tuned on Common Voice zh-HK 6.1.0.

The training code is similar to that of [user ctl](https://huggingface.co/ctl/wav2vec2-large-xlsr-cantonese), except that the number of training epochs was doubled to 80 and `fp16_backend` was set to `apex`. The model was trained on a single RTX 3090 using the nvidia/cuda:11.1-cudnn8-devel Docker image.

The character error rate (CER) is 15.11% when evaluated against the Common Voice zh-HK test set.

# Result (CER)
15.11%
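
For readers unfamiliar with the metric, the following is a minimal sketch of how a character error rate can be computed: the character-level edit distance between each reference and its transcription, summed and divided by the total number of reference characters. It is not the evaluation script behind the figure above. The function names and the two sample sentences are hypothetical, and the corpus-level aggregation and the absence of any text normalization are assumptions.

```python
def edit_distance(ref: str, hyp: str) -> int:
    """Levenshtein distance between two strings, counted in characters."""
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i]
        for j, h in enumerate(hyp, start=1):
            cost = 0 if r == h else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution or match
        prev = curr
    return prev[-1]


def cer(references, hypotheses) -> float:
    """Corpus-level CER: total character edits / total reference characters."""
    edits = sum(edit_distance(r, h) for r, h in zip(references, hypotheses))
    chars = sum(len(r) for r in references)
    return edits / chars


# Hypothetical reference/transcription pairs, for illustration only
refs = ["今日天氣好好", "我哋去飲茶"]
hyps = ["今日天汽好好", "我哋去飲"]
print(f"CER: {cer(refs, hyps):.2%}")
```

Averaging per-utterance CER instead of pooling characters across the corpus would give a different number, so the aggregation choice should be stated whenever results are compared.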

# Source Code
See the GitHub repo [cantonese-selfish-project](https://github.com/scottykwok/cantonese-selfish-project/) and the [demo video](https://youtu.be/k_9RQ-ilGEc).

# Usage
```python
import soundfile as sf
import torch
from transformers import Wav2Vec2ForCTC, Wav2Vec2Processor

# load pretrained model
processor = Wav2Vec2Processor.from_pretrained("scottykwok/wav2vec2-large-xlsr-cantonese")
model = Wav2Vec2ForCTC.from_pretrained("scottykwok/wav2vec2-large-xlsr-cantonese")

# load audio - must be 16kHz mono
audio_input, sample_rate = sf.read("audio.wav")

# pad input values and return pt tensor
input_values = processor(audio_input, sampling_rate=sample_rate, return_tensors="pt").input_values

# INFERENCE
# retrieve logits & take argmax (no gradients are needed for inference)
with torch.no_grad():
    logits = model(input_values).logits
predicted_ids = torch.argmax(logits, dim=-1)

# transcribe
transcription = processor.decode(predicted_ids[0])
print("-" * 20)
print("Transcription:\n", transcription.lower())
print("-" * 20)
```
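
The usage code expects 16 kHz mono audio. If a recording is stereo or uses a different sampling rate, it can be converted before being passed to the processor. The sketch below is one possible way to do that with NumPy only. The helper name `to_mono_16k` is hypothetical, and the linear interpolation is a crude stand-in for a proper band-limited resampler (it does not low-pass filter, so downsampling may alias).

```python
import numpy as np

TARGET_RATE = 16_000  # sampling rate the model expects


def to_mono_16k(audio: np.ndarray, sample_rate: int) -> np.ndarray:
    """Downmix (frames, channels) audio to mono and resample to 16 kHz."""
    audio = np.asarray(audio, dtype=np.float32)
    if audio.ndim == 2:
        # soundfile returns multi-channel audio as (frames, channels)
        audio = audio.mean(axis=1)
    if sample_rate != TARGET_RATE:
        # Linear interpolation onto the new time grid (assumption: good
        # enough for a quick test, not for careful evaluation)
        n_out = int(round(len(audio) / sample_rate * TARGET_RATE))
        old_t = np.arange(len(audio)) / sample_rate
        new_t = np.arange(n_out) / TARGET_RATE
        audio = np.interp(new_t, old_t, audio).astype(np.float32)
    return audio


# Hypothetical input: one second of 44.1 kHz stereo noise
stereo = np.random.default_rng(0).uniform(-1, 1, size=(44_100, 2))
mono = to_mono_16k(stereo, 44_100)
print(mono.shape)
```

With this helper, the loading step could become `audio_input = to_mono_16k(*sf.read("audio.wav"))`, with `sampling_rate=16_000` passed to the processor.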