---
language: lo
license: apache-2.0
tags:
- automatic-speech-recognition
- speech
- audio
- lao
- wav2vec2
- xlsr
datasets:
- SiangLao/lao-asr-thesis-dataset
metrics:
- cer
base_model:
- facebook/wav2vec2-large-xlsr-53
library_name: transformers
---

# XLSR-53 Lao ASR

A fine-tuned XLSR-53 model for Lao automatic speech recognition (ASR). It reaches a 16.22% character error rate (CER) on the test split.

## Model Details

This model is fine-tuned from [facebook/wav2vec2-large-xlsr-53](https://huggingface.co/facebook/wav2vec2-large-xlsr-53) on the `SiangLao/lao-asr-thesis-dataset` dataset.

### Training Configuration
- **Epochs**: 15
- **Batch Size**: 16
- **Learning Rate**: 1e-4
- **Training Date**: June 3, 2025
- **Vocabulary Size**: 55 Lao characters + special tokens
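
A character-level vocabulary like the one listed above could be assembled roughly as follows. This is a minimal sketch, not the authors' preprocessing: the `build_vocab` helper, the sample transcripts, and the special-token names (`<pad>`, `<unk>`, and `|` as a word delimiter) are assumptions made for illustration.

```python
def build_vocab(transcripts, special_tokens=("<pad>", "<unk>", "|")):
    """Map every distinct character in the transcripts to an integer id."""
    # Collect distinct characters; spaces are represented by the "|" delimiter
    # token instead, and characters that collide with special tokens are skipped.
    chars = sorted(
        {
            ch
            for text in transcripts
            for ch in text
            if ch != " " and ch not in special_tokens
        }
    )
    # Special tokens come first so their ids stay stable as the corpus grows.
    tokens = list(special_tokens) + chars
    return {token: idx for idx, token in enumerate(tokens)}


# Hypothetical transcripts, for illustration only.
vocab = build_vocab(["ສະບາຍດີ", "ຂອບໃຈ"])
print(len(vocab))
```

Python iterates strings by Unicode code point, so Lao vowel signs and tone marks each receive their own vocabulary entry in this sketch.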

### Performance

| Split | CER | Loss |
|-------|-----|------|
| Test | 16.22% | 0.419 |
| Validation | 16.52% | 0.487 |
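
CER is the character-level edit distance between the predicted and reference transcripts, divided by the reference length. A minimal pure-Python sketch of that definition follows. The `edit_distance` and `cer` helpers are illustrative and are not the evaluation script behind the table above, which may apply different text normalization.

```python
def edit_distance(ref, hyp):
    """Levenshtein distance between two sequences."""
    # prev[j] holds the distance between the processed prefix of ref and hyp[:j].
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i]
        for j, h in enumerate(hyp, start=1):
            cost = 0 if r == h else 1
            curr.append(
                min(
                    prev[j] + 1,         # delete a reference character
                    curr[j - 1] + 1,     # insert a hypothesis character
                    prev[j - 1] + cost,  # substitute, or match at no cost
                )
            )
        prev = curr
    return prev[-1]


def cer(reference, hypothesis):
    """Character error rate: edit distance divided by reference length."""
    if not reference:
        raise ValueError("reference must be non-empty")
    return edit_distance(reference, hypothesis) / len(reference)


# One substituted character out of five reference characters.
print(f"{cer('hello', 'hallo'):.2%}")  # 20.00%
```

Because Python measures string length in Unicode code points, Lao combining vowel signs and tone marks count as separate characters in this sketch.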

## Usage

```python
from transformers import Wav2Vec2ForCTC, Wav2Vec2Processor
import torch
import librosa

# Load model and processor
model = Wav2Vec2ForCTC.from_pretrained("SiangLao/xlsr-53-lao-asr")
processor = Wav2Vec2Processor.from_pretrained("SiangLao/xlsr-53-lao-asr")

# Load audio, resampled to the 16 kHz rate the model expects
audio, sr = librosa.load("audio.wav", sr=16000)

# Convert the waveform into model inputs
inputs = processor(audio, sampling_rate=16000, return_tensors="pt")

# Run inference without tracking gradients
with torch.no_grad():
    logits = model(**inputs).logits

# Greedy decoding: take the most likely token at each frame
predicted_ids = torch.argmax(logits, dim=-1)
transcription = processor.batch_decode(predicted_ids)[0]

# Replace unknown-token markers with spaces and trim whitespace
transcription = transcription.replace("<unk>", " ").strip()

print(transcription)
```
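
The `argmax` and `batch_decode` steps above amount to greedy CTC decoding: take the most likely token per frame, collapse consecutive repeats, then drop blank tokens. The toy sketch below shows only that collapse rule. The blank id of 0, the `id_to_char` mapping, and the frame ids are made up for illustration; the real ids come from the model's tokenizer.

```python
from itertools import groupby


def ctc_greedy_collapse(frame_ids, blank_id=0):
    """Collapse consecutive repeats, then remove CTC blank tokens."""
    # groupby merges each run of identical ids into a single id.
    merged = [token_id for token_id, _ in groupby(frame_ids)]
    return [token_id for token_id in merged if token_id != blank_id]


# Hypothetical vocabulary and per-frame argmax ids, for illustration only.
id_to_char = {1: "a", 2: "b"}
frames = [0, 1, 1, 0, 1, 2, 2, 0]
decoded = "".join(id_to_char[i] for i in ctc_greedy_collapse(frames))
print(decoded)  # aab
```

The blank between the two runs of `1` is what lets the decoder emit a genuine double letter instead of merging it into one.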

## Citation
```bibtex
@thesis{naovalath2025lao,
  title={Lao Automatic Speech Recognition using Transfer Learning},
  author={Souphaxay Naovalath and Sounmy Chanthavong},
  advisor={Dr. Somsack Inthasone},
  school={National University of Laos, Faculty of Natural Sciences, Computer Science Department},
  year={2025}
}
```