---
language: lo
license: apache-2.0
tags:
- automatic-speech-recognition
- speech
- audio
- lao
- wav2vec2
- xlsr
datasets:
- SiangLao/lao-asr-thesis-dataset
metrics:
- cer
base_model:
- facebook/wav2vec2-large-xlsr-53
library_name: transformers
---
# XLSR-53 Lao ASR
Fine-tuned XLSR-53 model for Lao automatic speech recognition, achieving a 16.22% character error rate (CER) on the test split.
## Model Details
This model is fine-tuned from [facebook/wav2vec2-large-xlsr-53](https://huggingface.co/facebook/wav2vec2-large-xlsr-53) on the SiangLao/lao-asr-thesis-dataset dataset.
### Training Configuration
- **Epochs**: 15
- **Batch Size**: 16
- **Learning Rate**: 1e-4
- **Training Date**: June 3, 2025
- **Vocabulary Size**: 55 Lao characters + special tokens
### Performance
| Split | CER | Loss |
|-------|-----|------|
| Test | 16.22% | 0.419 |
| Validation | 16.52% | 0.487 |
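
CER is conventionally the character-level edit distance between the reference and the predicted transcript, divided by the reference length. The sketch below assumes that standard definition; this card does not state which evaluation script or text normalization produced the numbers above, and `character_error_rate` is a hypothetical helper name.

```python
def character_error_rate(reference: str, hypothesis: str) -> float:
    """Levenshtein distance between the two strings divided by the reference length."""
    if not reference:
        raise ValueError("reference must be non-empty")
    # previous[j] is the edit distance between reference[:i-1] and hypothesis[:j]
    previous = list(range(len(hypothesis) + 1))
    for i, ref_char in enumerate(reference, start=1):
        current = [i]
        for j, hyp_char in enumerate(hypothesis, start=1):
            cost = 0 if ref_char == hyp_char else 1
            current.append(min(
                previous[j] + 1,         # delete a reference character
                current[j - 1] + 1,      # insert a hypothesis character
                previous[j - 1] + cost,  # substitute, or match at no cost
            ))
        previous = current
    return previous[-1] / len(reference)


# One substitution in a four-character reference
example_cer = character_error_rate("abcd", "abcf")
```

Python strings are compared code point by code point here, so Lao vowel signs and tone marks each count as separate characters. A corpus-level score is usually the total edit distance over the total reference length, not the mean of per-utterance values.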
## Usage
```python
from transformers import Wav2Vec2ForCTC, Wav2Vec2Processor
import torch
import librosa
# Load model and processor
model = Wav2Vec2ForCTC.from_pretrained("SiangLao/xlsr-53-lao-asr")
processor = Wav2Vec2Processor.from_pretrained("SiangLao/xlsr-53-lao-asr")
# Load audio; librosa resamples to the 16 kHz the model expects
audio, sr = librosa.load("audio.wav", sr=16000)
# Process audio
inputs = processor(audio, sampling_rate=16000, return_tensors="pt")
# Generate prediction (gradients are not needed for inference)
with torch.no_grad():
    logits = model(**inputs).logits
# Greedy decoding: pick the most likely token for each frame
predicted_ids = torch.argmax(logits, dim=-1)
transcription = processor.batch_decode(predicted_ids)[0]
# Clean transcription
transcription = transcription.replace("<unk>", " ").strip()
print(transcription)
```
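
The `argmax` followed by `batch_decode` in the snippet above amounts to greedy CTC decoding: take the most likely token per frame, merge consecutive repeats, then drop the blank token. A toy sketch of that collapse step follows. The vocabulary and `blank_id` are made up for illustration (the real ones come from the tokenizer files in the model repository), and `greedy_ctc_decode` is a hypothetical name, not a `transformers` function.

```python
def greedy_ctc_decode(frame_ids, id_to_char, blank_id=0):
    """Collapse per-frame token ids into text: merge repeats, then drop blanks."""
    chars = []
    previous = None
    for token_id in frame_ids:
        # Emit only when the token changes and is not the CTC blank
        if token_id != previous and token_id != blank_id:
            chars.append(id_to_char[token_id])
        previous = token_id
    return "".join(chars)


# Hypothetical two-character vocabulary; id 0 is assumed to be the blank
toy_vocab = {1: "ກ", 2: "າ"}
decoded = greedy_ctc_decode([0, 1, 1, 0, 2, 2, 2, 0], toy_vocab)
```

A blank between two identical tokens keeps them distinct, which is how CTC represents doubled characters: `[1, 0, 1]` decodes to two characters, while `[1, 1]` decodes to one.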
## Citation
```bibtex
@thesis{naovalath2025lao,
title={Lao Automatic Speech Recognition using Transfer Learning},
author={Souphaxay Naovalath and Sounmy Chanthavong},
advisor={Dr. Somsack Inthasone},
school={National University of Laos, Faculty of Natural Sciences, Computer Science Department},
year={2025}
}
```