---
language: lo
license: apache-2.0
tags:
- automatic-speech-recognition
- speech
- audio
- lao
- wav2vec2
- xlsr
datasets:
- SiangLao/lao-asr-thesis-dataset
metrics:
- cer
base_model:
- facebook/wav2vec2-large-xlsr-53
library_name: transformers
---

# XLSR-53 Lao ASR

A fine-tuned XLSR-53 model for Lao automatic speech recognition (ASR). It reaches a character error rate (CER) of 16.22% on the test split.

## Model Details

This model was fine-tuned from [facebook/wav2vec2-large-xlsr-53](https://huggingface.co/facebook/wav2vec2-large-xlsr-53) on the `SiangLao/lao-asr-thesis-dataset` dataset.

### Training Configuration
- **Epochs**: 15
- **Batch Size**: 16
- **Learning Rate**: 1e-4
- **Training Date**: June 3, 2025
- **Vocabulary Size**: 55 Lao characters + special tokens
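
A character-level CTC vocabulary like the one listed above is usually built by collecting the distinct characters in the training transcripts and appending the special tokens. The sketch below illustrates that idea only; it is not the authors' preprocessing script. The helper name `build_vocab`, the sample transcripts, and the choice of `|`, `<unk>`, and `<pad>` as special tokens (the common Wav2Vec2 CTC convention) are all assumptions.

```python
import json


def build_vocab(transcripts, special_tokens=("|", "<unk>", "<pad>")):
    """Map each distinct character to an integer id, then append special tokens.

    Spaces are not stored as characters; the word delimiter "|" is assumed
    to stand in for them, as in the usual Wav2Vec2 CTC setup.
    """
    chars = set()
    for text in transcripts:
        chars.update(text.replace(" ", ""))
    # Sort for a deterministic id assignment.
    vocab = {ch: idx for idx, ch in enumerate(sorted(chars))}
    for token in special_tokens:
        if token not in vocab:
            vocab[token] = len(vocab)
    return vocab


# Hypothetical sample transcripts, for illustration only.
sample_transcripts = ["ສະບາຍດີ", "ຂອບໃຈ"]
vocab = build_vocab(sample_transcripts)

# Serialize as a token-to-id JSON mapping (the usual vocab.json shape).
vocab_json = json.dumps(vocab, ensure_ascii=False)
print(len(vocab), "entries")
```

Note that Lao vowel signs and tone marks are separate Unicode code points, so in a scheme like this each one gets its own vocabulary entry.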

### Performance

| Split | CER | Loss |
|-------|-----|------|
| Test | 16.22% | 0.419 |
| Validation | 16.52% | 0.487 |
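
CER is the character-level edit distance between the reference and the predicted transcription, divided by the number of reference characters. The card does not say which evaluation script or text normalization produced the figures above, so the following is only a minimal pure-Python sketch of the metric; the function names `edit_distance` and `cer` are illustrative.

```python
def edit_distance(ref, hyp):
    """Levenshtein distance between two sequences (two-row dynamic programming)."""
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i]
        for j, h in enumerate(hyp, start=1):
            cost = 0 if r == h else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution or match
        prev = curr
    return prev[-1]


def cer(references, hypotheses):
    """Corpus-level CER: total character edits / total reference characters."""
    total_edits = sum(edit_distance(r, h) for r, h in zip(references, hypotheses))
    total_chars = sum(len(r) for r in references)
    if total_chars == 0:
        raise ValueError("references contain no characters")
    return total_edits / total_chars


# One substitution out of four reference characters.
print(cer(["abcd"], ["abcf"]))  # 0.25
```

This sketch counts Unicode code points, so a Lao base consonant and its attached vowel sign or tone mark are scored as separate characters. Other tools may normalize or segment the text differently, which changes the reported value.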

## Usage

```python
from transformers import Wav2Vec2ForCTC, Wav2Vec2Processor
import torch
import librosa

# Load model and processor
model = Wav2Vec2ForCTC.from_pretrained("SiangLao/xlsr-53-lao-asr")
processor = Wav2Vec2Processor.from_pretrained("SiangLao/xlsr-53-lao-asr")

# Load audio as mono; librosa resamples to the 16 kHz rate the model expects
audio, sr = librosa.load("audio.wav", sr=16000)

# Convert the waveform into model inputs
inputs = processor(audio, sampling_rate=sr, return_tensors="pt")

# Run inference without tracking gradients, then greedy-decode the CTC output
with torch.no_grad():
    logits = model(**inputs).logits
    predicted_ids = torch.argmax(logits, dim=-1)
    transcription = processor.batch_decode(predicted_ids)[0]

# Replace any unknown-token markers left in the decoded text
transcription = transcription.replace("<unk>", " ").strip()

print(transcription)
```
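
The `argmax` followed by `batch_decode` in the snippet above is greedy CTC decoding: take the most likely token per frame, merge consecutive repeats, and drop the blank token. The sketch below shows that collapse rule on plain integer ids so the step is easy to follow. It is an illustration, not the library's implementation; the function name and the ids are hypothetical.

```python
def ctc_greedy_collapse(frame_ids, blank_id):
    """Collapse consecutive repeated ids, then drop blank tokens."""
    collapsed = []
    previous = None
    for token_id in frame_ids:
        # Keep a token only when it starts a new run and is not the blank.
        if token_id != previous and token_id != blank_id:
            collapsed.append(token_id)
        previous = token_id
    return collapsed


# Hypothetical per-frame ids: 0 is the blank, 1..3 are characters.
frames = [0, 1, 1, 0, 1, 2, 2, 0, 0, 3]
print(ctc_greedy_collapse(frames, blank_id=0))  # [1, 1, 2, 3]
```

The blank between the two runs of `1` is what lets the same character appear twice in a row in the output. In Wav2Vec2 CTC tokenizers the padding token typically plays the role of the blank.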

## Citation

```bibtex
@thesis{naovalath2025lao,
  title={Lao Automatic Speech Recognition using Transfer Learning},
  author={Souphaxay Naovalath and Sounmy Chanthavong},
  advisor={Dr. Somsack Inthasone},
  school={National University of Laos, Faculty of Natural Sciences, Computer Science Department},
  year={2025}
}
```