---
language: lo
license: apache-2.0
tags:
- automatic-speech-recognition
- speech
- audio
- lao
- wav2vec2
- xlsr
datasets:
- SiangLao/lao-asr-thesis-dataset
metrics:
- cer
base_model:
- facebook/wav2vec2-large-xlsr-53
library_name: transformers
---

# XLSR-53 Lao ASR

A fine-tuned XLSR-53 model for Lao automatic speech recognition. It achieves a 16.22% character error rate (CER) on the test split.

## Model Details

This model is fine-tuned from [facebook/wav2vec2-large-xlsr-53](https://huggingface.co/facebook/wav2vec2-large-xlsr-53) on the SiangLao/lao-asr-thesis-dataset.

### Training Configuration

- **Epochs**: 15
- **Batch Size**: 16
- **Learning Rate**: 1e-4
- **Training Date**: June 3, 2025
- **Vocabulary Size**: 55 Lao characters + special tokens

### Performance

| Split | CER | Loss (CTC) |
|-------|-----|------------|
| Test | 16.22% | 0.419 |
| Validation | 16.52% | 0.487 |

## Usage

```python
from transformers import Wav2Vec2ForCTC, Wav2Vec2Processor
import torch
import librosa

# Load model and processor
model = Wav2Vec2ForCTC.from_pretrained("SiangLao/xlsr-53-lao-asr")
processor = Wav2Vec2Processor.from_pretrained("SiangLao/xlsr-53-lao-asr")

# Load audio; librosa resamples it to the 16 kHz rate the model expects
audio, sr = librosa.load("audio.wav", sr=16000)

# Convert the waveform into model inputs
inputs = processor(audio, sampling_rate=16000, return_tensors="pt")

# Run inference and take the most likely token at each frame (greedy CTC decoding)
with torch.no_grad():
    logits = model(**inputs).logits
predicted_ids = torch.argmax(logits, dim=-1)
transcription = processor.batch_decode(predicted_ids)[0]

# Clean transcription: strip surrounding whitespace.
# (If a word-delimiter token survives decoding, replace it with " " before stripping.)
transcription = transcription.strip()
print(transcription)
```

## Citation

```bibtex
@thesis{naovalath2025lao,
  title={Lao Automatic Speech Recognition using Transfer Learning},
  author={Souphaxay Naovalath and Sounmy Chanthavong},
  advisor={Dr. Somsack Inthasone},
  school={National University of Laos, Faculty of Natural Sciences, Computer Science Department},
  year={2025}
}
```
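The CER values reported for this model are character error rates. The following minimal sketch lets you check your own transcriptions against reference text. It computes corpus-level CER as the total character-level Levenshtein edits divided by the total number of reference characters, which is the standard definition. Two points are assumptions: that the reported numbers were aggregated this way, and how spaces were counted. `edit_distance` and `corpus_cer` are hypothetical helper names and are not part of any library.

```python
def edit_distance(ref: str, hyp: str) -> int:
    """Character-level Levenshtein distance (substitutions, insertions, deletions)."""
    # prev[j] holds the distance between the previous prefix of ref and hyp[:j]
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i] + [0] * len(hyp)
        for j, h in enumerate(hyp, start=1):
            curr[j] = min(
                prev[j] + 1,             # deletion of ref character r
                curr[j - 1] + 1,         # insertion of hyp character h
                prev[j - 1] + (r != h),  # substitution (costs 0 when characters match)
            )
        prev = curr
    return prev[-1]


def corpus_cer(references: list[str], hypotheses: list[str]) -> float:
    """Corpus-level CER: total edits divided by total reference characters.

    Assumption: errors are pooled over the whole corpus rather than averaged
    per utterance.
    """
    if len(references) != len(hypotheses):
        raise ValueError("references and hypotheses must have the same length")
    total_edits = sum(edit_distance(r, h) for r, h in zip(references, hypotheses))
    total_chars = sum(len(r) for r in references)
    if total_chars == 0:
        raise ValueError("references contain no characters")
    return total_edits / total_chars


if __name__ == "__main__":
    refs = ["abc", "de"]
    hyps = ["abc", "d"]
    print(f"CER: {corpus_cer(refs, hyps):.2%}")
```

In practice, pass the decoded outputs from the usage snippet as `hypotheses` and the dataset transcripts as `references`. Python strings index by code point, so each Lao vowel sign and tone mark counts as one character.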