ModelHub XC 4f9314551e Initialize project; model provided by the ModelHub XC community
Model: kresnik/wav2vec2-large-xlsr-korean
Source: Original Platform
2026-05-22 02:24:16 +08:00

language: ko
datasets:
- kresnik/zeroth_korean
tags:
- speech
- audio
- automatic-speech-recognition
license: apache-2.0
model-index:
- name: Wav2Vec2 XLSR Korean
  results:
  - task:
      name: Automatic Speech Recognition
      type: automatic-speech-recognition
    dataset:
      name: Zeroth Korean
      type: kresnik/zeroth_korean
      args: clean
    metrics:
    - name: Test WER
      type: wer
      value: 4.74
    - name: Test CER
      type: cer
      value: 1.78

Evaluation on Zeroth-Korean ASR corpus

Google Colab notebook (Korean)

from transformers import Wav2Vec2ForCTC, Wav2Vec2Processor
from datasets import load_dataset
import soundfile as sf
import torch
from jiwer import wer

processor = Wav2Vec2Processor.from_pretrained("kresnik/wav2vec2-large-xlsr-korean")

model = Wav2Vec2ForCTC.from_pretrained("kresnik/wav2vec2-large-xlsr-korean").to('cuda')

ds = load_dataset("kresnik/zeroth_korean", "clean")

test_ds = ds['test']

def map_to_array(batch):
    # Read the raw waveform from disk; the sampling rate returned by sf.read is ignored here
    speech, _ = sf.read(batch["file"])
    batch["speech"] = speech
    return batch

test_ds = test_ds.map(map_to_array)

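def map_to_pred(batch):
    # Pad every clip in the batch to the longest one and convert to PyTorch tensors
    inputs = processor(batch["speech"], sampling_rate=16000, return_tensors="pt", padding="longest")
    input_values = inputs.input_values.to("cuda")

    # Inference only, so skip gradient tracking
    with torch.no_grad():
        logits = model(input_values).logits

    # Greedy CTC decoding: take the most likely token per frame, then let the processor turn ids into text
    predicted_ids = torch.argmax(logits, dim=-1)
    transcription = processor.batch_decode(predicted_ids)
    batch["transcription"] = transcription
    return batch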

result = test_ds.map(map_to_pred, batched=True, batch_size=16, remove_columns=["speech"])

print("WER:", wer(result["text"], result["transcription"]))
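
The `torch.argmax` plus `processor.batch_decode` step above is greedy CTC decoding: pick the best token per frame, merge consecutive repeats, then drop the blank token. The sketch below illustrates that collapse rule in plain Python. The helper names and the toy vocabulary are hypothetical and are not the model's real vocabulary. The blank id of 0 is also an assumption for illustration (Wav2Vec2 tokenizers typically use the padding token as the CTC blank), and word-delimiter handling is left out.

```python
from itertools import groupby


def ctc_greedy_collapse(ids, blank_id=0):
    """Collapse a per-frame argmax path: merge repeated ids, then drop blanks."""
    merged = [key for key, _ in groupby(ids)]
    return [i for i in merged if i != blank_id]


def decode(ids, vocab, blank_id=0):
    """Map the collapsed ids to text using an id -> symbol table."""
    return "".join(vocab[i] for i in ctc_greedy_collapse(ids, blank_id))


# Hypothetical toy vocabulary, for illustration only
toy_vocab = {0: "<pad>", 1: "안", 2: "녕"}

# One id per audio frame, as torch.argmax(logits, dim=-1) would give for one clip
path = [0, 1, 1, 0, 0, 2, 2, 2, 0]
print(decode(path, toy_vocab))  # 안녕
```

A blank between two identical ids keeps them as separate symbols, which is how CTC can emit a doubled character.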

Expected WER: 4.74%

Expected CER: 1.78%
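
Both figures are edit-distance rates: the number of substitutions, insertions, and deletions needed to turn the prediction into the reference, divided by the reference length in words (WER) or characters (CER). The following is a minimal, dependency-free sketch of that definition. The function names `edit_distance` and `error_rate` are made up for this example, and it assumes no text normalization and that spaces count as characters for CER, so its output may differ slightly from what `jiwer` reports.

```python
def edit_distance(ref, hyp):
    """Levenshtein distance between two sequences (each edit costs 1)."""
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i]
        for j, h in enumerate(hyp, start=1):
            curr.append(min(
                prev[j] + 1,             # deletion
                curr[j - 1] + 1,         # insertion
                prev[j - 1] + (r != h),  # substitution, or match at no cost
            ))
        prev = curr
    return prev[-1]


def error_rate(refs, hyps, unit="word"):
    """Corpus-level rate: total edits divided by total reference length."""
    split = str.split if unit == "word" else list
    errors = total = 0
    for ref, hyp in zip(refs, hyps):
        r, h = split(ref), split(hyp)
        errors += edit_distance(r, h)
        total += len(r)
    return errors / total


refs = ["오늘 날씨 좋다"]
hyps = ["오늘 날시 좋다"]
print("WER:", error_rate(refs, hyps, unit="word"))  # 1 of 3 words is wrong
print("CER:", error_rate(refs, hyps, unit="char"))  # 0.125 (1 of 8 characters)
```

Multiplying the returned fraction by 100 gives a percentage comparable to the expected values listed above.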
