To use on a local audio file with the language model
```python
import torch
import torchaudio
from transformers import AutoModelForCTC, Wav2Vec2ProcessorWithLM

device = torch.device("cuda:0" if torch.cuda.is_available() else "cpu")

model = AutoModelForCTC.from_pretrained("bhuang/asr-wav2vec2-french").to(device)
# Wav2Vec2ProcessorWithLM needs `pyctcdecode` (and `kenlm` for the n-gram LM) installed
processor_with_lm = Wav2Vec2ProcessorWithLM.from_pretrained("bhuang/asr-wav2vec2-french")
model_sample_rate = processor_with_lm.feature_extractor.sampling_rate

wav_path = "example.wav"  # path to your audio file
waveform, sample_rate = torchaudio.load(wav_path)  # shape: (channels, samples)
waveform = waveform.mean(dim=0)  # downmix to mono: (channels, samples) -> (samples,)

# resample to the rate the model was trained on
if sample_rate != model_sample_rate:
    resampler = torchaudio.transforms.Resample(sample_rate, model_sample_rate)
    waveform = resampler(waveform)

# normalize and convert to model inputs
input_dict = processor_with_lm(waveform, sampling_rate=model_sample_rate, return_tensors="pt")

with torch.inference_mode():
    logits = model(input_dict.input_values.to(device)).logits

# decode with beam search + language model (expects a NumPy array on CPU)
predicted_sentence = processor_with_lm.batch_decode(logits.cpu().numpy()).text[0]
```
To use on a local audio file without the language model
```python
import torch
import torchaudio
from transformers import AutoModelForCTC, Wav2Vec2Processor

device = torch.device("cuda:0" if torch.cuda.is_available() else "cpu")

model = AutoModelForCTC.from_pretrained("bhuang/asr-wav2vec2-french").to(device)
processor = Wav2Vec2Processor.from_pretrained("bhuang/asr-wav2vec2-french")
model_sample_rate = processor.feature_extractor.sampling_rate

wav_path = "example.wav"  # path to your audio file
waveform, sample_rate = torchaudio.load(wav_path)  # shape: (channels, samples)
waveform = waveform.mean(dim=0)  # downmix to mono: (channels, samples) -> (samples,)

# resample to the rate the model was trained on
if sample_rate != model_sample_rate:
    resampler = torchaudio.transforms.Resample(sample_rate, model_sample_rate)
    waveform = resampler(waveform)

# normalize and convert to model inputs
input_dict = processor(waveform, sampling_rate=model_sample_rate, return_tensors="pt")

with torch.inference_mode():
    logits = model(input_dict.input_values.to(device)).logits

# greedy decoding: most likely token per frame, then CTC collapse in batch_decode
predicted_ids = torch.argmax(logits, dim=-1)
predicted_sentence = processor.batch_decode(predicted_ids)[0]
```
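Without the language model, decoding is greedy: `argmax` picks one token per audio frame, and `batch_decode` turns that frame sequence into text. The sketch below illustrates the standard CTC collapse rule that this step relies on: merge consecutive repeats, then drop the blank token. It is not the Transformers implementation. The vocabulary and the function name `greedy_ctc_decode` are hypothetical, and it assumes, as is typical for Wav2Vec2 CTC tokenizers, that the padding token doubles as the CTC blank and that `|` marks word boundaries.

```python
from itertools import groupby


def greedy_ctc_decode(frame_ids, id_to_token, blank_id, word_delimiter="|"):
    """Turn per-frame argmax ids into text using the CTC collapse rule (sketch)."""
    # 1. Merge runs of identical ids: [5, 5, 0, 5] -> [5, 0, 5]
    collapsed = [token_id for token_id, _ in groupby(frame_ids)]
    # 2. Drop the blank; a blank between two equal ids keeps them as separate letters
    kept = [token_id for token_id in collapsed if token_id != blank_id]
    # 3. Map ids to characters, turning the word delimiter into a space
    chars = [" " if id_to_token[i] == word_delimiter else id_to_token[i] for i in kept]
    # 4. Collapse any stray runs of spaces and trim the ends
    return " ".join("".join(chars).split())


if __name__ == "__main__":
    # Hypothetical toy vocabulary; id 0 plays the role of the blank/pad token
    vocab = {0: "<pad>", 1: "|", 2: "b", 3: "o", 4: "n", 5: "j",
             6: "u", 7: "r", 8: "e", 9: "l"}
    print(greedy_ctc_decode([2, 2, 0, 3, 4, 4, 0, 5, 3, 0, 6, 7, 7, 0], vocab, blank_id=0))  # bonjour
    print(greedy_ctc_decode([8, 8, 9, 0, 9, 9, 8], vocab, blank_id=0))  # elle
    print(greedy_ctc_decode([8, 9, 9, 9, 8], vocab, blank_id=0))  # ele (no blank, so "ll" merges)
```

The last example shows why the blank exists: without it, CTC cannot emit a doubled letter. With the real model, an `id_to_token` mapping could be built by inverting `processor.tokenizer.get_vocab()`, and `blank_id` would then be `processor.tokenizer.pad_token_id`. Verify both against this checkpoint's tokenizer before relying on them.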
Evaluation
To evaluate on mozilla-foundation/common_voice_11_0
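This excerpt does not show the evaluation details. Speech recognition on Common Voice is usually scored with word error rate (WER), and the sketch below assumes that metric. WER is the word-level edit distance (substitutions + deletions + insertions) divided by the number of reference words. Corpus-level WER sums the edits over all utterances before dividing. The function names are hypothetical. Text normalization, such as lowercasing and stripping punctuation, is deliberately left out even though it strongly affects the score. In practice a library such as `jiwer` would normally be used instead.

```python
def word_edit_distance(ref_words, hyp_words):
    """Levenshtein distance between two word lists (substitutions, deletions, insertions)."""
    # prev[j] = distance between the reference prefix seen so far and hyp_words[:j]
    prev = list(range(len(hyp_words) + 1))
    for i, ref_word in enumerate(ref_words, start=1):
        curr = [i] + [0] * len(hyp_words)
        for j, hyp_word in enumerate(hyp_words, start=1):
            curr[j] = min(
                prev[j] + 1,                            # deletion of ref_word
                curr[j - 1] + 1,                        # insertion of hyp_word
                prev[j - 1] + (ref_word != hyp_word),   # substitution or match
            )
        prev = curr
    return prev[-1]


def corpus_wer(references, hypotheses):
    """Total word edits divided by total reference words over a whole test set."""
    if len(references) != len(hypotheses):
        raise ValueError("references and hypotheses must have the same length")
    errors = 0
    total_words = 0
    for ref, hyp in zip(references, hypotheses):
        ref_words, hyp_words = ref.split(), hyp.split()
        errors += word_edit_distance(ref_words, hyp_words)
        total_words += len(ref_words)
    if total_words == 0:
        raise ValueError("references contain no words")
    return errors / total_words


if __name__ == "__main__":
    refs = ["le chat dort", "bonjour"]
    hyps = ["le chien dort", "bonjour"]
    print(corpus_wer(refs, hyps))  # 0.25 (1 substitution / 4 reference words)
```

Averaging per-sentence WER would weight short utterances more heavily. Pooling errors across the corpus avoids that, which is why the sketch sums edits and reference words first.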