GetmanY1/wav2vec2-large-fi-150k-finetuned

ModelHub XC 8181966075 Initialize project; model provided by the ModelHub XC community

Model: GetmanY1/wav2vec2-large-fi-150k-finetuned
Source: Original Platform

2026-05-12 22:56:36 +08:00

Files:

.gitattributes
config.json
model.safetensors
preprocessor_config.json
README.md
special_tokens_map.json
tokenizer_config.json
vocab.json

README.md

license, tags, library_name, language, base_model, model-index

apache-2.0

automatic-speech-recognition

finnish

transformers

GetmanY1/wav2vec2-large-fi-150k

model-index name: wav2vec2-large-fi-150k-finetuned

task (all results): Automatic Speech Recognition (type: automatic-speech-recognition)

dataset: Lahjoita puhetta (Donate Speech) (type: lahjoita-puhetta, args: fi)

name	type	value
Dev WER	wer	15.34
Dev CER	cer	4.14
Test WER	wer	16.86
Test CER	cer	5.07

dataset: Finnish Parliament (type: FinParl, args: fi)

name	type	value
Dev16 WER	wer	11.3
Dev16 CER	cer	4.75
Test16 WER	wer	8.29
Test16 CER	cer	3.34
Test20 WER	wer	6.94
Test20 CER	cer	2.15

dataset: Common Voice 16.1 (type: mozilla-foundation/common_voice_16_1, args: fi)

name	type	value
Dev WER	wer	7.17
Dev CER	cer	1.11
Test WER	wer	5.86
Test CER	cer	0.91

dataset: FLEURS (type: google/fleurs, args: fi_fi)

name	type	value
Dev WER	wer	9.2
Dev CER	cer	5.23
Test WER	wer	10.69
Test CER	cer	5.79
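
The WER and CER figures reported above are edit-distance ratios expressed as percentages. The following is a minimal sketch of how such scores are commonly computed, not the evaluation script used for this model: the function names are illustrative, and the text normalization applied before scoring (casing, punctuation, whether spaces count as characters) is an assumption that varies between evaluation setups.

```python
def edit_distance(ref, hyp):
    """Levenshtein distance between two sequences (words or characters)."""
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i]
        for j, h in enumerate(hyp, start=1):
            cost = 0 if r == h else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution or match
        prev = curr
    return prev[-1]


def wer(reference, hypothesis):
    """Word error rate in percent; assumes a non-empty reference."""
    ref_words = reference.split()
    return 100.0 * edit_distance(ref_words, hypothesis.split()) / len(ref_words)


def cer(reference, hypothesis):
    """Character error rate in percent; spaces are counted as characters here."""
    return 100.0 * edit_distance(list(reference), list(hypothesis)) / len(reference)


reference = "hyvää huomenta kaikille"
hypothesis = "hyvää huomenta kaikile"   # one missing letter in the last word
print(f"WER: {wer(reference, hypothesis):.2f}")
print(f"CER: {cer(reference, hypothesis):.2f}")
```

A single character error makes one of three words wrong, so WER is far higher than CER on the same sentence. This is why the CER values in the tables are consistently lower than the corresponding WER values.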

Finnish Wav2vec2-Large ASR

GetmanY1/wav2vec2-large-fi-150k fine-tuned on 4600 hours of Finnish speech audio sampled at 16 kHz:

1500 hours of Lahjoita puhetta (Donate Speech) (colloquial Finnish)
3100 hours of the Finnish Parliament dataset

When using the model, make sure that your speech input is also sampled at 16 kHz.
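
A quick way to verify the sampling rate of a WAV file before passing it to the model is to read its header. The sketch below uses only Python's standard `wave` module and therefore assumes uncompressed PCM WAV input; the helper names and the file name in the usage comment are illustrative, not part of this repository.

```python
import wave

EXPECTED_RATE = 16_000  # sampling rate required by this model card


def wav_sample_rate(path):
    """Return the sampling rate (Hz) stored in a PCM WAV file header."""
    with wave.open(path, "rb") as wav_file:
        return wav_file.getframerate()


def needs_resampling(path, expected_rate=EXPECTED_RATE):
    """True if the file's sampling rate differs from the expected rate."""
    return wav_sample_rate(path) != expected_rate


# Usage (hypothetical file name):
# if needs_resampling("recording.wav"):
#     print("Resample to 16 kHz before transcribing")
```

If a file fails the check, resample it with an audio library of your choice before extracting features.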

Model description

The Finnish Wav2Vec2 Large has the same architecture and uses the same training objective as the English and multilingual models described in the paper.
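
One practical consequence of this architecture is the time resolution of the model's output. The sketch below estimates how many output frames the convolutional feature encoder produces for a given number of input samples. The kernel sizes and strides are those of the standard wav2vec 2.0 configuration and are an assumption here; they were not read from this repository's config.json, so check that file if exact frame counts matter.

```python
import math

# Kernel sizes and strides of the convolutional feature encoder in the
# standard wav2vec 2.0 configuration (assumed, not read from config.json).
CONV_KERNELS = (10, 3, 3, 3, 3, 2, 2)
CONV_STRIDES = (5, 2, 2, 2, 2, 2, 2)


def num_output_frames(num_samples):
    """Number of encoder frames produced for `num_samples` input samples."""
    length = num_samples
    for kernel, stride in zip(CONV_KERNELS, CONV_STRIDES):
        # Output length of an unpadded 1-D convolution.
        length = (length - kernel) // stride + 1
    return length


total_stride = math.prod(CONV_STRIDES)         # samples between frame starts
frame_shift_ms = 1000 * total_stride / 16_000  # frame shift at 16 kHz
print(total_stride, frame_shift_ms, num_output_frames(16_000))
```

Under these assumptions, consecutive frames are 320 samples apart, which corresponds to 20 ms at 16 kHz, and one second of audio yields 49 frames.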

GetmanY1/wav2vec2-large-fi-150k is a large-scale, 317-million-parameter monolingual model pre-trained on 158k hours of unlabeled Finnish speech, including KAVI radio and television archive materials, Lahjoita puhetta (Donate Speech), Finnish Parliament, and Finnish VoxPopuli.

You can read more about the pre-trained model in this paper. The training scripts are available on GitHub.

Intended uses

You can use this model for Finnish ASR (speech-to-text).

How to use

To transcribe audio files, the model can be used as a standalone acoustic model as follows:

from transformers import Wav2Vec2Processor, Wav2Vec2ForCTC
from datasets import load_dataset, Audio
import torch

# load model and processor
processor = Wav2Vec2Processor.from_pretrained("GetmanY1/wav2vec2-large-fi-150k-finetuned")
model = Wav2Vec2ForCTC.from_pretrained("GetmanY1/wav2vec2-large-fi-150k-finetuned")

# load the Finnish Common Voice test split and resample its audio to 16 kHz
ds = load_dataset("mozilla-foundation/common_voice_16_1", "fi", split="test")
ds = ds.cast_column("audio", Audio(sampling_rate=16_000))

# extract input features from the first sample (batch size 1)
input_values = processor(ds[0]["audio"]["array"], sampling_rate=16_000, return_tensors="pt", padding="longest").input_values

# retrieve logits without tracking gradients
with torch.no_grad():
    logits = model(input_values).logits

# take argmax and decode
predicted_ids = torch.argmax(logits, dim=-1)
transcription = processor.batch_decode(predicted_ids)
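
The argmax-and-decode step above is greedy CTC decoding: consecutive repeated predictions are collapsed, blank tokens are dropped, and the remaining tokens are joined into text. The sketch below illustrates that idea in plain Python. It is not the `transformers` implementation, and the toy vocabulary, token ids, and function name are hypothetical; the model's real vocabulary is in vocab.json.

```python
def ctc_greedy_decode(frame_ids, id_to_token, blank_id, word_delimiter="|"):
    """Collapse repeated frame predictions, drop blanks, and join tokens."""
    collapsed = []
    previous = None
    for token_id in frame_ids:
        if token_id != previous:  # keep only the first id of each run
            collapsed.append(token_id)
        previous = token_id
    tokens = [id_to_token[i] for i in collapsed if i != blank_id]
    # The word delimiter token marks word boundaries in the output text.
    return "".join(tokens).replace(word_delimiter, " ").strip()


# Hypothetical toy vocabulary: id 0 is the blank (padding) token.
toy_vocab = {0: "<pad>", 1: "|", 2: "m", 3: "o", 4: "i"}

frames = [0, 2, 2, 0, 3, 3, 4, 4, 1, 1, 2, 3, 0, 4]
print(ctc_greedy_decode(frames, toy_vocab, blank_id=0))     # moi moi

# A blank between two identical ids keeps both letters.
print(ctc_greedy_decode([3, 3, 0, 3], toy_vocab, blank_id=0))  # oo
```

Greedy decoding uses only the acoustic model's per-frame predictions, which matches the "standalone acoustic model" usage described here.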

Citation

If you use our models or scripts, please cite our article as:

@inproceedings{getman25_interspeech,
  title     = {{Is your model big enough? Training and interpreting large-scale monolingual speech foundation models}},
  author    = {Yaroslav Getman and Tamás Grósz and Tommi Lehtonen and Mikko Kurimo},
  year      = {2025},
  booktitle = {{Interspeech 2025}},
  pages     = {231--235},
  doi       = {10.21437/Interspeech.2025-46},
  issn      = {2958-1796},
}

Team Members

Yaroslav Getman, Hugging Face profile, LinkedIn profile
Tamás Grósz, Hugging Face profile, LinkedIn profile

Feel free to contact us for more details 🤗