---
datasets:
- bond005/taiga_speech_v2
- bond005/podlodka_speech
- bond005/rulibrispeech
language:
- ru
license: apache-2.0
metrics:
- wer
pipeline_tag: automatic-speech-recognition
library_name: transformers
widget:
- example_title: Нейронные сети - это хорошо!
  src: https://huggingface.co/bond005/whisper-large-v3-ru-podlodka/resolve/main/test_sound_ru.flac
- example_title: К сожалению, система распознавания речи не всегда стабильна, особенно в шумных условиях.
  src: https://huggingface.co/bond005/whisper-large-v3-ru-podlodka/resolve/main/test_sound_with_noise.wav
- example_title: Мимо театра мальчик ходил довольно часто — белое, со взбитыми сливками, здание-торт.
  src: https://huggingface.co/bond005/whisper-large-v3-ru-podlodka/resolve/main/anna_matveeva_test.wav
model-index:
- name: Whisper Large V3 Russian Podlodka by Ivan Bondarenko
  results:
  - task:
      type: automatic-speech-recognition
      name: Speech Recognition
    dataset:
      name: Podlodka.io
      type: bond005/podlodka_speech
      args: ru
    metrics:
    - type: wer
      value: 20.91
      name: WER (with punctuation and capital letters)
    - type: wer
      value: 10.987
      name: WER (without punctuation)
  - task:
      type: automatic-speech-recognition
      name: Speech Recognition
    dataset:
      name: Russian Librispeech
      type: bond005/rulibrispeech
      args: ru
    metrics:
    - type: wer
      value: 9.795
      name: WER (without punctuation)
---

# Whisper Large V3 Russian Podlodka

This repository contains a fine-tuned Whisper Large V3 model for Russian speech recognition. It serves as the core transcription component of the **Pisets** system, specifically optimized for long audio recordings such as lectures and interviews.

The model was presented in the paper [Pisets: A Robust Speech Recognition System for Lectures and Interviews](https://huggingface.co/papers/2601.18415).

## System Architecture

The Pisets system implements a three-component architecture to improve recognition accuracy while minimizing hallucinations:

1. **Wav2Vec2**: For primary recognition and segmentation.
2. **Audio Spectrogram Transformer (AST)**: For filtering non-speech segments.
3. **Whisper (this model)**: For the final high-quality transcription.

## Implementation

The complete source code and instructions for using the system (including generation of SRT and DocX files) can be found in the GitHub repository:

**GitHub:** [https://github.com/bond005/pisets](https://github.com/bond005/pisets)

## Citation

If you use this model or the Pisets system in your research, please cite:

```bibtex
@article{bondarenko2026pisets,
  title={Pisets: A Robust Speech Recognition System for Lectures and Interviews},
  author={Ivan Bondarenko},
  journal={arXiv preprint arXiv:2601.18415},
  year={2026}
}
```
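
## Sketch of the Three-Stage Flow

The three-component design described under System Architecture (segment, filter non-speech, transcribe) can be illustrated with a small, self-contained sketch. This is not the Pisets API: `Segment`, `transcribe_long_audio`, `fixed_length_segmenter`, and `energy_gate` are hypothetical names invented here, and the toy stand-ins only mark where Wav2Vec2, AST, and this Whisper model would plug in. It also simplifies the first stage to pure segmentation. For the actual implementation, refer to the GitHub repository.

```python
from dataclasses import dataclass
from typing import Callable, List, Sequence, Tuple

# Hypothetical types for illustration; none of these names come from Pisets.
Audio = Sequence[float]  # mono samples


@dataclass
class Segment:
    start: float  # segment start, in seconds
    end: float    # segment end, in seconds
    samples: Audio


def transcribe_long_audio(
    audio: Audio,
    segmenter: Callable[[Audio], List[Segment]],
    is_speech: Callable[[Segment], bool],
    transcriber: Callable[[Segment], str],
) -> List[Tuple[float, float, str]]:
    """Segment the audio, drop non-speech segments, transcribe the rest."""
    results = []
    for seg in segmenter(audio):           # stage 1: segmentation
        if not is_speech(seg):             # stage 2: non-speech filtering
            continue
        text = transcriber(seg).strip()    # stage 3: final transcription
        if text:
            results.append((seg.start, seg.end, text))
    return results


def fixed_length_segmenter(audio: Audio, sample_rate: int = 4,
                           seconds: float = 1.0) -> List[Segment]:
    """Toy stand-in for stage 1: cut the audio into fixed-length chunks."""
    step = int(sample_rate * seconds)
    return [
        Segment(i / sample_rate,
                min(i + step, len(audio)) / sample_rate,
                audio[i:i + step])
        for i in range(0, len(audio), step)
    ]


def energy_gate(seg: Segment, threshold: float = 0.1) -> bool:
    """Toy stand-in for stage 2: keep segments whose mean amplitude is high."""
    mean_abs = sum(abs(x) for x in seg.samples) / max(len(seg.samples), 1)
    return mean_abs > threshold


# Three one-second chunks at a toy rate of 4 samples/s: loud, silent, loud.
demo_audio = [0.5] * 4 + [0.0] * 4 + [0.3] * 4
demo_result = transcribe_long_audio(
    demo_audio,
    segmenter=fixed_length_segmenter,
    is_speech=energy_gate,
    transcriber=lambda seg: f"speech@{seg.start:.0f}s",  # placeholder text
)
print(demo_result)
```

Because only segments that pass the filter reach the transcriber, silent or non-speech stretches never get a chance to produce spurious text, which is the motivation the architecture section gives for the filtering stage.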