Initialize project; model provided by the ModelHub XC community

Model: bond005/whisper-large-v3-ru-podlodka
Source: Original Platform
2026-05-12 12:23:54 +08:00
commit 59ae840aff
20 changed files with 233001 additions and 0 deletions
--- a/README.md
+++ b/README.md
@@ -0,0 +1,82 @@
+---
+datasets:
+- bond005/taiga_speech_v2
+- bond005/podlodka_speech
+- bond005/rulibrispeech
+language:
+- ru
+license: apache-2.0
+metrics:
+- wer
+pipeline_tag: automatic-speech-recognition
+library_name: transformers
+widget:
+- example_title: Нейронные сети - это хорошо!
+  src: https://huggingface.co/bond005/whisper-large-v3-ru-podlodka/resolve/main/test_sound_ru.flac
+- example_title: К сожалению, система распознавания речи не всегда стабильна, особенно
+    в шумных условиях.
+  src: https://huggingface.co/bond005/whisper-large-v3-ru-podlodka/resolve/main/test_sound_with_noise.wav
+- example_title: Мимо театра мальчик ходил довольно часто — белое, со взбитыми сливками,
+    здание-торт.
+  src: https://huggingface.co/bond005/whisper-large-v3-ru-podlodka/resolve/main/anna_matveeva_test.wav
+model-index:
+- name: Whisper Large V3 Russian Podlodka by Ivan Bondarenko
+  results:
+  - task:
+      type: automatic-speech-recognition
+      name: Speech Recognition
+    dataset:
+      name: Podlodka.io
+      type: bond005/podlodka_speech
+      args: ru
+    metrics:
+    - type: wer
+      value: 20.91
+      name: WER (with punctuation and capital letters)
+    - type: wer
+      value: 10.987
+      name: WER (without punctuation)
+  - task:
+      type: automatic-speech-recognition
+      name: Speech Recognition
+    dataset:
+      name: Russian Librispeech
+      type: bond005/rulibrispeech
+      args: ru
+    metrics:
+    - type: wer
+      value: 9.795
+      name: WER (without punctuation)
+---
+
+# Whisper Large V3 Russian Podlodka
+
+This repository contains a Whisper Large V3 model fine-tuned for Russian speech recognition. It serves as the core transcription component of the **Pisets** system, which is optimized for long audio recordings such as lectures and interviews.
+
+The model was presented in the paper [Pisets: A Robust Speech Recognition System for Lectures and Interviews](https://huggingface.co/papers/2601.18415).
+
+## System Architecture
+
+The Pisets system implements a three-component architecture to improve recognition accuracy while minimizing hallucinations:
+1. **Wav2Vec2**: For primary recognition and segmentation.
+2. **Audio Spectrogram Transformer (AST)**: For filtering non-speech segments.
+3. **Whisper (this model)**: For the final high-quality transcription.
+
+## Implementation
+
+The complete source code and usage instructions for the system, including generation of SRT and DOCX files, are available in the GitHub repository:
+
+**GitHub:** [https://github.com/bond005/pisets](https://github.com/bond005/pisets)
+
+## Citation
+
+If you use this model or the Pisets system in your research, please cite:
+
+```bibtex
+@article{bondarenko2026pisets,
+  title={Pisets: A Robust Speech Recognition System for Lectures and Interviews},
+  author={Ivan Bondarenko},
+  journal={arXiv preprint arXiv:2601.18415},
+  year={2026}
+}
+```