---
license: mit
datasets:
- mozilla-foundation/common_voice_17_0
language:
- ru
base_model:
- openai/whisper-large-v3-turbo
pipeline_tag: automatic-speech-recognition
metrics:
- accuracy
library_name: transformers
tags:
- call
---

### This model was trained on two A100 40 GB GPUs, with 128 GB RAM and 2 x Xeon 48-core 2.4 GHz CPUs

- Training time: ~7 hours
- Training set: 118k audio samples from Mozilla Common Voice 17

---

Usage example

```python
import time

import gradio as gr
from transformers import pipeline

# Load the fine-tuned Whisper model as a speech-recognition pipeline (CPU inference).
pipe = pipeline(
    task='automatic-speech-recognition',
    model="dvislobokov/whisper-large-v3-turbo-russian",
    tokenizer="dvislobokov/whisper-large-v3-turbo-russian",
    device='cpu'
)


def transcribe(audio):
    """Transcribe the audio file at the given path and print the elapsed time."""
    start = time.time()
    # return_timestamps=True is needed by Whisper for audio longer than 30 s.
    text = pipe(audio, return_timestamps=True)['text']
    print(time.time() - start)
    return text


# Web UI that accepts microphone recordings or uploaded audio files.
iface = gr.Interface(
    fn=transcribe,
    inputs=gr.Audio(sources=['microphone', 'upload'], type='filepath'),
    outputs='text'
)

# share=True also creates a temporary public link to the demo.
iface.launch(share=True)
```
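
For transcribing files from a script rather than through the Gradio UI, something like the following may be enough. This is only a sketch: the helper names `load_asr` and `transcribe_files` are hypothetical and not part of this repository, and the pipeline call simply mirrors the one in the usage example. The model is loaded inside a function so that importing the module does not trigger a download.

```python
from typing import Callable, Dict, Iterable


def load_asr(device: str = "cpu"):
    """Build the ASR pipeline (hypothetical helper; downloads the model on first use)."""
    from transformers import pipeline  # imported lazily: heavy dependency

    return pipeline(
        task="automatic-speech-recognition",
        model="dvislobokov/whisper-large-v3-turbo-russian",
        device=device,
    )


def transcribe_files(asr: Callable, paths: Iterable[str]) -> Dict[str, str]:
    """Run the pipeline on each audio file and return a {path: text} mapping."""
    results = {}
    for path in paths:
        # Same call as in the usage example: timestamps on, keep only the text.
        output = asr(str(path), return_timestamps=True)
        results[str(path)] = output["text"].strip()
    return results
```

A typical call would be `transcribe_files(load_asr(), ["sample.wav"])`, where `sample.wav` stands for any local audio file. Passing the pipeline in as an argument keeps the model loaded once across many files.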