Initialize project; model provided by the ModelHub XC community
Model: dvislobokov/whisper-large-v3-turbo-russian Source: Original Platform
48
README.md
Normal file
@@ -0,0 +1,48 @@
---
license: mit
datasets:
- mozilla-foundation/common_voice_17_0
language:
- ru
base_model:
- openai/whisper-large-v3-turbo
pipeline_tag: automatic-speech-recognition
metrics:
- accuracy
library_name: transformers
tags:
- call
---

### This model was trained on two A100 40 GB GPUs with 128 GB RAM and 2 x Xeon 48-core 2.4 GHz CPUs
- Training time: ~7 hours
- Training set size: 118k audio samples from Mozilla Common Voice 17
---
Usage example:
```python
from transformers import pipeline
import gradio as gr
import time

# Load the fine-tuned Russian Whisper model as an ASR pipeline (CPU inference)
pipe = pipeline(
    model="dvislobokov/whisper-large-v3-turbo-russian",
    tokenizer="dvislobokov/whisper-large-v3-turbo-russian",
    task='automatic-speech-recognition',
    device='cpu'
)

def transcribe(audio):
    # `audio` is a file path supplied by the Gradio Audio component
    start = time.time()
    # return_timestamps=True allows transcription of audio longer than 30 seconds
    text = pipe(audio, return_timestamps=True)['text']
    # Log the transcription time in seconds
    print(time.time() - start)
    return text

# Simple web UI: record from the microphone or upload a file, get text back
iface = gr.Interface(
    fn=transcribe,
    inputs=gr.Audio(sources=['microphone', 'upload'], type='filepath'),
    outputs='text'
)

# share=True also creates a temporary public link to the demo
iface.launch(share=True)
```
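
For scripted use without the Gradio UI, the same call can be wrapped in a small helper. This is a minimal sketch, not part of the original model card: `timed_transcribe` and `sample.wav` are hypothetical names, and `asr` is assumed to be any callable that behaves like the `pipe` object above (it accepts a file path plus `return_timestamps` and returns a dict with a `text` key).

```python
import time
from typing import Any, Callable, Dict, Tuple


def timed_transcribe(
    asr: Callable[..., Dict[str, Any]], audio_path: str
) -> Tuple[str, float]:
    """Run `asr` on one audio file and return (text, seconds elapsed)."""
    start = time.perf_counter()
    # Mirrors the call in the example above; `asr` is assumed to accept
    # the same keyword argument as the transformers ASR pipeline.
    result = asr(audio_path, return_timestamps=True)
    elapsed = time.perf_counter() - start
    return result["text"], elapsed


# Hypothetical usage, assuming `pipe` was built as in the example above
# and that "sample.wav" exists locally:
# text, seconds = timed_transcribe(pipe, "sample.wav")
# print(f"{seconds:.2f}s: {text}")
```

Passing the pipeline in as an argument keeps the helper independent of model loading, so it can be exercised with a stub before the full model is downloaded.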