Add audio tagging APIs for node-addon-api (#875)
export LD_LIBRARY_PATH=$PWD/node_modules/sherpa-onnx-linux-x64:$LD_LIBRARY_PATH

export LD_LIBRARY_PATH=$PWD/node_modules/sherpa-onnx-linux-arm64:$LD_LIBRARY_PATH
```
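
Which of the two directories belongs on `LD_LIBRARY_PATH` depends on the CPU architecture. The choice can be scripted; this is a minimal Linux-only sketch (the `arch_dir` helper is hypothetical, not part of these examples) that maps `uname -m` output to the package directory names shown above:

```shell
# Hypothetical helper: map `uname -m` output to the sherpa-onnx
# npm package directory names used above.
arch_dir() {
  case "$1" in
    x86_64)  echo "sherpa-onnx-linux-x64" ;;
    aarch64) echo "sherpa-onnx-linux-arm64" ;;
    *)       echo "unsupported architecture: $1" >&2; return 1 ;;
  esac
}

# Prepend the matching directory to LD_LIBRARY_PATH.
export LD_LIBRARY_PATH="$PWD/node_modules/$(arch_dir "$(uname -m)"):$LD_LIBRARY_PATH"
```

On other platforms the environment variable and the package directory names differ, so treat this only as a convenience for Linux.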

# Examples

The following tables list the examples in this folder.

## Voice activity detection (VAD)
|File| Description|
|---|---|
|[./test_vad_microphone.js](./test_vad_microphone.js)| VAD with a microphone. It uses [silero-vad](https://github.com/snakers4/silero-vad)|

## Speaker identification

|File| Description|
|---|---|
|[./test_speaker_identification.js](./test_speaker_identification.js)| Speaker identification from a file|

## Spoken language identification

|File| Description|
|---|---|
|[./test_vad_spoken_language_identification_microphone.js](./test_vad_spoken_language_identification_microphone.js)|Spoken language identification from a microphone using a multi-lingual [Whisper](https://github.com/openai/whisper) model|

## Audio tagging

|File| Description|
|---|---|
|[./test_audio_tagging_zipformer.js](./test_audio_tagging_zipformer.js)| Audio tagging with a Zipformer model|
|[./test_audio_tagging_ced.js](./test_audio_tagging_ced.js)| Audio tagging with a [CED](https://github.com/RicherMans/CED) model|

## Streaming speech-to-text from files

|File| Description|
|---|---|
|[./test_asr_streaming_transducer.js](./test_asr_streaming_transducer.js)| Streaming speech recognition from a file using a Zipformer transducer model|
|[./test_asr_streaming_ctc.js](./test_asr_streaming_ctc.js)| Streaming speech recognition from a file using a Zipformer CTC model with greedy search|
|[./test_asr_streaming_ctc_hlg.js](./test_asr_streaming_ctc_hlg.js)| Streaming speech recognition from a file using a Zipformer CTC model with HLG decoding|
|[./test_asr_streaming_paraformer.js](./test_asr_streaming_paraformer.js)|Streaming speech recognition from a file using a [Paraformer](https://github.com/alibaba-damo-academy/FunASR) model|

## Streaming speech-to-text from a microphone

|File| Description|
|---|---|
|[./test_asr_streaming_transducer_microphone.js](./test_asr_streaming_transducer_microphone.js)| Streaming speech recognition from a microphone using a Zipformer transducer model|
|[./test_asr_streaming_ctc_microphone.js](./test_asr_streaming_ctc_microphone.js)| Streaming speech recognition from a microphone using a Zipformer CTC model with greedy search|
|[./test_asr_streaming_ctc_hlg_microphone.js](./test_asr_streaming_ctc_hlg_microphone.js)|Streaming speech recognition from a microphone using a Zipformer CTC model with HLG decoding|
|[./test_asr_streaming_paraformer_microphone.js](./test_asr_streaming_paraformer_microphone.js)| Streaming speech recognition from a microphone using a [Paraformer](https://github.com/alibaba-damo-academy/FunASR) model|

## Non-Streaming speech-to-text from files

|File| Description|
|---|---|
|[./test_asr_non_streaming_transducer.js](./test_asr_non_streaming_transducer.js)|Non-streaming speech recognition from a file with a Zipformer transducer model|
|[./test_asr_non_streaming_whisper.js](./test_asr_non_streaming_whisper.js)| Non-streaming speech recognition from a file using [Whisper](https://github.com/openai/whisper)|
|[./test_asr_non_streaming_nemo_ctc.js](./test_asr_non_streaming_nemo_ctc.js)|Non-streaming speech recognition from a file using a [NeMo](https://github.com/NVIDIA/NeMo) CTC model with greedy search|
|[./test_asr_non_streaming_paraformer.js](./test_asr_non_streaming_paraformer.js)|Non-streaming speech recognition from a file using [Paraformer](https://github.com/alibaba-damo-academy/FunASR)|

## Non-Streaming speech-to-text from a microphone with VAD

|File| Description|
|---|---|
|[./test_vad_asr_non_streaming_transducer_microphone.js](./test_vad_asr_non_streaming_transducer_microphone.js)|VAD + Non-streaming speech recognition from a microphone using a Zipformer transducer model|
|[./test_vad_asr_non_streaming_whisper_microphone.js](./test_vad_asr_non_streaming_whisper_microphone.js)|VAD + Non-streaming speech recognition from a microphone using [Whisper](https://github.com/openai/whisper)|
|[./test_vad_asr_non_streaming_nemo_ctc_microphone.js](./test_vad_asr_non_streaming_nemo_ctc_microphone.js)|VAD + Non-streaming speech recognition from a microphone using a [NeMo](https://github.com/NVIDIA/NeMo) CTC model with greedy search|
|[./test_vad_asr_non_streaming_paraformer_microphone.js](./test_vad_asr_non_streaming_paraformer_microphone.js)|VAD + Non-streaming speech recognition from a microphone using [Paraformer](https://github.com/alibaba-damo-academy/FunASR)|

## Text-to-speech

|File| Description|
|---|---|
|[./test_tts_non_streaming_vits_piper_en.js](./test_tts_non_streaming_vits_piper_en.js)| Text-to-speech with a [piper](https://github.com/rhasspy/piper) English model|
|[./test_tts_non_streaming_vits_coqui_de.js](./test_tts_non_streaming_vits_coqui_de.js)| Text-to-speech with a [coqui](https://github.com/coqui-ai/TTS) German model|
|[./test_tts_non_streaming_vits_zh_ll.js](./test_tts_non_streaming_vits_zh_ll.js)| Text-to-speech with a Chinese model using [cppjieba](https://github.com/yanyiwu/cppjieba)|
|[./test_tts_non_streaming_vits_zh_aishell3.js](./test_tts_non_streaming_vits_zh_aishell3.js)| Text-to-speech with a Chinese TTS model|

### Voice Activity detection (VAD)

```bash
wget https://github.com/k2-fsa/sherpa-onnx/releases/download/asr-models/silero_vad.onnx

npm install naudiodon2

node ./test_vad_microphone.js
```
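
The microphone examples assume that the model file has been downloaded and that `naudiodon2` is installed. A small pre-flight check along these lines can make failures clearer (the `require_file` helper is hypothetical; the file names are taken from the VAD example above):

```shell
# Hypothetical pre-flight check: report missing prerequisites
# instead of letting node fail with a less obvious error.
require_file() {
  if [ ! -e "$1" ]; then
    echo "missing: $1 ($2)" >&2
    return 1
  fi
}

# Only run the example when both prerequisites are present.
if require_file ./silero_vad.onnx "download it with the wget command above" &&
   require_file ./node_modules/naudiodon2 "run: npm install naudiodon2"; then
  node ./test_vad_microphone.js
fi
```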

### Audio tagging with Zipformer

```bash
wget https://github.com/k2-fsa/sherpa-onnx/releases/download/audio-tagging-models/sherpa-onnx-zipformer-small-audio-tagging-2024-04-15.tar.bz2
tar xvf sherpa-onnx-zipformer-small-audio-tagging-2024-04-15.tar.bz2
rm sherpa-onnx-zipformer-small-audio-tagging-2024-04-15.tar.bz2

node ./test_audio_tagging_zipformer.js
```

### Audio tagging with CED

```bash
wget https://github.com/k2-fsa/sherpa-onnx/releases/download/audio-tagging-models/sherpa-onnx-ced-mini-audio-tagging-2024-04-19.tar.bz2
tar xvf sherpa-onnx-ced-mini-audio-tagging-2024-04-19.tar.bz2
rm sherpa-onnx-ced-mini-audio-tagging-2024-04-19.tar.bz2

node ./test_audio_tagging_ced.js
```

### Streaming speech recognition with Zipformer transducer

```bash
wget https://github.com/k2-fsa/sherpa-onnx/releases/download/asr-models/sherpa-onnx-streaming-zipformer-bilingual-zh-en-2023-02-20.tar.bz2
tar xvf sherpa-onnx-streaming-zipformer-bilingual-zh-en-2023-02-20.tar.bz2
rm sherpa-onnx-streaming-zipformer-bilingual-zh-en-2023-02-20.tar.bz2

npm install naudiodon2

node ./test_asr_streaming_transducer_microphone.js
```

### Streaming speech recognition with Zipformer CTC

```bash
wget https://github.com/k2-fsa/sherpa-onnx/releases/download/asr-models/sherpa-onnx-streaming-zipformer-ctc-small-2024-03-18.tar.bz2
tar xvf sherpa-onnx-streaming-zipformer-ctc-small-2024-03-18.tar.bz2
rm sherpa-onnx-streaming-zipformer-ctc-small-2024-03-18.tar.bz2

node ./test_asr_streaming_ctc_microphone.js

node ./test_asr_streaming_ctc_hlg_microphone.js
```

### Streaming speech recognition with Paraformer

```bash
wget https://github.com/k2-fsa/sherpa-onnx/releases/download/asr-models/sherpa-onnx-streaming-paraformer-bilingual-zh-en.tar.bz2
tar xvf sherpa-onnx-streaming-paraformer-bilingual-zh-en.tar.bz2
rm sherpa-onnx-streaming-paraformer-bilingual-zh-en.tar.bz2

npm install naudiodon2

node ./test_asr_streaming_paraformer_microphone.js
```

### Non-streaming speech recognition with Zipformer transducer

```bash
wget https://github.com/k2-fsa/sherpa-onnx/releases/download/asr-models/sherpa-onnx-zipformer-en-2023-04-01.tar.bz2
tar xvf sherpa-onnx-zipformer-en-2023-04-01.tar.bz2
rm sherpa-onnx-zipformer-en-2023-04-01.tar.bz2

npm install naudiodon2

node ./test_vad_asr_non_streaming_transducer_microphone.js
```

### Non-streaming speech recognition with Whisper

```bash
wget https://github.com/k2-fsa/sherpa-onnx/releases/download/asr-models/sherpa-onnx-whisper-tiny.en.tar.bz2
tar xvf sherpa-onnx-whisper-tiny.en.tar.bz2
rm sherpa-onnx-whisper-tiny.en.tar.bz2

npm install naudiodon2

node ./test_vad_asr_non_streaming_whisper_microphone.js
```

### Non-streaming speech recognition with NeMo CTC models

```bash
wget https://github.com/k2-fsa/sherpa-onnx/releases/download/asr-models/sherpa-onnx-nemo-fast-conformer-ctc-be-de-en-es-fr-hr-it-pl-ru-uk-20k.tar.bz2
tar xvf sherpa-onnx-nemo-fast-conformer-ctc-be-de-en-es-fr-hr-it-pl-ru-uk-20k.tar.bz2
rm sherpa-onnx-nemo-fast-conformer-ctc-be-de-en-es-fr-hr-it-pl-ru-uk-20k.tar.bz2

npm install naudiodon2

node ./test_vad_asr_non_streaming_nemo_ctc_microphone.js
```

### Non-streaming speech recognition with Paraformer

```bash
wget https://github.com/k2-fsa/sherpa-onnx/releases/download/asr-models/sherpa-onnx-paraformer-zh-2023-03-28.tar.bz2
tar xvf sherpa-onnx-paraformer-zh-2023-03-28.tar.bz2
rm sherpa-onnx-paraformer-zh-2023-03-28.tar.bz2

npm install naudiodon2

node ./test_vad_asr_non_streaming_paraformer_microphone.js
```

### Text-to-speech with piper VITS models (TTS)

```bash
wget https://github.com/k2-fsa/sherpa-onnx/releases/download/tts-models/vits-piper-en_GB-cori-medium.tar.bz2
tar xvf vits-piper-en_GB-cori-medium.tar.bz2
rm vits-piper-en_GB-cori-medium.tar.bz2

node ./test_tts_non_streaming_vits_piper_en.js
```

### Text-to-speech with Coqui-ai/TTS models (TTS)

```bash
wget https://github.com/k2-fsa/sherpa-onnx/releases/download/tts-models/vits-coqui-de-css10.tar.bz2
tar xvf vits-coqui-de-css10.tar.bz2
rm vits-coqui-de-css10.tar.bz2

node ./test_tts_non_streaming_vits_coqui_de.js
```

### Text-to-speech with vits Chinese models (1/2)

```bash
wget https://github.com/k2-fsa/sherpa-onnx/releases/download/tts-models/sherpa-onnx-vits-zh-ll.tar.bz2
tar xvf sherpa-onnx-vits-zh-ll.tar.bz2
rm sherpa-onnx-vits-zh-ll.tar.bz2

node ./test_tts_non_streaming_vits_zh_ll.js
```

### Text-to-speech with vits Chinese models (2/2)

```bash
wget https://github.com/k2-fsa/sherpa-onnx/releases/download/tts-models/vits-icefall-zh-aishell3.tar.bz2
tar xvf vits-icefall-zh-aishell3.tar.bz2
rm vits-icefall-zh-aishell3.tar.bz2

node ./test_tts_non_streaming_vits_zh_aishell3.js
```

### Spoken language identification with Whisper multi-lingual models

```bash
wget https://github.com/k2-fsa/sherpa-onnx/releases/download/asr-models/sherpa-onnx-whisper-tiny.tar.bz2
tar xvf sherpa-onnx-whisper-tiny.tar.bz2
rm sherpa-onnx-whisper-tiny.tar.bz2

npm install naudiodon2

node ./test_vad_spoken_language_identification_microphone.js
```

### Speaker identification

You can find more models at
<https://github.com/k2-fsa/sherpa-onnx/releases/tag/speaker-recongition-models>