Add audio tagging APIs for node-addon-api (#875)

2024-05-14 17:32:30 +08:00
parent 388e6a98fc
commit d19f50b799
12 changed files with 520 additions and 16 deletions
--- a/nodejs-addon-examples/README.md
+++ b/nodejs-addon-examples/README.md
@@ -27,7 +27,82 @@ export LD_LIBRARY_PATH=$PWD/node_modules/sherpa-onnx-linux-x64:$LD_LIBRARY_PATH
 export LD_LIBRARY_PATH=$PWD/node_modules/sherpa-onnx-linux-arm64:$LD_LIBRARY_PATH
 ```

-# Voice Activity detection (VAD)
+# Examples
+
+The following tables list the examples in this folder.
+
+## Voice activity detection (VAD)
+
+|File| Description|
+|---|---|
+|[./test_vad_microphone.js](./test_vad_microphone.js)| VAD with a microphone. It uses [silero-vad](https://github.com/snakers4/silero-vad)|
+
+## Speaker identification
+
+|File| Description|
+|---|---|
+|[./test_speaker_identification.js](./test_speaker_identification.js)| Speaker identification from a file|
+
+## Spoken language identification
+
+|File| Description|
+|---|---|
+|[./test_vad_spoken_language_identification_microphone.js](./test_vad_spoken_language_identification_microphone.js)|Spoken language identification from a microphone using a multi-lingual [Whisper](https://github.com/openai/whisper) model|
+
+## Audio tagging
+
+|File| Description|
+|---|---|
+|[./test_audio_tagging_zipformer.js](./test_audio_tagging_zipformer.js)| Audio tagging with a Zipformer model|
+|[./test_audio_tagging_ced.js](./test_audio_tagging_ced.js)| Audio tagging with a [CED](https://github.com/RicherMans/CED) model|
+
+## Streaming speech-to-text from files
+
+|File| Description|
+|---|---|
+|[./test_asr_streaming_transducer.js](./test_asr_streaming_transducer.js)| Streaming speech recognition from a file using a Zipformer transducer model|
+|[./test_asr_streaming_ctc.js](./test_asr_streaming_ctc.js)| Streaming speech recognition from a file using a Zipformer CTC model with greedy search|
+|[./test_asr_streaming_ctc_hlg.js](./test_asr_streaming_ctc_hlg.js)| Streaming speech recognition from a file using a Zipformer CTC model with HLG decoding|
+|[./test_asr_streaming_paraformer.js](./test_asr_streaming_paraformer.js)|Streaming speech recognition from a file using a [Paraformer](https://github.com/alibaba-damo-academy/FunASR) model|
+
+## Streaming speech-to-text from a microphone
+
+|File| Description|
+|---|---|
+|[./test_asr_streaming_transducer_microphone.js](./test_asr_streaming_transducer_microphone.js)| Streaming speech recognition from a microphone using a Zipformer transducer model|
+|[./test_asr_streaming_ctc_microphone.js](./test_asr_streaming_ctc_microphone.js)| Streaming speech recognition from a microphone using a Zipformer CTC model with greedy search|
+|[./test_asr_streaming_ctc_hlg_microphone.js](./test_asr_streaming_ctc_hlg_microphone.js)|Streaming speech recognition from a microphone using a Zipformer CTC model with HLG decoding|
+|[./test_asr_streaming_paraformer_microphone.js](./test_asr_streaming_paraformer_microphone.js)| Streaming speech recognition from a microphone using a [Paraformer](https://github.com/alibaba-damo-academy/FunASR) model|
+
+## Non-streaming speech-to-text from files
+
+|File| Description|
+|---|---|
+|[./test_asr_non_streaming_transducer.js](./test_asr_non_streaming_transducer.js)|Non-streaming speech recognition from a file with a Zipformer transducer model|
+|[./test_asr_non_streaming_whisper.js](./test_asr_non_streaming_whisper.js)| Non-streaming speech recognition from a file using [Whisper](https://github.com/openai/whisper)|
+|[./test_asr_non_streaming_nemo_ctc.js](./test_asr_non_streaming_nemo_ctc.js)|Non-streaming speech recognition from a file using a [NeMo](https://github.com/NVIDIA/NeMo) CTC model with greedy search|
+|[./test_asr_non_streaming_paraformer.js](./test_asr_non_streaming_paraformer.js)|Non-streaming speech recognition from a file using [Paraformer](https://github.com/alibaba-damo-academy/FunASR)|
+
+## Non-streaming speech-to-text from a microphone with VAD
+
+|File| Description|
+|---|---|
+|[./test_vad_asr_non_streaming_transducer_microphone.js](./test_vad_asr_non_streaming_transducer_microphone.js)|VAD + Non-streaming speech recognition from a microphone using a Zipformer transducer model|
+|[./test_vad_asr_non_streaming_whisper_microphone.js](./test_vad_asr_non_streaming_whisper_microphone.js)|VAD + Non-streaming speech recognition from a microphone using [Whisper](https://github.com/openai/whisper)|
+|[./test_vad_asr_non_streaming_nemo_ctc_microphone.js](./test_vad_asr_non_streaming_nemo_ctc_microphone.js)|VAD + Non-streaming speech recognition from a microphone using a [NeMo](https://github.com/NVIDIA/NeMo) CTC model with greedy search|
+|[./test_vad_asr_non_streaming_paraformer_microphone.js](./test_vad_asr_non_streaming_paraformer_microphone.js)|VAD + Non-streaming speech recognition from a microphone using [Paraformer](https://github.com/alibaba-damo-academy/FunASR)|
+
+## Text-to-speech
+
+|File| Description|
+|---|---|
+|[./test_tts_non_streaming_vits_piper_en.js](./test_tts_non_streaming_vits_piper_en.js)| Text-to-speech with a [piper](https://github.com/rhasspy/piper) English model|
+|[./test_tts_non_streaming_vits_coqui_de.js](./test_tts_non_streaming_vits_coqui_de.js)| Text-to-speech with a [coqui](https://github.com/coqui-ai/TTS) German model|
+|[./test_tts_non_streaming_vits_zh_ll.js](./test_tts_non_streaming_vits_zh_ll.js)| Text-to-speech with a Chinese model using [cppjieba](https://github.com/yanyiwu/cppjieba)|
+|[./test_tts_non_streaming_vits_zh_aishell3.js](./test_tts_non_streaming_vits_zh_aishell3.js)| Text-to-speech with a Chinese TTS model|
+
+
+### Voice activity detection (VAD)

 ```bash
 wget https://github.com/k2-fsa/sherpa-onnx/releases/download/asr-models/silero_vad.onnx
@@ -39,7 +114,27 @@ npm install naudiodon2
 node ./test_vad_microphone.js
 ```

-## Streaming speech recognition with Zipformer transducer
+### Audio tagging with Zipformer
+
+```bash
+wget https://github.com/k2-fsa/sherpa-onnx/releases/download/audio-tagging-models/sherpa-onnx-zipformer-small-audio-tagging-2024-04-15.tar.bz2
+tar xvf sherpa-onnx-zipformer-small-audio-tagging-2024-04-15.tar.bz2
+rm sherpa-onnx-zipformer-small-audio-tagging-2024-04-15.tar.bz2
+
+node ./test_audio_tagging_zipformer.js
+```
+
+### Audio tagging with CED
+
+```bash
+wget https://github.com/k2-fsa/sherpa-onnx/releases/download/audio-tagging-models/sherpa-onnx-ced-mini-audio-tagging-2024-04-19.tar.bz2
+tar xvf sherpa-onnx-ced-mini-audio-tagging-2024-04-19.tar.bz2
+rm sherpa-onnx-ced-mini-audio-tagging-2024-04-19.tar.bz2
+
+node ./test_audio_tagging_ced.js
+```
+
+### Streaming speech recognition with Zipformer transducer

 ```bash
 wget https://github.com/k2-fsa/sherpa-onnx/releases/download/asr-models/sherpa-onnx-streaming-zipformer-bilingual-zh-en-2023-02-20.tar.bz2
@@ -54,7 +149,7 @@ npm install naudiodon2
 node ./test_asr_streaming_transducer_microphone.js
 ```

-## Streaming speech recognition with Zipformer CTC
+### Streaming speech recognition with Zipformer CTC

 ```bash
 wget https://github.com/k2-fsa/sherpa-onnx/releases/download/asr-models/sherpa-onnx-streaming-zipformer-ctc-small-2024-03-18.tar.bz2
@@ -73,7 +168,7 @@ node ./test_asr_streaming_ctc_microphone.js
 node ./test_asr_streaming_ctc_hlg_microphone.js
 ```

-## Streaming speech recognition with Paraformer
+### Streaming speech recognition with Paraformer

 ```bash
 wget https://github.com/k2-fsa/sherpa-onnx/releases/download/asr-models/sherpa-onnx-streaming-paraformer-bilingual-zh-en.tar.bz2
@@ -88,7 +183,7 @@ npm install naudiodon2
 node ./test_asr_streaming_paraformer_microphone.js
 ```

-## Non-streaming speech recognition with Zipformer transducer
+### Non-streaming speech recognition with Zipformer transducer

 ```bash
 wget https://github.com/k2-fsa/sherpa-onnx/releases/download/asr-models/sherpa-onnx-zipformer-en-2023-04-01.tar.bz2
@@ -102,7 +197,7 @@ npm install naudiodon2
 node ./test_vad_asr_non_streaming_transducer_microphone.js
 ```

-## Non-streaming speech recognition with Whisper
+### Non-streaming speech recognition with Whisper

 ```bash
 wget https://github.com/k2-fsa/sherpa-onnx/releases/download/asr-models/sherpa-onnx-whisper-tiny.en.tar.bz2
@@ -116,7 +211,7 @@ npm install naudiodon2
 node ./test_vad_asr_non_streaming_whisper_microphone.js
 ```

-## Non-streaming speech recognition with NeMo CTC models
+### Non-streaming speech recognition with NeMo CTC models

 ```bash
 wget https://github.com/k2-fsa/sherpa-onnx/releases/download/asr-models/sherpa-onnx-nemo-fast-conformer-ctc-be-de-en-es-fr-hr-it-pl-ru-uk-20k.tar.bz2
@@ -130,7 +225,7 @@ npm install naudiodon2
 node ./test_vad_asr_non_streaming_nemo_ctc_microphone.js
 ```

-## Non-streaming speech recognition with Paraformer
+### Non-streaming speech recognition with Paraformer

 ```bash
 wget https://github.com/k2-fsa/sherpa-onnx/releases/download/asr-models/sherpa-onnx-paraformer-zh-2023-03-28.tar.bz2
@@ -144,7 +239,7 @@ npm install naudiodon2
 node ./test_vad_asr_non_streaming_paraformer_microphone.js
 ```

-## Text-to-speech with piper VITS models (TTS)
+### Text-to-speech with piper VITS models (TTS)

 ```bash
 wget https://github.com/k2-fsa/sherpa-onnx/releases/download/tts-models/vits-piper-en_GB-cori-medium.tar.bz2
@@ -154,7 +249,7 @@ rm vits-piper-en_GB-cori-medium.tar.bz2
 node ./test_tts_non_streaming_vits_piper_en.js
 ```

-## Text-to-speech with piper Coqui-ai/TTS models (TTS)
+### Text-to-speech with Coqui-ai/TTS models (TTS)

 ```bash
 wget https://github.com/k2-fsa/sherpa-onnx/releases/download/tts-models/vits-coqui-de-css10.tar.bz2
@@ -164,7 +259,7 @@ rm vits-coqui-de-css10.tar.bz2
 node ./test_tts_non_streaming_vits_coqui_de.js
 ```

-## Text-to-speech with vits Chinese models (1/2)
+### Text-to-speech with VITS Chinese models (1/2)

 ```bash
 wget https://github.com/k2-fsa/sherpa-onnx/releases/download/tts-models/sherpa-onnx-vits-zh-ll.tar.bz2
@@ -174,7 +269,7 @@ rm sherpa-onnx-vits-zh-ll.tar.bz2
 node ./test_tts_non_streaming_vits_zh_ll.js
 ```

-## Text-to-speech with vits Chinese models (2/2)
+### Text-to-speech with VITS Chinese models (2/2)

 ```bash
 wget https://github.com/k2-fsa/sherpa-onnx/releases/download/tts-models/vits-icefall-zh-aishell3.tar.bz2
@@ -184,7 +279,7 @@ rm vits-icefall-zh-aishell3.tar.bz2
 node ./test_tts_non_streaming_vits_zh_aishell3.js
 ```

-## Spoken language identification with Whisper multi-lingual models
+### Spoken language identification with Whisper multi-lingual models

 ```bash
 wget https://github.com/k2-fsa/sherpa-onnx/releases/download/asr-models/sherpa-onnx-whisper-tiny.tar.bz2
@@ -202,7 +297,7 @@ npm install naudiodon2
 node ./test_vad_spoken_language_identification_microphone.js
 ```

-## Speaker identification
+### Speaker identification

 You can find more models at
 <https://github.com/k2-fsa/sherpa-onnx/releases/tag/speaker-recongition-models>