79 Commits

Author SHA1 Message Date
Fangjun Kuang
eee5d8a15c Add node-addon-api for VAD (#864) 2024-05-11 20:58:23 +08:00
Fangjun Kuang
a88b3bac21 Fix Python TTS examples for models using jieba. (#861) 2024-05-11 09:21:51 +08:00
Fangjun Kuang
46e4e5b7ac Add C++ support for streaming NeMo CTC models. (#857) 2024-05-10 16:26:43 +08:00
Fangjun Kuang
17cd3a5f01 Add C++ runtime for non-streaming faster conformer transducer from NeMo. (#854) 2024-05-10 12:15:39 +08:00
Fangjun Kuang
37a4135dd7 Publish npm package with node-addon-api for Windows (#838) 2024-05-06 16:21:29 +08:00
Fangjun Kuang
54bc504065 Add Python API example for CED audio tagging. (#793) 2024-04-19 18:33:18 +08:00
Fangjun Kuang
13730ecbd8 Add C API for punctuation (#768) 2024-04-14 19:02:34 +08:00
gtf35
b0265b258d Replace torchaudio with soundfile in python-api-examples (#765) 2024-04-13 23:39:07 +08:00
Fangjun Kuang
68b8b88b5a Add Python API for punctuation models. (#762) 2024-04-13 13:28:17 +08:00
Fangjun Kuang
329fe1aa8b Support adding punctuations to the speech recognition result (#761) 2024-04-13 12:15:57 +08:00
Fangjun Kuang
be4a2488a8 Use batch size 1 in generating subtitles. (#756) 2024-04-11 15:58:11 +08:00
Fangjun Kuang
34d70a259f Add Python API and Python examples for audio tagging (#753) 2024-04-11 11:12:48 +08:00
Fangjun Kuang
042976ea6e Add C++ microphone examples for audio tagging (#749) 2024-04-10 21:00:35 +08:00
Fangjun Kuang
6fb8ceda57 Add VAD examples using ALSA for recording (#739) 2024-04-08 16:41:01 +08:00
Fangjun Kuang
db67e00c77 Add HLG decoding for streaming CTC models (#731) 2024-04-03 21:31:42 +08:00
Fangjun Kuang
2e0bccad36 Add C API for speaker embedding extractor. (#711) 2024-03-28 18:05:40 +08:00
Fangjun Kuang
0d258dd150 Support spoken language identification with whisper (#694) 2024-03-24 22:57:00 +08:00
Fangjun Kuang
44d0ef9ae3 Print the time about the first message in tts. (#655) 2024-03-11 11:05:42 +08:00
Fangjun Kuang
d3287f9494 Add Python ASR examples with alsa (#646) 2024-03-08 11:34:48 +08:00
dragon10
93836ff451 Fix spelling of variable num_trailing_blanks (#623)
Signed-off-by: lonngxiang <lonngxiang@gmial.com>
Co-authored-by: lonngxiang <lonngxiang@gmial.com>
2024-03-01 17:02:10 +08:00
Wei Kang
734bbd91dc Add Python API for keyword spotting (#576)
* Add alsa & microphone support for keyword spotting

* Add python wrapper
2024-03-01 09:31:11 +08:00
chiiyeh
e7b18a2139 add blank_penalty for online transducer (#548) 2024-01-26 12:12:13 +08:00
chiiyeh
3bb3849ec5 add blank_penalty for offline transducer (#542) 2024-01-25 15:00:09 +08:00
Fangjun Kuang
59e28518b4 Add Python API examples for speaker recognition with VAD and ASR. (#532) 2024-01-15 21:40:30 +08:00
Fangjun Kuang
68a525a024 Export speaker verification models from NeMo to ONNX (#526) 2024-01-13 19:49:45 +08:00
Fangjun Kuang
55266918c8 Add runtime support for wespeaker models (#516) 2024-01-09 22:06:08 +08:00
Fangjun Kuang
547a22f7d9 Fix #510 (#513) 2024-01-04 12:32:19 +08:00
Fangjun Kuang
e475e750ac Support streaming zipformer CTC (#496)
* Support streaming zipformer CTC

* test online zipformer2 CTC

* Update doc of sherpa-onnx.cc

* Add Python APIs for streaming zipformer2 ctc

* Add Python API examples for streaming zipformer2 ctc

* Swift API for streaming zipformer2 CTC

* NodeJS API for streaming zipformer2 CTC

* Kotlin API for streaming zipformer2 CTC

* Golang API for streaming zipformer2 CTC

* C# API for streaming zipformer2 CTC

* Release v1.9.6
2023-12-22 13:46:33 +08:00
Fangjun Kuang
0e23f82691 Give an informative log for whisper on exceptions. (#473) 2023-12-08 14:33:59 +08:00
Fangjun Kuang
23cf92daf7 Use espeak-ng for coqui-ai/TTS VITS English models. (#466) 2023-12-06 11:00:38 +08:00
Fangjun Kuang
99ff6a834c Play generated audio as it is generating. (#457) 2023-12-02 15:35:11 +08:00
Fangjun Kuang
62dc3c3e46 Use piper-phonemize to convert text to token IDs (#453) 2023-11-30 23:57:43 +08:00
Fangjun Kuang
87a47d7db4 Release GIL to support multithreading in websocket servers. (#451) 2023-11-27 13:44:03 +08:00
Fangjun Kuang
049fb9f451 Add Python APIs for WeNet CTC models (#428) 2023-11-16 14:20:41 +08:00
longshiming
10d6dba187 add --tts-rule-fsts argument at offline-tts.py (#413)
Co-authored-by: longshiming <longshiming@greesoft.com>
2023-11-07 14:18:18 +08:00
Fangjun Kuang
1249710e1d support specifying speed for tts Python APIs (#384) 2023-10-24 21:38:58 +08:00
Fangjun Kuang
8545c3b7f0 Validate input sid (#369) 2023-10-18 14:02:01 +08:00
Fangjun Kuang
1ee79e3ff5 Support Chinese vits models (#368) 2023-10-18 10:19:10 +08:00
Fangjun Kuang
9efe69720d Support VITS VCTK models (#367)
* Support VITS VCTK models

* Release v1.8.1
2023-10-16 17:22:30 +08:00
Fangjun Kuang
655e0fa836 add python API and examples for TTS (#364) 2023-10-14 14:21:53 +08:00
Peng He
4771c9275c Add lm decode for the Python API. (#353)
* Add lm decode for the Python API.

* fix style.

* Fix LogAdd: shouldn't double lm_log_prob when merging the same prefix path

* sort the import alphabetically
2023-10-13 11:15:16 +08:00
Fangjun Kuang
be081017de Fix typos/bugs (#351) 2023-10-08 11:39:59 +08:00
Fangjun Kuang
36017d49c4 add a comment about how to download silero_vad.onnx (#346) 2023-09-26 17:58:53 +08:00
Fangjun Kuang
969fff5622 Add VAD + Non-streaming ASR Python example. (#332) 2023-09-22 11:53:47 +08:00
Fangjun Kuang
2d51ca49b7 Generate subtitles (#315) 2023-09-18 10:44:06 +08:00
Fangjun Kuang
c471423125 Add Silero VAD (#313) 2023-09-17 14:54:38 +08:00
Wei Kang
47184f9db7 Refactor hotwords,support loading hotwords from file (#296) 2023-09-14 19:33:17 +08:00
Fangjun Kuang
8982984ea2 add a two-pass python example (#303) 2023-09-10 17:56:13 +08:00
Fangjun Kuang
f709c95c5f Support multilingual whisper models (#274) 2023-08-16 00:28:52 +08:00
Fangjun Kuang
313debe45c small fixes to python api examples (#269) 2023-08-14 20:53:36 +08:00