Commit Graph

109 Commits

Author SHA1 Message Date
Fangjun Kuang
2c2926af7d Add C++ runtime for Matcha-TTS (#1627) 2024-12-31 12:44:14 +08:00
Fangjun Kuang
268d562135 Add TeleSpeech CTC to non_streaming_server.py (#1649) 2024-12-26 11:11:03 +08:00
goddamnVincent
47a2dd4cf8 'update20241203' (#1589)
Add '--modeling-unit' and '--bpe-vocab' options to /sherpa-onnx/python-api-examples/streaming_server.py so that they can be specified.
2024-12-04 09:22:24 +08:00
JiayuXu
0d6bf52844 fix: support both old and new websockets request headers format (#1588)
Co-authored-by: xujiayu <xujiayu@kaihong.com>
2024-12-03 17:22:12 +08:00
VEP
4fab3f2e2f Revert: [#1521] No need to reset sample-buffer (#1524)
Co-authored-by: VEP <517138883@qq.com>
2024-11-08 21:28:04 +08:00
VEP
f94cca71cf Fix: Reset sample-buffer after processing (#1521)
Co-authored-by: VEP <517138883@qq.com>
2024-11-08 19:04:34 +08:00
彭震东
72dc68c8fa fix typo (#1488) 2024-10-28 21:30:18 +08:00
Fangjun Kuang
669f5ef441 Add C++ runtime and Python APIs for Moonshine models (#1473) 2024-10-26 14:34:07 +08:00
Peakyxh
2b40079faf Add speaker identification with VAD and non-streaming ASR using ALSA (#1463) 2024-10-24 22:04:51 +08:00
Fangjun Kuang
8535b1d3bb Python API for speaker diarization. (#1400) 2024-10-09 14:13:26 +08:00
Fangjun Kuang
e7ffcbd677 Add APIs about max speech duration in VAD for various programming languages (#1349) 2024-09-14 12:30:13 +08:00
Fangjun Kuang
1423ddb1f0 Support specifying max speech duration for VAD. (#1348) 2024-09-14 10:57:46 +08:00
Lim Yao Chong
3bffc24d64 Add Python binding for online punctuation models (#1312) 2024-09-09 10:26:53 +08:00
Fangjun Kuang
857cb5075c Fix typos (#1330) 2024-09-09 10:22:42 +08:00
Fangjun Kuang
8a5f5c1999 Fix python two pass ASR examples (#1230) 2024-08-07 18:35:38 +08:00
Fangjun Kuang
d279c8d20e Add more Python examples for SenseVoice (#1179) 2024-07-28 21:54:38 +08:00
Fangjun Kuang
25f0a10468 Add C++ runtime for SenseVoice models (#1148) 2024-07-18 22:54:18 +08:00
Fangjun Kuang
b5093e27f9 Fix publishing apks to huggingface (#1121)
Save APKs for each release in a separate directory.

Huggingface requires that a directory contain no more than 1000 files.

Since we have so many TTS models, and for each model we need to build APKs for 4 different ABIs,
we work around Huggingface's constraint by placing the APKs into separate directories for different releases.
2024-07-13 16:14:00 +08:00
Fangjun Kuang
dd0ff2ca06 Support onnxruntime 1.18.0 (#906) 2024-07-10 17:05:26 +08:00
Fangjun Kuang
c2cc9dec58 Add Flush to VAD so that the last segment can be detected. (#1099) 2024-07-09 16:15:56 +08:00
Fangjun Kuang
9dd0e03568 Enable to stop TTS generation (#1041) 2024-06-22 18:18:36 +08:00
彭震东
96ab843173 fix typo (#1038) 2024-06-21 11:15:59 +08:00
愚者自愚
167bc76db0 fix generate-subtitles.py bug (#1029)
* Fix generate-subtitles.py: if the audio file does not end with more than 1 second of silence, the last segment is lost
2024-06-18 18:29:39 +08:00
Fangjun Kuang
349d957da2 Add inverse text normalization for online ASR (#1020) 2024-06-17 18:39:23 +08:00
Fangjun Kuang
b0f7ed3ee3 Add inverse text normalization for non-streaming ASR (#1017) 2024-06-17 14:28:53 +08:00
Fangjun Kuang
fc09227cd1 Add Python example to show how to register speakers dynamically for speaker ID. (#986) 2024-06-10 21:01:48 +08:00
Fangjun Kuang
fd5a0d1e00 Add C++ runtime for Tele-AI/TeleSpeech-ASR (#970) 2024-06-05 00:26:40 +08:00
Fangjun Kuang
b31b9f3a2d Add a VAD Python example to remove silences from a file. (#963) 2024-06-03 16:30:28 +08:00
Fangjun Kuang
b445956675 Fix CI tests. (#898) 2024-05-21 20:37:29 +08:00
Wei Kang
b012b78ceb Encode hotwords in C++ side (#828)
* Encode hotwords in C++ side
2024-05-20 19:41:36 +08:00
Fangjun Kuang
eee5d8a15c Add node-addon-api for VAD (#864) 2024-05-11 20:58:23 +08:00
Fangjun Kuang
a88b3bac21 Fix Python TTS examples for models using jieba. (#861) 2024-05-11 09:21:51 +08:00
Fangjun Kuang
46e4e5b7ac Add C++ support for streaming NeMo CTC models. (#857) 2024-05-10 16:26:43 +08:00
Fangjun Kuang
17cd3a5f01 Add C++ runtime for non-streaming faster conformer transducer from NeMo. (#854) 2024-05-10 12:15:39 +08:00
Fangjun Kuang
37a4135dd7 Publish npm package with node-addon-api for Windows (#838) 2024-05-06 16:21:29 +08:00
Fangjun Kuang
54bc504065 Add Python API example for CED audio tagging. (#793) 2024-04-19 18:33:18 +08:00
Fangjun Kuang
13730ecbd8 Add C API for punctuation (#768) 2024-04-14 19:02:34 +08:00
gtf35
b0265b258d Replace torchaudio with soundfile in python-api-examples (#765) 2024-04-13 23:39:07 +08:00
Fangjun Kuang
68b8b88b5a Add Python API for punctuation models. (#762) 2024-04-13 13:28:17 +08:00
Fangjun Kuang
329fe1aa8b Support adding punctuation to the speech recognition result (#761) 2024-04-13 12:15:57 +08:00
Fangjun Kuang
be4a2488a8 Use batch size 1 in generating subtitles. (#756) 2024-04-11 15:58:11 +08:00
Fangjun Kuang
34d70a259f Add Python API and Python examples for audio tagging (#753) 2024-04-11 11:12:48 +08:00
Fangjun Kuang
042976ea6e Add C++ microphone examples for audio tagging (#749) 2024-04-10 21:00:35 +08:00
Fangjun Kuang
6fb8ceda57 Add VAD examples using ALSA for recording (#739) 2024-04-08 16:41:01 +08:00
Fangjun Kuang
db67e00c77 Add HLG decoding for streaming CTC models (#731) 2024-04-03 21:31:42 +08:00
Fangjun Kuang
2e0bccad36 Add C API for speaker embedding extractor. (#711) 2024-03-28 18:05:40 +08:00
Fangjun Kuang
0d258dd150 Support spoken language identification with whisper (#694) 2024-03-24 22:57:00 +08:00
Fangjun Kuang
44d0ef9ae3 Print the time about the first message in tts. (#655) 2024-03-11 11:05:42 +08:00
Fangjun Kuang
d3287f9494 Add Python ASR examples with alsa (#646) 2024-03-08 11:34:48 +08:00
dragon10
93836ff451 Fix the spelling of the variable num_trailing_blanks (#623)
Signed-off-by: lonngxiang <lonngxiang@gmial.com>
Co-authored-by: lonngxiang <lonngxiang@gmial.com>
2024-03-01 17:02:10 +08:00