Fangjun Kuang
0aacf02dd8
Add C++ runtime for vocos ( #2014 )
2025-03-17 17:05:15 +08:00
Fangjun Kuang
488a6e687c
Add C++ runtime for speech enhancement GTCRN models ( #1977 )
See also https://github.com/Xiaobin-Rong/gtcrn
2025-03-10 18:11:16 +08:00
Fangjun Kuang
362ddf2c07
Add C++ demo for VAD+non-streaming ASR ( #1964 )
2025-03-07 11:49:46 +08:00
Karel Vesely
7740dbfb96
Ebranchformer ( #1951 )
* adding ebranchformer encoder
* extend surfaced FeatureExtractorConfig
  - so ebranchformer feature extraction can be configured from Python
  - the GlobCmvn is not needed, as it is a module in the OnnxEncoder
* clean the code
* Integrating remarks from Fangjun
2025-03-04 19:41:09 +08:00
Fangjun Kuang
c9d6859df7
Add transducer modified_beam_search for RKNN. ( #1949 )
2025-03-03 13:15:25 +08:00
Fangjun Kuang
d5e7b51af5
Support RKNN for Zipformer CTC models. ( #1948 )
2025-03-02 21:40:13 +08:00
Fangjun Kuang
4d79e6a007
Add C++ API for streaming zipformer ASR on RK NPU ( #1908 )
2025-02-24 19:07:37 +08:00
Fangjun Kuang
316424b382
Add C++ and Python API for FireRedASR AED models ( #1867 )
2025-02-16 22:45:24 +08:00
Fangjun Kuang
944400e399
Fix splitting text by languages for kokoro tts. ( #1849 )
2025-02-13 18:19:34 +08:00
Fangjun Kuang
c84a833863
Add C++ and Python API for Kokoro 1.0 multilingual TTS model ( #1795 )
2025-02-06 22:57:13 +08:00
Fangjun Kuang
ffc6b480a0
Add C++ and Python API for Kokoro TTS models. ( #1715 )
2025-01-16 14:24:51 +08:00
Fangjun Kuang
2c2926af7d
Add C++ runtime for Matcha-TTS ( #1627 )
2024-12-31 12:44:14 +08:00
Fangjun Kuang
b6f0f5fc2e
Support removing invalid utf-8 sequences. ( #1648 )
2024-12-25 19:32:13 +08:00
Fangjun Kuang
b76cd9033a
Support decoding with byte-level BPE (bbpe) models. ( #1633 )
2024-12-20 19:21:32 +08:00
Fangjun Kuang
31d6206fde
HarmonyOS support for VAD. ( #1561 )
2024-11-24 16:29:24 +08:00
Fangjun Kuang
f97daed408
Fixes #1512 ( #1522 )
2024-11-08 21:07:36 +08:00
Fangjun Kuang
669f5ef441
Add C++ runtime and Python APIs for Moonshine models ( #1473 )
2024-10-26 14:34:07 +08:00
Fangjun Kuang
b3e05f6dc4
Fix style issues ( #1458 )
2024-10-24 11:15:08 +08:00
Fangjun Kuang
59407edcad
C++ API for speaker diarization ( #1396 )
2024-10-09 12:01:20 +08:00
Fangjun Kuang
70568c2df7
Support Agglomerative clustering. ( #1384 )
We use the open-source implementation from
https://github.com/cdalitz/hclust-cpp
2024-09-29 23:44:29 +08:00
jianyou
1414e4dc61
Add online punctuation and casing prediction model for English language ( #1224 )
2024-08-06 17:33:38 +08:00
Fangjun Kuang
d5f486878d
Remove libonnxruntime_providers_cuda.so as a dependency. ( #1210 )
2024-08-03 16:25:23 +08:00
Fangjun Kuang
25f0a10468
Add C++ runtime for SenseVoice models ( #1148 )
2024-07-18 22:54:18 +08:00
Fangjun Kuang
960eb7529e
Add C++ runtime for MeloTTS ( #1138 )
2024-07-16 15:55:02 +08:00
Fangjun Kuang
a25075101c
Build sherpa-onnx as a single shared library ( #1078 )
When `-D BUILD_SHARED_LIBS=ON` is passed to `cmake`, it builds a single shared library.
Specifically,
- For C APIs, it builds `libsherpa-onnx-c-api.so`
- For Python APIs, it builds `_sherpa_onnx.cpython-xx-xx.so`
- For Kotlin and Java APIs, it builds `libsherpa-onnx-jni.so`
There is no `libsherpa-onnx-core.so` any longer.
Note it affects only shared libraries.
2024-07-06 16:41:54 +08:00
Manix
55decb7bee
Add config for TensorRT and CUDA execution provider ( #992 )
Signed-off-by: manickavela1998@gmail.com <manickavela1998@gmail.com>
Signed-off-by: manickavela1998@gmail.com <manickavela.arumugam@uniphore.com>
2024-07-05 15:18:37 +08:00
Fangjun Kuang
598c12c4e5
Fix CI tests ( #1061 )
2024-06-27 18:05:18 +08:00
Fangjun Kuang
a11c859971
Support clang-tidy ( #1034 )
2024-06-19 20:51:57 +08:00
Fangjun Kuang
6789c909d2
Inverse text normalization API of streaming ASR for various programming languages ( #1022 )
2024-06-18 13:42:17 +08:00
Fangjun Kuang
fd5a0d1e00
Add C++ runtime for Tele-AI/TeleSpeech-ASR ( #970 )
2024-06-05 00:26:40 +08:00
Sangeet Sagar
3f472a9993
Add C++ runtime for *streaming* faster conformer transducer from NeMo. ( #889 )
Co-authored-by: sangeet2020 <15uec053@gmail.com>
2024-05-30 13:55:03 +08:00
Wei Kang
b012b78ceb
Encode hotwords in C++ side ( #828 )
2024-05-20 19:41:36 +08:00
Fangjun Kuang
46e4e5b7ac
Add C++ support for streaming NeMo CTC models. ( #857 )
2024-05-10 16:26:43 +08:00
Fangjun Kuang
17cd3a5f01
Add C++ runtime for non-streaming faster conformer transducer from NeMo. ( #854 )
2024-05-10 12:15:39 +08:00
Fangjun Kuang
6b353bfb42
Add jieba for Chinese TTS models ( #797 )
2024-04-21 14:47:13 +08:00
Fangjun Kuang
c1608b3524
Support CED models ( #792 )
2024-04-19 15:20:37 +08:00
Fangjun Kuang
329fe1aa8b
Support adding punctuation to the speech recognition result ( #761 )
2024-04-13 12:15:57 +08:00
Fangjun Kuang
042976ea6e
Add C++ microphone examples for audio tagging ( #749 )
2024-04-10 21:00:35 +08:00
Fangjun Kuang
f20291cadc
Support audio tagging using zipformer ( #747 )
2024-04-10 14:47:06 +08:00
Fangjun Kuang
6fb8ceda57
Add VAD examples using ALSA for recording ( #739 )
2024-04-08 16:41:01 +08:00
Fangjun Kuang
a5f8fbc83f
Support heteronyms in Chinese TTS ( #738 )
2024-04-08 11:01:30 +08:00
Fangjun Kuang
db67e00c77
Add HLG decoding for streaming CTC models ( #731 )
2024-04-03 21:31:42 +08:00
Fangjun Kuang
4e040c596e
Support including TTS conditionally. ( #699 )
2024-03-26 17:21:35 +08:00
Fangjun Kuang
0d258dd150
Support spoken language identification with whisper ( #694 )
2024-03-24 22:57:00 +08:00
Wei Kang
734bbd91dc
Add Python API for keyword spotting ( #576 )
* Add alsa & microphone support for keyword spotting
* Add python wrapper
2024-03-01 09:31:11 +08:00
Fangjun Kuang
87a7030c08
Support using alsa to access the microphone with non-streaming ASR models ( #517 )
2024-02-26 21:17:26 +08:00
Fangjun Kuang
67acd34dcd
Use alsa to read microphone in speaker identification demo. ( #605 )
2024-02-23 19:27:51 +08:00
Fangjun Kuang
099a0ccae3
Link the math lib. ( #592 )
2024-02-21 15:36:54 +08:00
Fangjun Kuang
d771762868
Support WebAssembly for text-to-speech ( #577 )
2024-02-08 23:39:12 +08:00
Fangjun Kuang
0b18ccfbb2
C++ API demo for speaker identification with portaudio. ( #561 )
2024-01-30 11:21:43 +08:00