Fangjun Kuang
f64c58342b
Support replacing homophonic phrases ( #2153 )
2025-04-27 15:31:11 +08:00
Karel Vesely
6a1efd8ac2
online-transducer: reset the encoder together with 2 previous output symbols (non-blank) ( #2129 )
* online-transducer: reset the encoder together with 2 previous output symbols (non-blank)
- added `reset_encoder` boolean member to the OnlineRecognizerConfig class
- by default the encoder is not reset
* pybind11, adding empty symbols for disabled modules (tts, diarization)
* reset_encoder, add default value (false) [pybind11]
2025-04-24 08:18:11 +08:00
Nickolay V. Shmyrev
84ed5d4288
Expose dither in python API ( #2127 )
2025-04-17 16:47:48 +08:00
Fangjun Kuang
0de7e1b9f0
Add C++ and Python API for Dolphin CTC models ( #2085 )
2025-04-02 19:09:00 +08:00
niansa/tuxifan
9d23606ee6
Allow building repository as CMake subdirectory ( #2059 )
* Use PROJECT_SOURCE_DIR rather than CMAKE_SOURCE_DIR to allow building as subdirectory
* Also use PROJECT_SOURCE_DIR instead of CMAKE_SOURCE_DIR in c/cxx api examples
* Only build examples by default when not building as subdirectory
* Do not suggest building binaries either
---------
Co-authored-by: user <user@mail.tld>
2025-03-29 06:27:59 +08:00
Fangjun Kuang
5d2d792b1d
Add Python API for speech enhancement GTCRN models ( #1978 )
2025-03-10 19:02:17 +08:00
Karel Vesely
7740dbfb96
Ebranchformer ( #1951 )
* adding ebranchformer encoder
* extend surfaced FeatureExtractorConfig
- so ebranchformer feature extraction can be configured from Python
- the GlobCmvn is not needed, as it is a module in the OnnxEncoder
* clean the code
* Integrating remarks from Fangjun
2025-03-04 19:41:09 +08:00
Fangjun Kuang
316424b382
Add C++ and Python API for FireRedASR AED models ( #1867 )
2025-02-16 22:45:24 +08:00
Fangjun Kuang
69f489f0cd
Support scaling the duration of a pause in TTS. ( #1820 )
2025-02-08 12:47:26 +08:00
Fangjun Kuang
c84a833863
Add C++ and Python API for Kokoro 1.0 multilingual TTS model ( #1795 )
2025-02-06 22:57:13 +08:00
Fangjun Kuang
8b989a851c
Fix keyword spotting. ( #1689 )
Reset the stream right after detecting a keyword
2025-01-20 16:41:10 +08:00
Jacklyn
b943341fb1
Fix dither binding in Pybind11 to ensure independence from high_freq in FeatureExtractorConfig ( #1739 )
2025-01-20 16:29:36 +08:00
Fangjun Kuang
ffc6b480a0
Add C++ and Python API for Kokoro TTS models. ( #1715 )
2025-01-16 14:24:51 +08:00
Fangjun Kuang
d3538531c4
Fix initializing TTS in Python. ( #1664 )
2024-12-31 15:14:56 +08:00
Fangjun Kuang
2c2926af7d
Add C++ runtime for Matcha-TTS ( #1627 )
2024-12-31 12:44:14 +08:00
Fangjun Kuang
2101227269
Add streaming ASR support for HarmonyOS. ( #1565 )
2024-11-26 18:36:56 +08:00
Fangjun Kuang
669f5ef441
Add C++ runtime and Python APIs for Moonshine models ( #1473 )
2024-10-26 14:34:07 +08:00
Fangjun Kuang
1d061df355
WebAssembly example for speaker diarization ( #1411 )
2024-10-10 22:14:45 +08:00
Fangjun Kuang
8535b1d3bb
Python API for speaker diarization. ( #1400 )
2024-10-09 14:13:26 +08:00
Fangjun Kuang
b965f14cf0
Add Python API for clustering ( #1385 )
2024-09-30 11:33:15 +08:00
Fangjun Kuang
1423ddb1f0
Support specifying max speech duration for VAD. ( #1348 )
2024-09-14 10:57:46 +08:00
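A minimal sketch of what a maximum speech duration can mean for VAD output, assuming segments are given as `(start, end)` index pairs with an exclusive end. The helper `enforce_max_duration` is hypothetical and simply cuts over-long segments into consecutive pieces of at most `max_len`; the actual VAD in this repository may enforce the limit in a different way.

```python
from typing import List, Tuple

# Hypothetical segment representation: (start, end), end exclusive,
# measured in frames or samples.
Segment = Tuple[int, int]


def enforce_max_duration(segments: List[Segment], max_len: int) -> List[Segment]:
    """Split every segment longer than max_len into pieces of at most max_len."""
    if max_len <= 0:
        raise ValueError("max_len must be positive")
    out: List[Segment] = []
    for start, end in segments:
        # Emit full-length pieces while the remainder is still too long.
        while end - start > max_len:
            out.append((start, start + max_len))
            start += max_len
        # Emit the (possibly shorter) remainder, skipping empty segments.
        if end > start:
            out.append((start, end))
    return out


if __name__ == "__main__":
    print(enforce_max_duration([(0, 25), (30, 35)], max_len=10))
    # [(0, 10), (10, 20), (20, 25), (30, 35)]
```

Cutting at fixed offsets is the simplest choice; a real detector might prefer to cut at a low-probability frame near the limit.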
Fangjun Kuang
544857b097
Fix building ( #1343 )
2024-09-13 13:33:52 +08:00
Lim Yao Chong
3bffc24d64
Add Python binding for online punctuation models ( #1312 )
2024-09-09 10:26:53 +08:00
SilverSulfide
888f74bf3c
Re-implement LM rescore for online transducer ( #1231 )
Co-authored-by: Martins Kronis <martins.kuznecovs@tilde.lv>
2024-09-06 10:01:25 +08:00
Robin Zhong
62c4d4ab62
Add emotion, event of SenseVoice. ( #1257 )
* Add emotion, event of SenseVoice.
* Fix tokens size check and update java api.
https://github.com/k2-fsa/sherpa-onnx/pull/1257
2024-08-14 15:50:13 +08:00
xsjk
1da75ee3c0
Fix typo in offline-lm-config.cc ( #1229 )
2024-08-07 15:38:34 +08:00
Fangjun Kuang
25f0a10468
Add C++ runtime for SenseVoice models ( #1148 )
2024-07-18 22:54:18 +08:00
Wei Kang
5b1fa8750f
Fix hotwords OOV log ( #1139 )
2024-07-16 19:41:31 +08:00
Fangjun Kuang
08c758520f
Add keyword spotting for C# ( #1105 )
2024-07-10 21:18:46 +08:00
Fangjun Kuang
dd0ff2ca06
Support onnxruntime 1.18.0 ( #906 )
2024-07-10 17:05:26 +08:00
Fangjun Kuang
c2cc9dec58
Add Flush to VAD so that the last segment can be detected. ( #1099 )
2024-07-09 16:15:56 +08:00
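The reason for `Flush` can be illustrated with a toy frame-based segmenter. This is a hypothetical sketch, not the VAD in this repository: `ToySegmenter`, `accept`, and `min_silence_frames` are illustrative names. Because a segment is closed only after enough trailing silence, speech that runs up to the end of the input stays open until `flush()` is called.

```python
from typing import List, Optional, Tuple

Segment = Tuple[int, int]  # (start_frame, end_frame), end exclusive


class ToySegmenter:
    """Hypothetical segmenter driven by per-frame speech/non-speech flags."""

    def __init__(self, min_silence_frames: int = 3) -> None:
        self.min_silence_frames = min_silence_frames
        self.frame_index = 0
        self.start: Optional[int] = None  # start of the currently open segment
        self.silence_run = 0  # consecutive silent frames inside the open segment
        self.segments: List[Segment] = []

    def accept(self, is_speech: bool) -> None:
        if is_speech:
            if self.start is None:
                self.start = self.frame_index
            self.silence_run = 0
        elif self.start is not None:
            self.silence_run += 1
            if self.silence_run >= self.min_silence_frames:
                # The segment ends at the first frame of the silence run.
                end = self.frame_index - self.silence_run + 1
                self.segments.append((self.start, end))
                self.start = None
                self.silence_run = 0
        self.frame_index += 1

    def flush(self) -> None:
        """Close an open segment at end of input, dropping trailing silence."""
        if self.start is not None:
            end = self.frame_index - self.silence_run
            self.segments.append((self.start, end))
            self.start = None
            self.silence_run = 0


if __name__ == "__main__":
    seg = ToySegmenter(min_silence_frames=3)
    for flag in [1, 1, 0, 0, 0, 1, 1, 1]:
        seg.accept(bool(flag))
    print(seg.segments)  # [(0, 2)]  (the trailing speech is still open)
    seg.flush()
    print(seg.segments)  # [(0, 2), (5, 8)]
```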
Manix
3e4307e2fb
updating trt workspace int64 ( #1094 )
Signed-off-by: Manix <manickavela1998@gmail.com>
2024-07-08 20:38:16 +08:00
Manix
55decb7bee
Add config for TensorRT and CUDA execution provider ( #992 )
Signed-off-by: manickavela1998@gmail.com <manickavela1998@gmail.com>
Signed-off-by: manickavela1998@gmail.com <manickavela.arumugam@uniphore.com>
2024-07-05 15:18:37 +08:00
Fangjun Kuang
9dd0e03568
Enable stopping TTS generation ( #1041 )
2024-06-22 18:18:36 +08:00
SilverSulfide
656b9fa1c8
Add Python API support for Offline LM rescoring ( #1033 )
2024-06-19 16:29:37 +08:00
Fangjun Kuang
349d957da2
Add inverse text normalization for online ASR ( #1020 )
2024-06-17 18:39:23 +08:00
Fangjun Kuang
b0f7ed3ee3
Add inverse text normalization for non-streaming ASR ( #1017 )
2024-06-17 14:28:53 +08:00
Fangjun Kuang
1a43d1e37f
Support getting word IDs for CTC HLG decoding. ( #978 )
2024-06-06 14:22:39 +08:00
Fangjun Kuang
fd5a0d1e00
Add C++ runtime for Tele-AI/TeleSpeech-ASR ( #970 )
2024-06-05 00:26:40 +08:00
Fangjun Kuang
f1cff83ef9
Add address sanitizer and undefined behavior sanitizer ( #951 )
2024-05-31 13:17:01 +08:00
Fangjun Kuang
4f21aabd3c
Fix CI for JavaScript and Python APIs. ( #901 )
2024-05-22 13:57:00 +08:00
Wei Kang
b012b78ceb
Encode hotwords in C++ side ( #828 )
* Encode hotwords in C++ side
2024-05-20 19:41:36 +08:00
Fangjun Kuang
46e4e5b7ac
Add C++ support for streaming NeMo CTC models. ( #857 )
2024-05-10 16:26:43 +08:00
Fangjun Kuang
17cd3a5f01
Add C++ runtime for non-streaming faster conformer transducer from NeMo. ( #854 )
2024-05-10 12:15:39 +08:00
Karel Vesely
2e45d327a5
Adding temperature scaling on Joiner logits: ( #789 )
* Adding temperature scaling on Joiner logits:
- T hard-coded to 2.0
- best result so far: NCE 0.122 (still not very high)
- the BPE scores were rescaled by 0.2 (but then incorrect words also
get high confidence; visually reasonable histograms come with a 0.5 scale)
- BPE->WORD score merging done by min(.) function
(also tried probability product, and arithmetic, geometric, and harmonic means)
- without temperature scaling (i.e. scale 1.0), the best NCE was 0.032 (here product merging was best)
Results seem consistent with: https://arxiv.org/abs/2110.15222
Everything was tuned on a very small set of 100 sentences (813 words, 10.2% WER) with a Czech model.
I also experimented with mixing blank posteriors into the BPE confidences,
but found no NCE improvement, so I am not pushing that.
Temperature scaling is also added to the greedy search confidences.
* making `temperature_scale` configurable from outside
2024-04-26 09:44:26 +08:00
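The confidence recipe described in this commit can be sketched in Python. This is a minimal illustration, not the sherpa-onnx code: it assumes the joiner yields one logit vector per emitted token, applies a softmax to the logits divided by a temperature (2.0, the value first hard-coded), takes the probability of the emitted token as its confidence, and merges the BPE-token confidences of a word with `min(.)`. The function names and toy logits are hypothetical.

```python
import math
from typing import List, Sequence


def softmax_with_temperature(logits: Sequence[float], temperature: float = 2.0) -> List[float]:
    """Softmax of logits / temperature; a larger temperature flattens the distribution."""
    scaled = [x / temperature for x in logits]
    m = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(x - m) for x in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def token_confidence(logits: Sequence[float], token_id: int, temperature: float = 2.0) -> float:
    """Posterior of the emitted token under the temperature-scaled softmax."""
    return softmax_with_temperature(logits, temperature)[token_id]


def word_confidence(token_confidences: Sequence[float]) -> float:
    """Merge the BPE-token confidences of one word with min(.)."""
    return min(token_confidences)


if __name__ == "__main__":
    # Hypothetical joiner logits for two BPE tokens forming one word.
    step1, step2 = [4.0, 1.0, 0.5], [0.2, 3.0, 2.5]
    for t in (1.0, 2.0):
        confs = [token_confidence(step1, 0, t), token_confidence(step2, 1, t)]
        print(f"T={t}: tokens={[round(c, 3) for c in confs]}, word={word_confidence(confs):.3f}")
```

With T > 1 the posterior of a dominant token shrinks, which counteracts the over-confidence of raw joiner outputs; `min(.)` makes a word only as confident as its least confident piece.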
Fangjun Kuang
6b353bfb42
Add jieba for Chinese TTS models ( #797 )
2024-04-21 14:47:13 +08:00
Fangjun Kuang
54bc504065
Add Python API example for CED audio tagging. ( #793 )
2024-04-19 18:33:18 +08:00
Fangjun Kuang
c1608b3524
Support CED models ( #792 )
2024-04-19 15:20:37 +08:00
chiiyeh
aa2d695fd2
Add score function to speaker identification ( #775 )
2024-04-16 17:29:46 +08:00
Fangjun Kuang
6bf2099781
Fix code style issues ( #774 )
2024-04-16 09:46:15 +08:00