enginex-mr_series-sherpa-onnx

EngineX-Iluvatar/enginex-mr_series-sherpa-onnx

Archived

Author	SHA1	Message	Date
Askars Salimbajevs	f0960342ad	Add LODR support to online and offline recognizers (#2026 ) This PR integrates LODR (Level-Ordered Deterministic Rescoring) support from Icefall into both online and offline recognizers, enabling LODR for LM shallow fusion and LM rescore. - Extended OnlineLMConfig and OfflineLMConfig to include lodr_fst, lodr_scale, and lodr_backoff_id. - Implemented LodrFst and LodrStateCost classes and wired them into RNN LM scoring in both online and offline code paths. - Updated Python bindings, CLI entry points, examples, and CI test scripts to accept and exercise the new LODR options.	2025-07-09 16:23:46 +08:00
Fangjun Kuang	0e738c356c	Add C++ runtime and Python API for NeMo Canary models (#2352 )	2025-07-07 17:03:49 +08:00
Fangjun Kuang	bda427f4b2	Add API to get version information (#2309 )	2025-06-25 00:22:21 +08:00
Fangjun Kuang	6982b86c66	Support extra languages in multi-lang kokoro tts (#2303 )	2025-06-20 11:22:52 +08:00
Fangjun Kuang	59d118c256	Refactor kokoro export (#2302 ) - generate samples for https://k2-fsa.github.io/sherpa/onnx/tts/all/ - provide int8 model for kokoro v0.19 kokoro-int8-en-v0_19.tar.bz2	2025-06-18 20:30:10 +08:00
Fangjun Kuang	2913cce77c	Add scripts for exporting Piper TTS models to sherpa-onnx (#2299 )	2025-06-17 14:23:39 +08:00
Fangjun Kuang	d57e4f84de	Add Python API for source separation (#2283 )	2025-06-05 20:44:26 +08:00
Fangjun Kuang	f64c58342b	Support replacing homonphonic phrases (#2153 )	2025-04-27 15:31:11 +08:00
Karel Vesely	6a1efd8ac2	online-transducer: reset the encoder toghter with 2 previous output symbols (non-blank) (#2129 ) * online-transducer: reset the encoder toghter with 2 previous output symbols (non-blank) - added `reset_encoder` boolean member into the OnlineRecognizerConfig class - by default the encoder is not reset * pybind11, adding empty symbols for disabled modules (tts, diarization) * reset_encoder, add default value (false) [pybind11]	2025-04-24 08:18:11 +08:00
Fangjun Kuang	0de7e1b9f0	Add C++ and Python API for Dolphin CTC models (#2085 )	2025-04-02 19:09:00 +08:00
niansa/tuxifan	9d23606ee6	Allow building repository as CMake subdirectory (#2059 ) * Use PROJECT_SOURCE_DIR rather than CMAKE_SOURCE_DIR to allow building as subdirectory * Also use PROJECT_SOURCE_DIR instead of CMAKE_SOURCE_DIR in c/cxx api examples * Only build examples by default when not building as subdirectory * Do not suggest building binaries either --------- Co-authored-by: user <user@mail.tld>	2025-03-29 06:27:59 +08:00
Fangjun Kuang	5d2d792b1d	Add Python API for speech enhancement GTCRN models (#1978 )	2025-03-10 19:02:17 +08:00
Karel Vesely	7740dbfb96	Ebranchformer (#1951 ) * adding ebranchformer encoder * extend surfaced FeatureExtractorConfig - so ebranchformer feature extraction can be configured from Python - the GlobCmvn is not needed, as it is a module in the OnnxEncoder * clean the code * Integrating remarks from Fangjun	2025-03-04 19:41:09 +08:00
Fangjun Kuang	316424b382	Add C++ and Python API for FireRedASR AED models (#1867 )	2025-02-16 22:45:24 +08:00
Fangjun Kuang	69f489f0cd	Support scaling the duration of a pause in TTS. (#1820 )	2025-02-08 12:47:26 +08:00
Fangjun Kuang	c84a833863	Add C++ and Python API for Kokoro 1.0 multilingual TTS model (#1795 )	2025-02-06 22:57:13 +08:00
Fangjun Kuang	8b989a851c	Fix keyword spotting. (#1689 ) Reset the stream right after detecting a keyword	2025-01-20 16:41:10 +08:00
Jacklyn	b943341fb1	Fix `dither` binding in Pybind11 to ensure independence from `high_freq` in `FeatureExtractorConfig` (#1739 )	2025-01-20 16:29:36 +08:00
Fangjun Kuang	ffc6b480a0	Add C++ and Python API for Kokoro TTS models. (#1715 )	2025-01-16 14:24:51 +08:00
Fangjun Kuang	d3538531c4	Fix initialize TTS in Python. (#1664 )	2024-12-31 15:14:56 +08:00
Fangjun Kuang	2c2926af7d	Add C++ runtime for Matcha-TTS (#1627 )	2024-12-31 12:44:14 +08:00
Fangjun Kuang	2101227269	Add streaming ASR support for HarmonyOS. (#1565 )	2024-11-26 18:36:56 +08:00
Fangjun Kuang	669f5ef441	Add C++ runtime and Python APIs for Moonshine models (#1473 )	2024-10-26 14:34:07 +08:00
Fangjun Kuang	1d061df355	WebAssembly exmaple for speaker diarization (#1411 )	2024-10-10 22:14:45 +08:00
Fangjun Kuang	8535b1d3bb	Python API for speaker diarization. (#1400 )	2024-10-09 14:13:26 +08:00
Fangjun Kuang	b965f14cf0	Add Python API for clustering (#1385 )	2024-09-30 11:33:15 +08:00
Fangjun Kuang	1423ddb1f0	Support specifying max speech duration for VAD. (#1348 )	2024-09-14 10:57:46 +08:00
Fangjun Kuang	544857b097	Fix building (#1343 )	2024-09-13 13:33:52 +08:00
Lim Yao Chong	3bffc24d64	Add Python binding for online punctuation models (#1312 )	2024-09-09 10:26:53 +08:00
SilverSulfide	888f74bf3c	Re-implement LM rescore for online transducer (#1231 ) Co-authored-by: Martins Kronis <martins.kuznecovs@tilde.lv>	2024-09-06 10:01:25 +08:00
Robin Zhong	62c4d4ab62	Add emotion, event of SenseVoice. (#1257 ) * Add emotion, event of SenseVoice. * Fix tokens size check and update java api. https://github.com/k2-fsa/sherpa-onnx/pull/1257	2024-08-14 15:50:13 +08:00
xsjk	1da75ee3c0	Fix typo in offline-lm-config.cc (#1229 )	2024-08-07 15:38:34 +08:00
Fangjun Kuang	25f0a10468	Add C++ runtime for SenseVoice models (#1148 )	2024-07-18 22:54:18 +08:00
Fangjun Kuang	c2cc9dec58	Add Flush to VAD so that the last segment can be detected. (#1099 )	2024-07-09 16:15:56 +08:00
Manix	3e4307e2fb	updating trt workspace int64 (#1094 ) Signed-off-by: Manix <manickavela1998@gmail.com>	2024-07-08 20:38:16 +08:00
Manix	55decb7bee	Add config for TensorRT and CUDA execution provider (#992 ) Signed-off-by: manickavela1998@gmail.com <manickavela1998@gmail.com> Signed-off-by: manickavela1998@gmail.com <manickavela.arumugam@uniphore.com>	2024-07-05 15:18:37 +08:00
Fangjun Kuang	9dd0e03568	Enable to stop TTS generation (#1041 )	2024-06-22 18:18:36 +08:00
Fangjun Kuang	349d957da2	Add inverse text normalization for online ASR (#1020 )	2024-06-17 18:39:23 +08:00
Fangjun Kuang	b0f7ed3ee3	Add inverse text normalization for non-streaming ASR (#1017 )	2024-06-17 14:28:53 +08:00
Fangjun Kuang	1a43d1e37f	Support getting word IDs for CTC HLG decoding. (#978 )	2024-06-06 14:22:39 +08:00
Fangjun Kuang	fd5a0d1e00	Add C++ runtime for Tele-AI/TeleSpeech-ASR (#970 )	2024-06-05 00:26:40 +08:00
Fangjun Kuang	f1cff83ef9	Add address sanitizer and undefined behavior sanitizer (#951 )	2024-05-31 13:17:01 +08:00
Wei Kang	b012b78ceb	Encode hotwords in C++ side (#828 ) * Encode hotwords in C++ side	2024-05-20 19:41:36 +08:00
Fangjun Kuang	46e4e5b7ac	Add C++ support for streaming NeMo CTC models. (#857 )	2024-05-10 16:26:43 +08:00
Fangjun Kuang	17cd3a5f01	Add C++ runtime for non-streaming faster conformer transducer from NeMo. (#854 )	2024-05-10 12:15:39 +08:00
Karel Vesely	2e45d327a5	Adding temperature scaling on Joiner logits: (#789 ) * Adding temperature scaling on Joiner logits: - T hard-coded to 2.0 - so far best result NCE 0.122 (still not so high) - the BPE scores were rescaled with 0.2 (but then also incorrect words get high confidence, visually reasonable histograms are for 0.5 scale) - BPE->WORD score merging done by min(.) function (tried also prob-product, and also arithmetic, geometric, harmonic mean) - without temperature scaling (i.e. scale 1.0), the best NCE was 0.032 (here product merging was best) Results seem consistent with: https://arxiv.org/abs/2110.15222 Everything tuned on a very-small set of 100 sentences with 813 words and 10.2% WER, a Czech model. I also experimented with blank posteriors mixed into the BPE confidences, but no NCE improvement found, so not pushing that. Temperature scling added also to the Greedy search confidences. * making `temperature_scale` configurable from outside	2024-04-26 09:44:26 +08:00
Fangjun Kuang	6b353bfb42	Add jieba for Chinese TTS models (#797 )	2024-04-21 14:47:13 +08:00
Fangjun Kuang	54bc504065	Add Python API example for CED audio tagging. (#793 )	2024-04-19 18:33:18 +08:00
Fangjun Kuang	c1608b3524	Support CED models (#792 )	2024-04-19 15:20:37 +08:00
chiiyeh	aa2d695fd2	Add score function to speaker identification (#775 )	2024-04-16 17:29:46 +08:00

1 2 3

116 Commits