393 Commits

Author SHA1 Message Date
Fangjun Kuang
5d2d792b1d Add Python API for speech enhancement GTCRN models (#1978) 2025-03-10 19:02:17 +08:00
Fangjun Kuang
488a6e687c Add C++ runtime for speech enhancement GTCRN models (#1977)
See also https://github.com/Xiaobin-Rong/gtcrn
2025-03-10 18:11:16 +08:00
cjsdurj
b87fce9a7f C API: add writing wave data to a buffer. (#1962)
Co-authored-by: jian.chen03 <jian.chen03@transwarp.io>
2025-03-10 17:21:23 +08:00
Fangjun Kuang
362ddf2c07 Add C++ demo for VAD+non-streaming ASR (#1964) 2025-03-07 11:49:46 +08:00
Karel Vesely
7740dbfb96 Ebranchformer (#1951)
* adding ebranchformer encoder

* extend surfaced FeatureExtractorConfig

- so ebranchformer feature extraction can be configured from Python
- the GlobCmvn is not needed, as it is a module in the OnnxEncoder

* clean the code

* Integrating remarks from Fangjun
2025-03-04 19:41:09 +08:00
Fangjun Kuang
209eaaae1d Limit number of tokens per second for whisper. (#1958)
Otherwise, it spends lots of time in the loop if the EOT token
is not predicted.
2025-03-04 15:45:28 +08:00
Fangjun Kuang
c9d6859df7 Add transducer modified_beam_search for RKNN. (#1949) 2025-03-03 13:15:25 +08:00
Fangjun Kuang
d5e7b51af5 Support RKNN for Zipformer CTC models. (#1948) 2025-03-02 21:40:13 +08:00
Fangjun Kuang
dfcbc8d40b Add Kokoro v1.1-zh (#1942) 2025-02-28 15:47:59 +08:00
Fangjun Kuang
337d5f7a80 Release v1.10.46 (#1929) 2025-02-26 19:19:33 +08:00
Fangjun Kuang
eebe19997d Build wheels for rknn linux aarch64 (#1928) 2025-02-26 18:58:57 +08:00
Fangjun Kuang
82cb8a5dc3 Minor fixes for rknn (#1925) 2025-02-26 16:26:18 +08:00
Fangjun Kuang
4d79e6a007 Add C++ API for streaming zipformer ASR on RK NPU (#1908) 2025-02-24 19:07:37 +08:00
ivan provalov
94728bfbee Fixing Whisper Model Token Normalization (#1904) 2025-02-21 12:58:01 +08:00
Fangjun Kuang
ed922e61b5 Fix publishing pre-built windows libraries (#1905) 2025-02-21 11:59:27 +08:00
Fangjun Kuang
316424b382 Add C++ and Python API for FireRedASR AED models (#1867) 2025-02-16 22:45:24 +08:00
Fangjun Kuang
944400e399 Fix splitting text by languages for kokoro tts. (#1849) 2025-02-13 18:19:34 +08:00
ahadjawaid
73d7c25233 Fix: make sherpa_onnx_loge print only in debug mode (#1838)
Currently, during normal use, you may get a lot of print statements such as `Use espeak-ng to handle the OOV: 'ipsum'`, which may not be relevant unless you are debugging.
2025-02-11 00:22:50 +08:00
Fangjun Kuang
ad883d44fe Support specifying voice in espeak-ng for kokoro tts models. (#1836) 2025-02-10 19:05:53 +08:00
Fangjun Kuang
d5da9430e8 Add PengChengStarling models to sherpa-onnx (#1835) 2025-02-10 18:23:40 +08:00
Fangjun Kuang
9559a10bd3 Add C++ support for MatchaTTS models not from icefall. (#1834) 2025-02-10 15:38:29 +08:00
Fangjun Kuang
69f489f0cd Support scaling the duration of a pause in TTS. (#1820) 2025-02-08 12:47:26 +08:00
Fangjun Kuang
d38cb81014 Fix passing gb2312 encoded strings to tts on Windows (#1819) 2025-02-08 09:48:58 +08:00
Fangjun Kuang
7330f7519a Add C API for Kokoro TTS 1.0 (#1801) 2025-02-07 14:30:40 +08:00
Fangjun Kuang
c84a833863 Add C++ and Python API for Kokoro 1.0 multilingual TTS model (#1795) 2025-02-06 22:57:13 +08:00
ahadjawaid
8677d83efc Fix: Prepend 0 to tokenization to prevent word skipping for Kokoro. (#1787)
Addresses the issue "Skipping words" (#1777).
2025-02-03 13:49:42 +08:00
Fangjun Kuang
f178e96bf0 Add keyword spotter C API for HarmonyOS (#1769) 2025-01-26 14:12:30 +08:00
Fangjun Kuang
8b989a851c Fix keyword spotting. (#1689)
Reset the stream right after detecting a keyword
2025-01-20 16:41:10 +08:00
Fangjun Kuang
2d0869c709 Fix style issues (#1718) 2025-01-16 15:43:51 +08:00
Fangjun Kuang
ffc6b480a0 Add C++ and Python API for Kokoro TTS models. (#1715) 2025-01-16 14:24:51 +08:00
Fangjun Kuang
cbe07ac1b6 Release v1.10.39 (#1702) 2025-01-13 10:28:05 +08:00
Fangjun Kuang
1fe5fe495f Add Android demo for MatchaTTS models. (#1683) 2025-01-06 06:44:09 +08:00
Fangjun Kuang
bf3330c906 Add HarmonyOS examples for MatchaTTS. (#1678) 2025-01-03 17:09:29 +08:00
Fangjun Kuang
9aa4897a9e Add C API for MatchaTTS models (#1675) 2025-01-03 12:17:26 +08:00
Fangjun Kuang
a00d3b4821 Add Java API for Matcha-TTS models. (#1673) 2025-01-02 15:15:30 +08:00
Fangjun Kuang
f457baea42 Support Matcha-TTS models using espeak-ng (#1672) 2025-01-02 13:46:43 +08:00
Fangjun Kuang
3422b9388d Add Kotlin API for Matcha-TTS models. (#1668) 2024-12-31 19:20:52 +08:00
Fangjun Kuang
ebe92e523d Remove spaces after punctuation for TTS (#1666) 2024-12-31 16:06:27 +08:00
Fangjun Kuang
2c2926af7d Add C++ runtime for Matcha-TTS (#1627) 2024-12-31 12:44:14 +08:00
Fangjun Kuang
b6f0f5fc2e Support removing invalid utf-8 sequences. (#1648) 2024-12-25 19:32:13 +08:00
Fangjun Kuang
d00d1c6298 Fix GitHub actions. (#1642) 2024-12-24 11:34:35 +08:00
Fangjun Kuang
b76cd9033a Support decoding with byte-level BPE (bbpe) models. (#1633) 2024-12-20 19:21:32 +08:00
Fangjun Kuang
1bae4085ca Add speaker diarization API for HarmonyOS. (#1609) 2024-12-10 16:03:03 +08:00
Fangjun Kuang
314545f938 Add speaker identification APIs for HarmonyOS (#1607)
* Add speaker embedding extractor API for HarmonyOS

* Add ArkTS API for speaker identification
2024-12-09 19:23:18 +08:00
Fangjun Kuang
a743a4400f Add on-device real-time ASR demo for HarmonyOS (#1606) 2024-12-09 16:40:15 +08:00
Fangjun Kuang
74a8735f7a Add on-device text-to-speech (TTS) demo for HarmonyOS (#1590) 2024-12-04 14:27:12 +08:00
Fangjun Kuang
dc3287f3a8 Add HarmonyOS support for text-to-speech. (#1584) 2024-12-01 21:43:34 +08:00
Fangjun Kuang
109fb799ca fix building for Android (#1568) 2024-11-27 10:36:16 +08:00
Fangjun Kuang
2101227269 Add streaming ASR support for HarmonyOS. (#1565) 2024-11-26 18:36:56 +08:00
Fangjun Kuang
298b6b6fda Add non-streaming ASR support for HarmonyOS. (#1564) 2024-11-26 16:38:35 +08:00