434 Commits

Author SHA1 Message Date
mtdxc
e0ca224b76 fixed mfc build error (#2267)
Co-authored-by: cqm <cqm@97kid.com>
2025-05-31 23:32:35 +08:00
mtdxc
613e8084c2 move portaudio common record code to microphone (#2264)
Co-authored-by: cqm <cqm@97kid.com>
2025-05-31 21:48:41 +08:00
Fangjun Kuang
8e6826521e Update kaldi-native-fbank. (#2259)
It now supports FFT sizes that are any even number, not only powers of 2.
2025-05-29 10:34:22 +08:00
Fangjun Kuang
16a3449945 Build APK with replace.fst (#2254) 2025-05-28 12:19:29 +08:00
yegyu
2107afdbd4 Add include headers for __ANDROID_API__,__OHOS__ (#2251) 2025-05-27 14:44:06 +08:00
Fangjun Kuang
716ba8317b Add C++ runtime for spleeter about source separation (#2242) 2025-05-23 22:30:57 +08:00
Fangjun Kuang
2e9e0b4e9e Add Android demo for real-time ASR with non-streaming ASR models. (#2214) 2025-05-14 19:10:44 +08:00
Fangjun Kuang
0dfafed7d0 Support homophone replacer in Android asr demo. (#2210) 2025-05-14 10:58:35 +08:00
Fangjun Kuang
9a0e16f092 Support sending is_eof for online websocket server. (#2204)
is_final=true means an endpoint is detected.

is_eof=true means all received samples have been processed
by the server.
2025-05-13 14:49:22 +08:00
Fangjun Kuang
028b8f2718 Add C++ example for streaming ASR with SenseVoice. (#2199) 2025-05-11 00:23:32 +08:00
Fangjun Kuang
a6834f6556 Show verbose logs in homophone replacer (#2194) 2025-05-09 10:48:30 +08:00
Fangjun Kuang
562a5f7d9b Fix building wheels for macOS (#2192) 2025-05-08 19:15:33 +08:00
Fangjun Kuang
f9c99032c3 Avoid NaN in feature normalization. (#2186) 2025-05-08 11:22:47 +08:00
Fangjun Kuang
f00066db88 Add C++ runtime for parakeet-tdt-0.6b-v2. (#2181) 2025-05-06 16:59:01 +08:00
Fangjun Kuang
4a7a974a04 More fix for building without tts (#2162) 2025-04-29 16:31:31 +08:00
Fangjun Kuang
f64c58342b Support replacing homophonic phrases (#2153) 2025-04-27 15:31:11 +08:00
Fangjun Kuang
72742d5472 Fix punctuations for kokoro tts 1.1-zh. (#2146) 2025-04-24 15:08:47 +08:00
Karel Vesely
6a1efd8ac2 online-transducer: reset the encoder together with 2 previous output symbols (non-blank) (#2129)
* online-transducer: reset the encoder together with 2 previous output symbols (non-blank)

- added `reset_encoder` boolean member into the OnlineRecognizerConfig class
- by default the encoder is not reset

* pybind11, adding empty symbols for disabled modules (tts, diarization)

* reset_encoder, add default value (false) [pybind11]
2025-04-24 08:18:11 +08:00
Karel Vesely
f3d23aa170 cmake build, configurable from env (#2115)
- make sure the defaults in `cmake/cmake_extension.py` variable
  `extra_cmake_args` can be overridden by `cmake_args` from
  `SHERPA_ONNX_CMAKE_ARGS` env variable
- fix a bug in `sherpa-onnx/csrc/parse-options.cc` which appears
  when using `-DSHERPA_ONNX_ENABLE_CHECK=ON`
- avoid copying binaries when these are disabled
2025-04-16 21:26:54 +08:00
Fangjun Kuang
7a78f2eb7a Fix building for HarmonyOS (#2125) 2025-04-15 18:00:07 +08:00
Fangjun Kuang
e3bce847c0 Support running sherpa-onnx with RK NPU on Android (#2124) 2025-04-15 16:42:28 +08:00
Askars Salimbajevs
664b461d01 Disable strict hotword matching mode for offline transducer (#1837)
* Disable strict hotword matching mode for offline transducer. Also introduces a new variable so that this mode can later be switched on at runtime.

* remove strict mode variable

---------

Co-authored-by: Askars Salimbajevs <askars.salimbajevs@tilde.lv>
2025-04-03 22:52:19 +08:00
Askars Salimbajevs
18a6ed5ddc Preserve more context after endpointing in transducer (#2061) 2025-04-02 23:33:47 +08:00
Fangjun Kuang
0de7e1b9f0 Add C++ and Python API for Dolphin CTC models (#2085) 2025-04-02 19:09:00 +08:00
Fangjun Kuang
1316719e23 Fix building for android (#2081) 2025-04-01 19:36:40 +08:00
Fangjun Kuang
a11e359c11 Refactor rknn code (#2079) 2025-04-01 16:54:53 +08:00
Fangjun Kuang
8e51a97550 Add C++ runtime for silero_vad with RKNN (#2078) 2025-04-01 15:56:56 +08:00
Fangjun Kuang
0703bc1b86 Add CXX API for VAD (#2077) 2025-04-01 14:51:43 +08:00
Anders Xiao
ce196fceae fix dml with preinstall ort (#2066) 2025-03-30 12:07:19 +08:00
niansa/tuxifan
9d23606ee6 Allow building repository as CMake subdirectory (#2059)
* Use PROJECT_SOURCE_DIR rather than CMAKE_SOURCE_DIR to allow building as subdirectory

* Also use PROJECT_SOURCE_DIR instead of CMAKE_SOURCE_DIR in c/cxx api examples

* Only build examples by default when not building as subdirectory

* Do not suggest building binaries either

---------

Co-authored-by: user <user@mail.tld>
2025-03-29 06:27:59 +08:00
Fangjun Kuang
a5dd0cdfc3 Fix length scale for kokoro tts (#2060) 2025-03-27 10:52:01 +08:00
yourengod
bd61c1d8e5 Change scale factor to 32767 (#2056) 2025-03-26 10:44:49 +08:00
Fangjun Kuang
823e2e6257 Fix building wheels for RKNN (#2041) 2025-03-22 18:33:32 +08:00
Sangeet Sagar
31096e43bd fix static linking (#2032) 2025-03-21 12:47:45 +08:00
Fangjun Kuang
a19e57604e Fix Matcha + vocos for Android (#2024) 2025-03-19 18:39:10 +08:00
Fangjun Kuang
a50901f366 Fix a bug in vad.reset() (#2023)
We also need to clear _last
2025-03-19 17:42:05 +08:00
Fangjun Kuang
1f52ac2126 add alsa example for vad+offline asr (#2020) 2025-03-18 20:06:24 +08:00
Fangjun Kuang
406272210f Fix CI (#2016) 2025-03-17 22:31:36 +08:00
Fangjun Kuang
0aacf02dd8 Add C++ runtime for vocos (#2014) 2025-03-17 17:05:15 +08:00
Fangjun Kuang
71824992a7 Add Java API for speech enhancement GTCRN models (#2009) 2025-03-16 15:13:20 +08:00
Fangjun Kuang
c5dbf1177c Add C API for speech enhancement GTCRN models (#1984) 2025-03-11 15:50:04 +08:00
Fangjun Kuang
5d2d792b1d Add Python API for speech enhancement GTCRN models (#1978) 2025-03-10 19:02:17 +08:00
Fangjun Kuang
488a6e687c Add C++ runtime for speech enhancement GTCRN models (#1977)
See also https://github.com/Xiaobin-Rong/gtcrn
2025-03-10 18:11:16 +08:00
cjsdurj
b87fce9a7f c-api add wave write to buffer. (#1962)
Co-authored-by: jian.chen03 <jian.chen03@transwarp.io>
2025-03-10 17:21:23 +08:00
Fangjun Kuang
362ddf2c07 Add C++ demo for VAD+non-streaming ASR (#1964) 2025-03-07 11:49:46 +08:00
Karel Vesely
7740dbfb96 Ebranchformer (#1951)
* adding ebranchformer encoder

* extend surfaced FeatureExtractorConfig

- so ebranchformer feature extraction can be configured from Python
- the GlobCmvn is not needed, as it is a module in the OnnxEncoder

* clean the code

* Integrating remarks from Fangjun
2025-03-04 19:41:09 +08:00
Fangjun Kuang
209eaaae1d Limit number of tokens per second for whisper. (#1958)
Otherwise, it spends lots of time in the loop if the EOT token
is not predicted.
2025-03-04 15:45:28 +08:00
Fangjun Kuang
c9d6859df7 Add transducer modified_beam_search for RKNN. (#1949) 2025-03-03 13:15:25 +08:00
Fangjun Kuang
d5e7b51af5 Support RKNN for Zipformer CTC models. (#1948) 2025-03-02 21:40:13 +08:00
Fangjun Kuang
dfcbc8d40b Add Kokoro v1.1-zh (#1942) 2025-02-28 15:47:59 +08:00