Commit Graph

641 Commits

Author SHA1 Message Date
Fangjun Kuang
0d44df9b67 Release v1.12.5 (#2368) 2025-07-10 15:31:26 +08:00
Fangjun Kuang
fd9a687ec2 Add Pascal/Go/C#/Dart API for NeMo Canary ASR models (#2367)
Add support for the new NeMo Canary ASR model across multiple language bindings by introducing a Canary model configuration and setter method on the offline recognizer.

- Define Canary model config in Pascal, Go, C#, Dart and update converter functions
- Add SetConfig API for offline recognizer (Pascal, Go, C#, Dart)
- Extend CI/workflows and example scripts to test non-streaming Canary decoding
2025-07-10 14:53:33 +08:00
Askars Salimbajevs
f0960342ad Add LODR support to online and offline recognizers (#2026)
This PR integrates LODR (Low-Order Density Ratio) support from Icefall into both online and offline recognizers, enabling LODR for LM shallow fusion and LM rescoring.

- Extended OnlineLMConfig and OfflineLMConfig to include lodr_fst, lodr_scale, and lodr_backoff_id.
- Implemented LodrFst and LodrStateCost classes and wired them into RNN LM scoring in both online and offline code paths.
- Updated Python bindings, CLI entry points, examples, and CI test scripts to accept and exercise the new LODR options.
2025-07-09 16:23:46 +08:00
Fangjun Kuang
6122a678f5 Refactor exporting NeMo models (#2362)
Refactors and extends model export support to include new NeMo Parakeet TDT int8 variants for English and Japanese, updating the Kotlin API, export scripts, test runners, and CI workflows.

- Added support for two new int8 model types in OfflineRecognizer.kt.
- Enhanced Python export scripts to perform dynamic quantization and metadata injection.
- Updated shell scripts and GitHub workflows to package, test, and publish int8 model artifacts.
2025-07-09 16:02:12 +08:00
Fangjun Kuang
103e93d9f6 Add Java and Kotlin API for NeMo Canary models (#2359)
Add support for the NeMo Canary model in both Java and Kotlin APIs, wiring it through JNI and updating examples and CI.

- Introduce OfflineCanaryModelConfig in Kotlin and Java with builder patterns
- Extend OfflineRecognizer to accept and apply the new canary config via setConfig
- Update JNI binding (GetOfflineConfig) and getOfflineModelConfig mapping (type 32), plus examples and CI workflows
2025-07-08 13:45:26 +08:00
Fangjun Kuang
df4615ca1d Add C/CXX/JavaScript API for NeMo Canary models (#2357)
This PR introduces support for NeMo Canary models across C, C++, and JavaScript APIs by adding new Canary configuration structures, updating bindings, extending examples, and enhancing CI workflows.

- Add OfflineCanaryModelConfig to all language bindings (C, C++, JS, ETS).
- Implement SetConfig methods and NAPI wrappers for updating recognizer config at runtime.
- Update examples and CI scripts to demonstrate and test NeMo Canary model usage.
2025-07-07 23:38:04 +08:00
Fangjun Kuang
0e738c356c Add C++ runtime and Python API for NeMo Canary models (#2352) 2025-07-07 17:03:49 +08:00
Fangjun Kuang
c1e9e5c87f Fix TTS for Unreal Engine (#2349)
Unreal Engine has its own memory management, so we cannot return a struct containing a std::vector object.
2025-07-06 19:20:26 +08:00
Fangjun Kuang
e6b388067d Release v1.12.4 (#2343) 2025-07-04 19:41:02 +08:00
Fangjun Kuang
3bf986d08d Support non-streaming zipformer CTC ASR models (#2340)
This PR adds support for non-streaming Zipformer CTC ASR models across multiple language bindings, WebAssembly, examples, and CI workflows.

- Introduces a new OfflineZipformerCtcModelConfig in C/C++, Python, Swift, Java, Kotlin, Go, Dart, Pascal, and C# APIs
- Updates initialization, freeing, and recognition logic to include Zipformer CTC in WASM and Node.js
- Adds example scripts and CI steps for downloading, building, and running Zipformer CTC models

Model doc is available at
https://k2-fsa.github.io/sherpa/onnx/pretrained_models/offline-ctc/icefall/zipformer.html
2025-07-04 15:57:07 +08:00
wenjie.Li
ef16455cb5 Add sherpa-onnx-streaming-zipformer-zh-int8-2025-06-30 to android ASR apk (#2336) 2025-07-03 11:31:13 +08:00
Fangjun Kuang
9fe25cc06f Fix VAD+ASR C++ example. (#2335)
It could not handle short audio, e.g., 2.1 seconds.
2025-07-02 15:52:49 +08:00
Fangjun Kuang
e25634ac39 Release v1.12.3 (#2322) 2025-06-27 10:55:46 +08:00
Fangjun Kuang
f835642b1c Support Zipformer transducer ASR with whisper features. (#2321)
Adds support for Zipformer transducer ASR models that use Whisper-style features by introducing a new feature flag, parsing metadata, and integrating per-chunk normalization.

- Introduce UseWhisperFeature in the model interface and Zipformer implementation
- Parse "feature" metadata to set the whisper flag and wire it into the recognizer
- Update feature extraction logic to handle Whisper filterbanks with early returns
2025-06-27 10:40:41 +08:00
Fangjun Kuang
54bf3732d9 Support zipformer CTC ASR with whisper features. (#2319) 2025-06-27 00:15:11 +08:00
Fangjun Kuang
056da0528d Release v1.12.2 (#2314) 2025-06-25 00:37:55 +08:00
Fangjun Kuang
bda427f4b2 Add API to get version information (#2309) 2025-06-25 00:22:21 +08:00
Fangjun Kuang
6982b86c66 Support extra languages in multi-lang kokoro tts (#2303) 2025-06-20 11:22:52 +08:00
Fangjun Kuang
a6095f5f64 Fix building for Pascal (#2305) 2025-06-20 11:10:07 +08:00
Fangjun Kuang
59d118c256 Refactor kokoro export (#2302)
- generate samples for https://k2-fsa.github.io/sherpa/onnx/tts/all/
- provide int8 model for kokoro v0.19 kokoro-int8-en-v0_19.tar.bz2
2025-06-18 20:30:10 +08:00
Fangjun Kuang
3878170991 Fixes #2172 (#2301)
Handle the case when the input audio contains no speech.
2025-06-18 16:48:48 +08:00
Fangjun Kuang
2913cce77c Add scripts for exporting Piper TTS models to sherpa-onnx (#2299) 2025-06-17 14:23:39 +08:00
GlocKieHuan
a135324c8c Fix isspace on windows in debug build (#2042) 2025-06-09 10:27:16 +08:00
Fangjun Kuang
d57e4f84de Add Python API for source separation (#2283) 2025-06-05 20:44:26 +08:00
Fangjun Kuang
1fabc6c79a Fix rknn for multi-threads (#2274) 2025-06-03 20:28:57 +08:00
Fangjun Kuang
2b2788332e Add C++ support for UVR models (#2269) 2025-06-01 17:22:08 +08:00
mtdxc
e0ca224b76 fixed mfc build error (#2267)
Co-authored-by: cqm <cqm@97kid.com>
2025-05-31 23:32:35 +08:00
mtdxc
613e8084c2 move portaudio common record code to microphone (#2264)
Co-authored-by: cqm <cqm@97kid.com>
2025-05-31 21:48:41 +08:00
Fangjun Kuang
8e6826521e Update kaldi-native-fbank. (#2259)
It now supports FFT sizes that are any even number, not only powers of 2.
2025-05-29 10:34:22 +08:00
Fangjun Kuang
16a3449945 Build APK with replace.fst (#2254) 2025-05-28 12:19:29 +08:00
Skepller
640ceb5513 JAVA-API: Manual Library Loading Support for Restricted Environments (#2253)
* feat: Added LibraryLoader that allows loading to be skipped

* feat: Changed static call to new LibraryLoader

* feat: Makefile adjustment
2025-05-28 06:13:39 +08:00
yegyu
2107afdbd4 Add include headers for __ANDROID_API__,__OHOS__ (#2251) 2025-05-27 14:44:06 +08:00
Fangjun Kuang
716ba8317b Add C++ runtime for spleeter about source separation (#2242) 2025-05-23 22:30:57 +08:00
Fangjun Kuang
ff6f3b17ac Use jlong explicitly in jni. (#2229) 2025-05-20 15:29:47 +08:00
Fangjun Kuang
d8bb20710d Add script to build APK for simulated-streaming-asr. (#2220) 2025-05-15 15:40:22 +08:00
esavin
aeb311db50 Expose dither for JNI (#2215) 2025-05-14 23:38:25 +08:00
Fangjun Kuang
2e9e0b4e9e Add Android demo for real-time ASR with non-streaming ASR models. (#2214) 2025-05-14 19:10:44 +08:00
Fangjun Kuang
0dfafed7d0 Support homophone replacer in Android asr demo. (#2210) 2025-05-14 10:58:35 +08:00
Fangjun Kuang
9a0e16f092 Support sending is_eof for online websocket server. (#2204)
is_final=true means an endpoint is detected.

is_eof=true means all received samples have been processed by the server.
2025-05-13 14:49:22 +08:00
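As an illustration of the two flags described in the commit above, here is a minimal client-side sketch. It assumes the server sends each result as a JSON object carrying a "text" field alongside is_final and is_eof; that message layout and the helper name summarize_messages are assumptions made for this example, not something the commit message confirms.

```python
import json


def summarize_messages(messages):
    """Classify server messages by their is_final / is_eof flags.

    `messages` is an iterable of JSON strings. The "text" key is an
    assumed field name used only for illustration.
    """
    events = []
    for raw in messages:
        msg = json.loads(raw)
        text = msg.get("text", "")
        if msg.get("is_eof", False):
            # All samples sent so far have been processed; stop reading.
            events.append(("eof", text))
            break
        if msg.get("is_final", False):
            # An endpoint was detected; the current segment is complete.
            events.append(("endpoint", text))
        else:
            # Intermediate result that may still change.
            events.append(("partial", text))
    return events


if __name__ == "__main__":
    demo = [
        '{"text": "hel", "is_final": false, "is_eof": false}',
        '{"text": "hello", "is_final": true, "is_eof": false}',
        '{"text": "", "is_final": false, "is_eof": true}',
    ]
    for kind, text in summarize_messages(demo):
        print(kind, repr(text))
```

In this sketch a client keeps reading after an endpoint (is_final) because more segments may follow, and only stops once is_eof arrives.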
Fangjun Kuang
028b8f2718 Add C++ example for streaming ASR with SenseVoice. (#2199) 2025-05-11 00:23:32 +08:00
Fangjun Kuang
53518efd2f Add real-time speech recognition example for SenseVoice. (#2197) 2025-05-10 00:50:40 +08:00
Fangjun Kuang
4a833a7547 Fix displaying streaming speech recognition results for Python. (#2196) 2025-05-09 21:48:49 +08:00
Fangjun Kuang
a6834f6556 Show verbose logs in homophone replacer (#2194) 2025-05-09 10:48:30 +08:00
Fangjun Kuang
562a5f7d9b Fix building wheels for macOS (#2192) 2025-05-08 19:15:33 +08:00
Fangjun Kuang
f9c99032c3 Avoid NaN in feature normalization. (#2186) 2025-05-08 11:22:47 +08:00
Fangjun Kuang
f00066db88 Add C++ runtime for parakeet-tdt-0.6b-v2. (#2181) 2025-05-06 16:59:01 +08:00
Fangjun Kuang
e537094b07 Add Kotlin and Java API for homophone replacer (#2166)
* Add Kotlin API for homophone replacer

* Add Java API for homophone replacer
2025-04-29 22:55:21 +08:00
Fangjun Kuang
4a7a974a04 More fix for building without tts (#2162) 2025-04-29 16:31:31 +08:00
Fangjun Kuang
e51c37eb2f Add C and CXX API for homophone replacer (#2156) 2025-04-27 22:09:13 +08:00
Fangjun Kuang
f64c58342b Support replacing homophonic phrases (#2153) 2025-04-27 15:31:11 +08:00