Commit Graph

55 Commits

Author SHA1 Message Date
Fangjun Kuang
6122a678f5 Refactor exporting NeMo models (#2362)
Refactors and extends model export support to include new NeMo Parakeet TDT int8 variants for English and Japanese, updating the Kotlin API, export scripts, test runners, and CI workflows.

- Added support for two new int8 model types in OfflineRecognizer.kt.
- Enhanced Python export scripts to perform dynamic quantization and metadata injection.
- Updated shell scripts and GitHub workflows to package, test, and publish int8 model artifacts.
2025-07-09 16:02:12 +08:00
Fangjun Kuang
103e93d9f6 Add Java and Kotlin API for NeMo Canary models (#2359)
Add support for the NeMo Canary model in both Java and Kotlin APIs, wiring it through
JNI and updating examples and CI.

- Introduce OfflineCanaryModelConfig in Kotlin and Java with builder patterns
- Extend OfflineRecognizer to accept and apply the new canary config via setConfig
- Update JNI binding (GetOfflineConfig) and getOfflineModelConfig mapping (type 32),
  plus examples and CI workflows
2025-07-08 13:45:26 +08:00
Fangjun Kuang
3bf986d08d Support non-streaming zipformer CTC ASR models (#2340)
This PR adds support for non-streaming Zipformer CTC ASR models across 
multiple language bindings, WebAssembly, examples, and CI workflows.

- Introduces a new OfflineZipformerCtcModelConfig in C/C++, Python, Swift, Java, Kotlin, Go, Dart, Pascal, and C# APIs
- Updates initialization, freeing, and recognition logic to include Zipformer CTC in WASM and Node.js
- Adds example scripts and CI steps for downloading, building, and running Zipformer CTC models

Model doc is available at
https://k2-fsa.github.io/sherpa/onnx/pretrained_models/offline-ctc/icefall/zipformer.html
2025-07-04 15:57:07 +08:00
wenjie.Li
ef16455cb5 Add sherpa-onnx-streaming-zipformer-zh-int8-2025-06-30 to android ASR apk (#2336) 2025-07-03 11:31:13 +08:00
Fangjun Kuang
9fe25cc06f Fix VAD+ASR C++ example. (#2335)
It was not able to handle short audio, e.g., 2.1 seconds.
2025-07-02 15:52:49 +08:00
Fangjun Kuang
bda427f4b2 Add API to get version information (#2309) 2025-06-25 00:22:21 +08:00
Fangjun Kuang
6982b86c66 Support extra languages in multi-lang kokoro tts (#2303) 2025-06-20 11:22:52 +08:00
Fangjun Kuang
d8bb20710d Add script to build APK for simulated-streaming-asr. (#2220) 2025-05-15 15:40:22 +08:00
esavin
aeb311db50 Expose dither for JNI (#2215) 2025-05-14 23:38:25 +08:00
Fangjun Kuang
e537094b07 Add Kotlin and Java API for homophone replacer (#2166)
* Add Kotlin API for homophone replacer

* Add Java API for homophone replacer
2025-04-29 22:55:21 +08:00
Fangjun Kuang
7cbb1bc433 Upload more onnx ASR models (#2141) 2025-04-21 18:57:41 +08:00
Fangjun Kuang
e3bce847c0 Support running sherpa-onnx with RK NPU on Android (#2124) 2025-04-15 16:42:28 +08:00
Fangjun Kuang
eee5575836 Add Kotlin and Java API for Dolphin CTC models (#2086) 2025-04-02 21:16:14 +08:00
Fangjun Kuang
ed8e6c9aed Add Kotlin API for speech enhancement GTCRN models (#2008) 2025-03-16 10:41:01 +08:00
Fangjun Kuang
f5dfcf8d2f Add Kotlin and Java API for online punctuation models (#1936) 2025-02-27 16:52:36 +08:00
Fangjun Kuang
d148860d2c Add Kotlin and Java API for FireRedAsr AED model (#1870) 2025-02-17 10:50:25 +08:00
Fangjun Kuang
69f489f0cd Support scaling the duration of a pause in TTS. (#1820) 2025-02-08 12:47:26 +08:00
Fangjun Kuang
a52b819fb5 Add Android demo for Kokoro TTS 1.0 (#1799) 2025-02-07 13:07:30 +08:00
Fangjun Kuang
4372a7a7b0 Add Java and Kotlin API for Kokoro TTS 1.0 (#1798) 2025-02-07 09:59:27 +08:00
Fangjun Kuang
8b989a851c Fix keyword spotting. (#1689)
Reset the stream right after detecting a keyword
2025-01-20 16:41:10 +08:00
Fangjun Kuang
99cef4198b Add Kotlin and Java API for Kokoro TTS models (#1728) 2025-01-17 17:36:13 +08:00
Fangjun Kuang
1fe5fe495f Add Android demo for MatchaTTS models. (#1683) 2025-01-06 06:44:09 +08:00
Fangjun Kuang
3422b9388d Add Kotlin API for Matcha-TTS models. (#1668) 2024-12-31 19:20:52 +08:00
Fangjun Kuang
08d771337b Add a byte-level BPE Chinese+English non-streaming zipformer model (#1645) 2024-12-24 16:56:49 +08:00
Fangjun Kuang
be87f866f3 Use aar in Android Java demo. (#1616) 2024-12-12 18:26:54 +08:00
Fangjun Kuang
4dc4f1a708 Provide sherpa-onnx.aar for Android (#1615) 2024-12-12 16:59:00 +08:00
Fangjun Kuang
a3c89aa0d8 Add two-pass ASR Android APKs for Moonshine models. (#1499) 2024-10-31 17:54:16 +08:00
Fangjun Kuang
bd4b223920 Add Kotlin and Java API for Moonshine models (#1474) 2024-10-26 22:30:29 +08:00
Fangjun Kuang
707cf792c5 Add GigaAM NeMo transducer model for Russian ASR (#1467) 2024-10-25 15:20:13 +08:00
Fangjun Kuang
b41f6d2c94 Support GigaAM CTC models for Russian ASR (#1464)
See also https://github.com/salute-developers/GigaAM
2024-10-25 10:55:16 +08:00
Fangjun Kuang
5a22f74b2b Android demo for speaker diarization (#1423) 2024-10-13 14:02:57 +08:00
Fangjun Kuang
2d412b1190 Kotlin API for speaker diarization (#1415) 2024-10-11 14:41:53 +08:00
Fangjun Kuang
576a3aa90d Add non-streaming ONNX models for Russian ASR (#1358) 2024-09-18 13:43:49 +08:00
Fangjun Kuang
e7ffcbd677 Add APIs about max speech duration in VAD for various programming languages (#1349) 2024-09-14 12:30:13 +08:00
Robin Zhong
d8001d6edc Update Kotlin API to better release native objects and add user-friendly APIs. (#1275) 2024-08-22 19:18:11 +08:00
Robin Zhong
62c4d4ab62 Add emotion, event of SenseVoice. (#1257)
* Add emotion, event of SenseVoice.

* Fix tokens size check and update Java API.

https://github.com/k2-fsa/sherpa-onnx/pull/1257
2024-08-14 15:50:13 +08:00
Fangjun Kuang
94e256244d Add blank penalty for various language bindings. (#1234) 2024-08-08 10:43:31 +08:00
Fangjun Kuang
35c1b4a7a9 Add ReazonSpeech Japanese pre-trained model (#1203) 2024-08-02 10:21:24 +08:00
Fangjun Kuang
dd300b1de5 Add Java and Kotlin API for sense voice (#1164) 2024-07-22 14:08:40 +08:00
Fangjun Kuang
fa07bbc176 Add APK for small paraformer (#1133) 2024-07-15 19:44:36 +08:00
Fangjun Kuang
b5093e27f9 Fix publishing apks to huggingface (#1121)
Save APKs for each release in a separate directory.

Huggingface requires that each directory contain no more than 1000 files.

Since we have so many TTS models, and for each model we need to build APKs for 4 different ABIs,
we work around Huggingface's constraint by placing the APKs into separate directories for different releases.
2024-07-13 16:14:00 +08:00
Fangjun Kuang
dd0ff2ca06 Support onnxruntime 1.18.0 (#906) 2024-07-10 17:05:26 +08:00
Fangjun Kuang
c2cc9dec58 Add Flush to VAD so that the last segment can be detected. (#1099) 2024-07-09 16:15:56 +08:00
Fangjun Kuang
8c4f576f1b Support .Net framework 2.0 (#1062) 2024-06-28 11:27:19 +08:00
Fangjun Kuang
1f95bff719 Add non-streaming zipformer Android APK (#1052) 2024-06-24 16:22:19 +08:00
Fangjun Kuang
36336b31f4 Build Android APK for Thai (#1036) 2024-06-20 18:05:57 +08:00
Fangjun Kuang
6789c909d2 Inverse text normalization API of streaming ASR for various programming languages (#1022) 2024-06-18 13:42:17 +08:00
Fangjun Kuang
6e09933d99 Inverse text normalization API for other programming languages (#1019) 2024-06-17 17:02:39 +08:00
Fangjun Kuang
e1201225f2 Add Android APK for Korean (#1015) 2024-06-16 19:17:15 +08:00
Fangjun Kuang
fd5a0d1e00 Add C++ runtime for Tele-AI/TeleSpeech-ASR (#970) 2024-06-05 00:26:40 +08:00