Commit Graph

1275 Commits

Author SHA1 Message Date
7c26922e30 update README 2025-08-07 09:53:14 +08:00
1fb85833fc add whl file 2025-08-07 09:49:15 +08:00
4828ec9ef0 update README 2025-08-06 11:55:51 +08:00
37ad87e75c code change for mr_v100 2025-08-06 11:35:49 +08:00
Fangjun Kuang
0d44df9b67 Release v1.12.5 (#2368) 2025-07-10 15:31:26 +08:00
Fangjun Kuang
fd9a687ec2 Add Pascal/Go/C#/Dart API for NeMo Canary ASR models (#2367)
Add support for the new NeMo Canary ASR model across multiple language bindings by introducing a Canary model configuration and setter method on the offline recognizer.

- Define Canary model config in Pascal, Go, C#, Dart and update converter functions
- Add SetConfig API for offline recognizer (Pascal, Go, C#, Dart)
- Extend CI/workflows and example scripts to test non-streaming Canary decoding
2025-07-10 14:53:33 +08:00
Fangjun Kuang
e2b2d5ea57 Add CXX examples for NeMo TDT ASR. (#2363)
# New Features
- Added new example programs demonstrating streaming speech recognition from a microphone using Parakeet-TDT CTC and Zipformer Transducer models with voice activity detection.
- These examples support microphone input via PortAudio and display recognized text incrementally.

# Bug Fixes
- Improved error handling and logic when opening microphone devices in several example programs for more reliable device initialization.

# Chores
- Updated build configuration to include new executable examples when PortAudio support is enabled.
2025-07-09 18:30:42 +08:00
Askars Salimbajevs
f0960342ad Add LODR support to online and offline recognizers (#2026)
This PR integrates LODR (Low-Order Density Ratio) support from Icefall into both online and offline recognizers, enabling LODR for LM shallow fusion and LM rescoring.

- Extended OnlineLMConfig and OfflineLMConfig to include lodr_fst, lodr_scale, and lodr_backoff_id.
- Implemented LodrFst and LodrStateCost classes and wired them into RNN LM scoring in both online and offline code paths.
- Updated Python bindings, CLI entry points, examples, and CI test scripts to accept and exercise the new LODR options.
2025-07-09 16:23:46 +08:00
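
As a rough illustration of how a LODR term can enter LM shallow fusion, the sketch below combines three log-probabilities per hypothesis. It is not the sherpa-onnx implementation, which per the commit works through LodrFst and LodrStateCost. The function name, the `lm_scale` parameter, the default values, and the sign convention (a negative `lodr_scale` that is added, so the low-order n-gram score is effectively subtracted) are assumptions made for illustration.

```python
def fused_score(asr_logp, lm_logp, ngram_logp, lm_scale=0.4, lodr_scale=-0.1):
    """Combine log-probabilities for one hypothesis (hypothetical helper).

    asr_logp   : log-probability from the ASR model
    lm_logp    : log-probability from the neural LM
    ngram_logp : log-probability from the low-order n-gram LM used by LODR
    Assumption: lodr_scale is negative, so adding the scaled n-gram term
    subtracts a fraction of the n-gram score from the total.
    """
    return asr_logp + lm_scale * lm_logp + lodr_scale * ngram_logp


def best_hypothesis(hyps, lm_scale=0.4, lodr_scale=-0.1):
    """Pick the highest-scoring (text, asr_logp, lm_logp, ngram_logp) tuple."""
    return max(
        hyps,
        key=lambda h: fused_score(h[1], h[2], h[3], lm_scale, lodr_scale),
    )
```

With `lodr_scale=0.0` the expression reduces to plain shallow fusion, which is a convenient baseline when tuning the two scales.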
Fangjun Kuang
6122a678f5 Refactor exporting NeMo models (#2362)
Refactors and extends model export support to include new NeMo Parakeet TDT int8 variants for English and Japanese, updating the Kotlin API, export scripts, test runners, and CI workflows.

- Added support for two new int8 model types in OfflineRecognizer.kt.
- Enhanced Python export scripts to perform dynamic quantization and metadata injection.
- Updated shell scripts and GitHub workflows to package, test, and publish int8 model artifacts.
2025-07-09 16:02:12 +08:00
Fangjun Kuang
f1405779cf Fix nemo feature normalization in test code (#2361) 2025-07-08 15:41:56 +08:00
Fangjun Kuang
831aff187d Upload fp16 onnx model files for FireRedASR (#2360) 2025-07-08 13:46:03 +08:00
Fangjun Kuang
103e93d9f6 Add Java and Kotlin API for NeMo Canary models (#2359)
Add support for the NeMo Canary model in both Java and Kotlin APIs, wiring it through
JNI and updating examples and CI.

- Introduce OfflineCanaryModelConfig in Kotlin and Java with builder patterns
- Extend OfflineRecognizer to accept and apply the new canary config via setConfig
- Update JNI binding (GetOfflineConfig) and getOfflineModelConfig mapping (type 32), plus examples and CI workflows
2025-07-08 13:45:26 +08:00
Fangjun Kuang
df4615ca1d Add C/CXX/JavaScript API for NeMo Canary models (#2357)
This PR introduces support for NeMo Canary models across C, C++, and JavaScript APIs 
by adding new Canary configuration structures, updating bindings, extending examples,
and enhancing CI workflows.

- Add OfflineCanaryModelConfig to all language bindings (C, C++, JS, ETS).
- Implement SetConfig methods and NAPI wrappers for updating recognizer config at runtime.
- Update examples and CI scripts to demonstrate and test NeMo Canary model usage.
2025-07-07 23:38:04 +08:00
Fangjun Kuang
0e738c356c Add C++ runtime and Python API for NeMo Canary models (#2352) 2025-07-07 17:03:49 +08:00
Fangjun Kuang
f8d957a24b Update README to include https://github.com/bbeyondllove/asr_server (#2353) 2025-07-07 10:17:20 +08:00
Fangjun Kuang
fce481c125 Add meta data to NeMo canary ONNX models (#2351) 2025-07-07 00:12:20 +08:00
Fangjun Kuang
25f9cec072 Update readme to include https://github.com/mawwalker/stt-server (#2350) 2025-07-07 00:02:09 +08:00
Fangjun Kuang
c1e9e5c87f Fix TTS for Unreal Engine (#2349)
Unreal Engine has its own memory management, so we cannot return a struct containing a std::vector object.
2025-07-06 19:20:26 +08:00
lucaelin
5ebb71909b fix(canary): use dynamo export, single input_ids and avoid 0/1 specialization (#2348) 2025-07-06 18:24:06 +08:00
Fangjun Kuang
d70b789582 Fix testing dart packages (#2345) 2025-07-04 22:27:24 +08:00
linsui
33a689dc86 Fix typo CMAKE_EXECUTBLE_LINKER_FLAGS -> CMAKE_EXECUTABLE_LINKER_FLAGS (#2344) 2025-07-04 21:13:39 +08:00
Fangjun Kuang
e6b388067d Release v1.12.4 (#2343) 2025-07-04 19:41:02 +08:00
Fangjun Kuang
53a3ad366b Support linux aarch64 for Dart and Flutter (#2342)
Adds support for building and packaging Linux AArch64 (arm64) artifacts alongside x64 for Dart/Flutter plugins.

- Detects host architecture in CMake and adjusts library paths
- Extends test workflows to run on an ARM runner and handle linux-aarch64 paths
- Splits release pipeline into separate x64 and aarch64 build/package jobs
2025-07-04 19:33:48 +08:00
Fangjun Kuang
3bf986d08d Support non-streaming zipformer CTC ASR models (#2340)
This PR adds support for non-streaming Zipformer CTC ASR models across 
multiple language bindings, WebAssembly, examples, and CI workflows.

- Introduces a new OfflineZipformerCtcModelConfig in C/C++, Python, Swift, Java, Kotlin, Go, Dart, Pascal, and C# APIs
- Updates initialization, freeing, and recognition logic to include Zipformer CTC in WASM and Node.js
- Adds example scripts and CI steps for downloading, building, and running Zipformer CTC models

Model doc is available at
https://k2-fsa.github.io/sherpa/onnx/pretrained_models/offline-ctc/icefall/zipformer.html
2025-07-04 15:57:07 +08:00
wenjie.Li
ef16455cb5 Add sherpa-onnx-streaming-zipformer-zh-int8-2025-06-30 to android ASR apk (#2336) 2025-07-03 11:31:13 +08:00
Fangjun Kuang
9fe25cc06f Fix VAD+ASR C++ example. (#2335)
It was not able to handle short audio, e.g., 2.1 seconds.
2025-07-02 15:52:49 +08:00
Fangjun Kuang
ea3e583ac9 Fix static link without tts (#2328) 2025-06-30 14:21:01 +08:00
Fangjun Kuang
046ce01203 Add TTS engine APKs for more models (#2327) 2025-06-30 13:36:29 +08:00
Fangjun Kuang
f725cb3306 Refactor release scripts. (#2323)
It refactors the release scripts to centralize and simplify version updates across 
multiple files. Key changes include:

- Introducing variables (old_version, new_version, replace_str) for version substitution.
- Replacing hard-coded sed expressions with dynamic ones in various files.
- Ensuring backup files generated by sed are cleaned up after execution.
2025-06-27 11:22:31 +08:00
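
The version substitution described in this commit can be pictured with a short Python sketch. The release scripts themselves use sed in shell. The function name `bump_version` and its signature are hypothetical, and only `old_version` and `new_version` echo the variable names mentioned in the commit message.

```python
from pathlib import Path


def bump_version(paths, old_version, new_version):
    """Replace old_version with new_version in each file (hypothetical helper).

    Returns the list of files that were actually modified.
    """
    changed = []
    for path in map(Path, paths):
        text = path.read_text(encoding="utf-8")
        # str.replace matches literally, so the dots in "1.12.2" are not
        # treated as wildcards the way an unescaped sed pattern would.
        new_text = text.replace(old_version, new_version)
        if new_text != text:
            # Files are rewritten in place, so no backup copies are left
            # behind that would need cleaning up afterwards.
            path.write_text(new_text, encoding="utf-8")
            changed.append(path)
    return changed
```

Keeping the old and new version in two variables means a release only needs those two values updated, rather than every substitution expression.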
Fangjun Kuang
e25634ac39 Release v1.12.3 (#2322) 2025-06-27 10:55:46 +08:00
Fangjun Kuang
f835642b1c Support Zipformer transducer ASR with whisper features. (#2321)
Adds support for Zipformer transducer ASR models that use Whisper-style 
features by introducing a new feature flag, parsing metadata, 
and integrating per-chunk normalization.

- Introduce UseWhisperFeature in the model interface and Zipformer implementation
- Parse "feature" metadata to set the whisper flag and wire it into the recognizer
- Update feature extraction logic to handle Whisper filterbanks with early returns
2025-06-27 10:40:41 +08:00
Fangjun Kuang
54bf3732d9 Support zipformer CTC ASR with whisper features. (#2319) 2025-06-27 00:15:11 +08:00
Fangjun Kuang
282211c01f Remove portaudio-go in Go API examples. (#2317)
Replace the deprecated portaudio-go integration with malgo in the Go real-time 
speech recognition example and correct version string typos in the Node.js examples.

- Fixed “verison” typo in Node.js console logs.
- Swapped out portaudio-go for malgo in the Go microphone example, introducing initRecognizer, callback-driven streaming, and sample conversion.
- Removed portaudio-go from go.mod.
2025-06-26 11:33:50 +08:00
Fangjun Kuang
074236ae80 Show cmake debug information. (#2316) 2025-06-25 17:44:51 +08:00
Fangjun Kuang
056da0528d Release v1.12.2 (#2314) 2025-06-25 00:37:55 +08:00
Fangjun Kuang
bda427f4b2 Add API to get version information (#2309) 2025-06-25 00:22:21 +08:00
Fangjun Kuang
7f2145539d Update readme to include BreezeApp from MediaTek Research. (#2313)
See also https://github.com/mtkresearch/BreezeApp
2025-06-24 18:05:50 +08:00
Fangjun Kuang
6982b86c66 Support extra languages in multi-lang kokoro tts (#2303) 2025-06-20 11:22:52 +08:00
Fangjun Kuang
a6095f5f64 Fix building for Pascal (#2305) 2025-06-20 11:10:07 +08:00
Fangjun Kuang
59d118c256 Refactor kokoro export (#2302)
- generate samples for https://k2-fsa.github.io/sherpa/onnx/tts/all/
- provide int8 model for kokoro v0.19 kokoro-int8-en-v0_19.tar.bz2
2025-06-18 20:30:10 +08:00
Fangjun Kuang
3878170991 Fixes #2172 (#2301)
Handle the case when the input audio contains no speech.
2025-06-18 16:48:48 +08:00
DSOE1024
b4716e29a6 Update sherpa-onnx-shared.pc.in (#2300)
Fix linking with C++ examples.
2025-06-17 16:55:42 +08:00
Fangjun Kuang
2913cce77c Add scripts for exporting Piper TTS models to sherpa-onnx (#2299) 2025-06-17 14:23:39 +08:00
Fangjun Kuang
4ae9382bae Update TTS Engine APK to support multi-lang (#2294) 2025-06-17 14:16:48 +08:00
guoxiangyang
0c42c06f75 Update wasm/vad-asr/assets/README.md for clarity (#2297)
Co-authored-by: gxy <gxy@conwi.cn>
2025-06-16 15:35:20 +08:00
GlocKieHuan
a135324c8c Fix isspace on windows in debug build (#2042) 2025-06-09 10:27:16 +08:00
Fangjun Kuang
e13f7dbdd2 Add link to huggingface space for source separation. (#2284) 2025-06-06 13:38:16 +08:00
Fangjun Kuang
d57e4f84de Add Python API for source separation (#2283) 2025-06-05 20:44:26 +08:00
Fangjun Kuang
6f0fac2064 Add jar for Java 24. (#2280) 2025-06-04 11:08:45 +08:00
Fangjun Kuang
db632dacf3 Fix CI for windows (#2279) 2025-06-04 10:35:48 +08:00