Add support for the new NeMo Canary ASR model across multiple language bindings by introducing a Canary model configuration and setter method on the offline recognizer.
- Define Canary model config in Pascal, Go, C#, Dart and update converter functions
- Add SetConfig API for offline recognizer (Pascal, Go, C#, Dart)
- Extend CI/workflows and example scripts to test non-streaming Canary decoding
# New Features
- Added new example programs demonstrating streaming speech recognition from a microphone using Parakeet-TDT CTC and Zipformer Transducer models with voice activity detection.
- These examples support microphone input via PortAudio and display recognized text incrementally.
# Bug Fixes
- Improved error handling and logic when opening microphone devices in several example programs for more reliable device initialization.
# Chores
- Updated build configuration to include new executable examples when PortAudio support is enabled.
This PR integrates LODR (Level-Ordered Deterministic Rescoring) support from Icefall into both online and offline recognizers, enabling LODR for LM shallow fusion and LM rescore.
- Extended OnlineLMConfig and OfflineLMConfig to include lodr_fst, lodr_scale, and lodr_backoff_id.
- Implemented LodrFst and LodrStateCost classes and wired them into RNN LM scoring in both online and offline code paths.
- Updated Python bindings, CLI entry points, examples, and CI test scripts to accept and exercise the new LODR options.
Refactors and extends model export support to include new NeMo Parakeet TDT int8 variants for English and Japanese, updating the Kotlin API, export scripts, test runners, and CI workflows.
- Added support for two new int8 model types in OfflineRecognizer.kt.
- Enhanced Python export scripts to perform dynamic quantization and metadata injection.
- Updated shell scripts and GitHub workflows to package, test, and publish int8 model artifacts.
Add support for the NeMo Canary model in both Java and Kotlin APIs, wiring it through
JNI and updating examples and CI.
- Introduce OfflineCanaryModelConfig in Kotlin and Java with builder patterns
- Extend OfflineRecognizer to accept and apply the new canary config via setConfig
- Update JNI binding (GetOfflineConfig) and getOfflineModelConfig mapping (type 32),
plus examples and CI workflows
This PR introduces support for NeMo Canary models across C, C++, and JavaScript APIs
by adding new Canary configuration structures, updating bindings, extending examples,
and enhancing CI workflows.
- Add OfflineCanaryModelConfig to all language bindings (C, C++, JS, ETS).
- Implement SetConfig methods and NAPI wrappers for updating recognizer config at runtime.
- Update examples and CI scripts to demonstrate and test NeMo Canary model usage.
Adds support for building and packaging Linux AArch64 (arm64) artifacts alongside x64 for Dart/Flutter plugins.
- Detects host architecture in CMake and adjusts library paths
- Extends test workflows to run on an ARM runner and handle linux-aarch64 paths
- Splits release pipeline into separate x64 and aarch64 build/package jobs
This PR adds support for non-streaming Zipformer CTC ASR models across
multiple language bindings, WebAssembly, examples, and CI workflows.
- Introduces a new OfflineZipformerCtcModelConfig in C/C++, Python, Swift, Java, Kotlin, Go, Dart, Pascal, and C# APIs
- Updates initialization, freeing, and recognition logic to include Zipformer CTC in WASM and Node.js
- Adds example scripts and CI steps for downloading, building, and running Zipformer CTC models
Model doc is available at
https://k2-fsa.github.io/sherpa/onnx/pretrained_models/offline-ctc/icefall/zipformer.html
It refactors the release scripts to centralize and simplify version updates across
multiple files. Key changes include:
- Introducing variables (old_version, new_version, replace_str) for version substitution.
- Replacing hard-coded sed expressions with dynamic ones in various files.
- Ensuring backup files generated by sed are cleaned up after execution.
Adds support for Zipformer transducer ASR models that use Whisper-style
features by introducing a new feature flag, parsing metadata,
and integrating per-chunk normalization.
- Introduce UseWhisperFeature in the model interface and Zipformer implementation
- Parse "feature" metadata to set the whisper flag and wire it into the recognizer
- Update feature extraction logic to handle Whisper filterbanks with early returns
Replace the deprecated portaudio-go integration with malgo in the Go real-time
speech recognition example and correct version string typos in the Node.js examples.
- Fixed “verison” typo in Node.js console logs.
- Swapped out portaudio-go for malgo in the Go microphone example,
introducing initRecognizer, callback-driven streaming, and sample conversion.
- Removed portaudio-go from go.mod.