This repository was archived on 2025-08-26. You can view files and clone it, but cannot push or open issues or pull requests.
pypi.org provides only 10GB of free space for open-source projects. Each new release of sherpa-onnx occupies about 800MB, so we have to delete previous releases; otherwise pypi.org refuses to accept new releases once the space limit is reached. To let users install previous versions, we also publish wheels to Hugging Face. Users can find them at https://k2-fsa.github.io/sherpa/onnx/cpu.html and https://k2-fsa.github.io/sherpa/onnx/cpu-cn.html (for users without access to huggingface.co).
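As a rough illustration of installing an older release from those wheel pages, the sketch below builds a `pip install` command that pins a version and passes the page to pip's `--find-links` option. The helper name `pip_install_command` and its parameters are hypothetical and not part of sherpa-onnx. That the pages above can be consumed through `--find-links` is an assumption, so check the documentation for the exact command.

```python
import sys

# Wheel index pages named in the text above.
WHEEL_INDEX = "https://k2-fsa.github.io/sherpa/onnx/cpu.html"
WHEEL_INDEX_CN = "https://k2-fsa.github.io/sherpa/onnx/cpu-cn.html"


def pip_install_command(version: str, use_cn_mirror: bool = False) -> list[str]:
    """Build a pip command that installs a pinned sherpa-onnx version.

    Hypothetical helper for illustration only; `version` is whichever
    release you need, and `use_cn_mirror` selects the second page for
    users without access to huggingface.co.
    """
    index = WHEEL_INDEX_CN if use_cn_mirror else WHEEL_INDEX
    return [
        sys.executable, "-m", "pip", "install",
        f"sherpa-onnx=={version}",   # pin the exact release
        "--find-links", index,       # look for wheels on this page
    ]


if __name__ == "__main__":
    # Usage: python this_script.py <version>
    if len(sys.argv) == 2:
        # Print the command instead of running it, so nothing is installed
        # by accident; pass the list to subprocess.run(cmd, check=True) to run it.
        print(" ".join(pip_install_command(sys.argv[1])))
```

Printing the command rather than executing it keeps the sketch side-effect free; the list form can be handed directly to `subprocess.run` without shell quoting issues.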
Supported functions
| Speech recognition | Speech synthesis | Speaker verification | Speaker identification |
|---|---|---|---|
| ✔️ | ✔️ | ✔️ | ✔️ |
| Spoken Language identification | Audio tagging | Voice activity detection | Keyword spotting |
|---|---|---|---|
| ✔️ | ✔️ | ✔️ | ✔️ |
Supported platforms
| Architecture | Android | iOS | Windows | macOS | Linux |
|---|---|---|---|---|---|
| x64 | ✔️ | | ✔️ | ✔️ | ✔️ |
| x86 | ✔️ | | ✔️ | | |
| arm64 | ✔️ | ✔️ | ✔️ | ✔️ | ✔️ |
| arm32 | ✔️ | | | | ✔️ |
| riscv64 | | | | | ✔️ |
Supported programming languages
| C++ | C | Python | C# | Java | JavaScript | Kotlin | Swift | Go | Dart |
|---|---|---|---|---|---|---|---|---|---|
| ✔️ | ✔️ | ✔️ | ✔️ | ✔️ | ✔️ | ✔️ | ✔️ | ✔️ | ✔️ |
It also supports WebAssembly.
Introduction
This repository supports running the following functions locally
- Speech-to-text (i.e., ASR); both streaming and non-streaming are supported
- Text-to-speech (i.e., TTS)
- Speaker identification
- Speaker verification
- Spoken language identification
- Audio tagging
- VAD (e.g., silero-vad)
- Keyword spotting
on the following platforms and operating systems:
- x86, x86_64, 32-bit ARM, 64-bit ARM (arm64, aarch64), RISC-V (riscv64)
- Linux, macOS, Windows, openKylin
- Android, WearOS
- iOS
- NodeJS
- WebAssembly
- Raspberry Pi
- RV1126
- LicheePi4A
- VisionFive 2
- 旭日X3派 (Horizon Sunrise X3 Pi)
- etc.
with the following APIs
- C++, C, Python, Go, C#
- Java, Kotlin, JavaScript
- Swift
- Dart
Links for pre-built Android APKs
| Description | URL | Users in China |
|---|---|---|
| Streaming speech recognition | Address | Here |
| Text-to-speech | Address | Here |
| Voice activity detection (VAD) | Address | Here |
| VAD + non-streaming speech recognition | Address | Here |
| Two-pass speech recognition | Address | Here |
| Audio tagging | Address | Here |
| Audio tagging (WearOS) | Address | Here |
| Speaker identification | Address | Here |
| Spoken language identification | Address | Here |
| Keyword spotting | Address | Here |
Links for pre-built Flutter APPs
Real-time speech recognition
| Description | URL | Users in China |
|---|---|---|
| Streaming speech recognition | Address | Here |
Text-to-speech
| Description | URL | Users in China |
|---|---|---|
| Android (arm64-v8a, armeabi-v7a, x86_64) | Address | Here |
| Linux (x64) | Address | Here |
| macOS (x64) | Address | Here |
| macOS (arm64) | Address | Here |
| Windows (x64) | Address | Here |
Note: You need to build from source for iOS.
Links for pre-trained models
| Description | URL |
|---|---|
| Speech recognition (speech to text, ASR) | Address |
| Text-to-speech (TTS) | Address |
| VAD | Address |
| Keyword spotting | Address |
| Audio tagging | Address |
| Speaker identification (Speaker ID) | Address |
| Spoken language identification (Language ID) | See multi-lingual Whisper ASR models from Speech recognition |
| Punctuation | Address |
Useful links
- Documentation: https://k2-fsa.github.io/sherpa/onnx/
- Bilibili demo videos: https://search.bilibili.com/all?keyword=%E6%96%B0%E4%B8%80%E4%BB%A3Kaldi
How to reach us
Please see https://k2-fsa.github.io/sherpa/social-groups.html for the Next-gen Kaldi (新一代 Kaldi) WeChat and QQ discussion groups.