This repository has been archived on 2025-08-26. You can view files and clone it, but cannot push or open issues or pull requests.
Supported functions
| Speech recognition | Speech synthesis | Speaker verification | Speaker identification |
|---|---|---|---|
| ✔️ | ✔️ | ✔️ | ✔️ |
| Spoken Language identification | Audio tagging | Voice activity detection |
|---|---|---|
| ✔️ | ✔️ | ✔️ |
| Keyword spotting | Add punctuation |
|---|---|
| ✔️ | ✔️ |
Supported platforms
| Architecture | Android | iOS | Windows | macOS | Linux |
|---|---|---|---|---|---|
| x64 | ✔️ | | ✔️ | ✔️ | ✔️ |
| x86 | ✔️ | | ✔️ | | |
| arm64 | ✔️ | ✔️ | ✔️ | ✔️ | ✔️ |
| arm32 | ✔️ | | | | ✔️ |
| riscv64 | | | | | ✔️ |
Supported programming languages
| 1. C++ | 2. C | 3. Python | 4. JavaScript |
|---|---|---|---|
| ✔️ | ✔️ | ✔️ | ✔️ |
| 5. Java | 6. C# | 7. Kotlin | 8. Swift |
|---|---|---|---|
| ✔️ | ✔️ | ✔️ | ✔️ |
| 9. Go | 10. Dart | 11. Rust | 12. Pascal |
|---|---|---|---|
| ✔️ | ✔️ | ✔️ | ✔️ |
For Rust support, please see https://github.com/thewh1teagle/sherpa-rs
It also supports WebAssembly.
Introduction
This repository supports running the following functions locally:
- Speech-to-text (i.e., ASR); both streaming and non-streaming are supported
- Text-to-speech (i.e., TTS)
- Speaker identification
- Speaker verification
- Spoken language identification
- Audio tagging
- VAD (e.g., silero-vad)
- Keyword spotting
on the following platforms and operating systems:
- x86, x86_64, 32-bit ARM, 64-bit ARM (arm64, aarch64), RISC-V (riscv64)
- Linux, macOS, Windows, openKylin
- Android, WearOS
- iOS
- NodeJS
- WebAssembly
- Raspberry Pi
- RV1126
- LicheePi4A
- VisionFive 2
- Horizon Sunrise X3 Pi (旭日X3派)
- etc.
with the following APIs:
- C++, C, Python, Go, C#
- Java, Kotlin, JavaScript
- Swift, Rust
- Dart, Object Pascal
Links for pre-built Android APKs
| Description | URL | For users in China |
|---|---|---|
| Streaming speech recognition | Address | Click here |
| Text-to-speech | Address | Click here |
| Voice activity detection (VAD) | Address | Click here |
| VAD + non-streaming speech recognition | Address | Click here |
| Two-pass speech recognition | Address | Click here |
| Audio tagging | Address | Click here |
| Audio tagging (WearOS) | Address | Click here |
| Speaker identification | Address | Click here |
| Spoken language identification | Address | Click here |
| Keyword spotting | Address | Click here |
Links for pre-built Flutter apps
Real-time speech recognition
| Description | URL | For users in China |
|---|---|---|
| Streaming speech recognition | Address | Click here |
Text-to-speech
| Description | URL | For users in China |
|---|---|---|
| Android (arm64-v8a, armeabi-v7a, x86_64) | Address | Click here |
| Linux (x64) | Address | Click here |
| macOS (x64) | Address | Click here |
| macOS (arm64) | Address | Click here |
| Windows (x64) | Address | Click here |
Note: You need to build from source for iOS.
Links for pre-built Lazarus apps
Generating subtitles
| Description | URL | For users in China |
|---|---|---|
| Generate subtitles | Address | Click here |
Links for pre-trained models
| Description | URL |
|---|---|
| Speech recognition (speech to text, ASR) | Address |
| Text-to-speech (TTS) | Address |
| VAD | Address |
| Keyword spotting | Address |
| Audio tagging | Address |
| Speaker identification (Speaker ID) | Address |
| Spoken language identification (Language ID) | See multi-lingual Whisper ASR models from Speech recognition |
| Punctuation | Address |
Useful links
- Documentation: https://k2-fsa.github.io/sherpa/onnx/
- Bilibili demo videos: https://search.bilibili.com/all?keyword=%E6%96%B0%E4%B8%80%E4%BB%A3Kaldi
How to reach us
Please see https://k2-fsa.github.io/sherpa/social-groups.html for the Next-gen Kaldi (新一代 Kaldi) WeChat and QQ discussion groups.