This repository has been archived on 2025-08-26 . You can view files and clone it, but cannot push or open issues or pull requests.
2e45d327a50ef0ae16c2619a028ced0e872c5361
* Adding temperature scaling on Joiner logits:
- T hard-coded to 2.0
- so far best result NCE 0.122 (still not so high)
- the BPE scores were rescaled with 0.2 (but then also incorrect words
get high confidence, visually reasonable histograms are for 0.5 scale)
- BPE->WORD score merging done by min(.) function
(tried also prob-product, and also arithmetic, geometric, harmonic mean)
- without temperature scaling (i.e. scale 1.0), the best NCE was 0.032 (here product merging was best)
Results seem consistent with: https://arxiv.org/abs/2110.15222
Everything tuned on a very-small set of 100 sentences with 813 words and 10.2% WER, a Czech model.
I also experimented with blank posteriors mixed into the BPE confidences,
but no NCE improvement found, so not pushing that.
Temperature scling added also to the Greedy search confidences.
* making `temperature_scale` configurable from outside
Introduction
This repository supports running the following functions locally
- Speech-to-text (i.e., ASR); both streaming and non-streaming are supported
- Text-to-speech (i.e., TTS)
- Speaker identification
- Speaker verification
- Spoken language identification
- Audio tagging
- VAD (e.g., silero-vad)
on the following platforms and operating systems:
- x86,
x86_64, 32-bit ARM, 64-bit ARM (arm64, aarch64), RISC-V (riscv64) - Linux, macOS, Windows, openKylin
- Android, WearOS
- iOS
- NodeJS
- WebAssembly
- Raspberry Pi
- RV1126
- LicheePi4A
- VisionFive 2
- 旭日X3派
- etc
with the following APIs
- C++
- C
- Python
- Go
C#- Javascript
- Java
- Kotlin
- Swift
Useful links
- Documentation: https://k2-fsa.github.io/sherpa/onnx/
- APK for the text-to-speech engine: https://k2-fsa.github.io/sherpa/onnx/tts/apk-engine.html
- APK for speaker identification: https://k2-fsa.github.io/sherpa/onnx/speaker-identification/apk.html
- APK for speech recognition: https://github.com/k2-fsa/sherpa-onnx/releases/
- Bilibili 演示视频: https://search.bilibili.com/all?keyword=%E6%96%B0%E4%B8%80%E4%BB%A3Kaldi
How to reach us
Please see https://k2-fsa.github.io/sherpa/social-groups.html for 新一代 Kaldi 微信交流群 and QQ 交流群.
Description
Languages
C++
38.3%
Python
16.3%
Shell
7.6%
Kotlin
5.1%
JavaScript
5.1%
Other
27.4%