69 Commits

Author SHA1 Message Date
Fangjun Kuang
2c2926af7d Add C++ runtime for Matcha-TTS (#1627) 2024-12-31 12:44:14 +08:00
Fangjun Kuang
669f5ef441 Add C++ runtime and Python APIs for Moonshine models (#1473) 2024-10-26 14:34:07 +08:00
Fangjun Kuang
8535b1d3bb Python API for speaker diarization. (#1400) 2024-10-09 14:13:26 +08:00
Fangjun Kuang
b965f14cf0 Add Python API for clustering (#1385) 2024-09-30 11:33:15 +08:00
Lim Yao Chong
3bffc24d64 Add Python binding for online punctuation models (#1312) 2024-09-09 10:26:53 +08:00
SilverSulfide
888f74bf3c Re-implement LM rescore for online transducer (#1231)
Co-authored-by: Martins Kronis <martins.kuznecovs@tilde.lv>
2024-09-06 10:01:25 +08:00
xsjk
1da75ee3c0 Fix typo in offline-lm-config.cc (#1229) 2024-08-07 15:38:34 +08:00
Fangjun Kuang
25f0a10468 Add C++ runtime for SenseVoice models (#1148) 2024-07-18 22:54:18 +08:00
Wei Kang
5b1fa8750f Fix hotwords OOV log (#1139) 2024-07-16 19:41:31 +08:00
Manix
55decb7bee Add config for TensorRT and CUDA execution provider (#992)
Signed-off-by: manickavela1998@gmail.com <manickavela1998@gmail.com>
Signed-off-by: manickavela1998@gmail.com <manickavela.arumugam@uniphore.com>
2024-07-05 15:18:37 +08:00
SilverSulfide
656b9fa1c8 Add Python API support for Offline LM rescoring (#1033) 2024-06-19 16:29:37 +08:00
Fangjun Kuang
349d957da2 Add inverse text normalization for online ASR (#1020) 2024-06-17 18:39:23 +08:00
Fangjun Kuang
b0f7ed3ee3 Add inverse text normalization for non-streaming ASR (#1017) 2024-06-17 14:28:53 +08:00
Fangjun Kuang
fd5a0d1e00 Add C++ runtime for Tele-AI/TeleSpeech-ASR (#970) 2024-06-05 00:26:40 +08:00
Wei Kang
b012b78ceb Encode hotwords in C++ side (#828)
* Encode hotwords in C++ side
2024-05-20 19:41:36 +08:00
Fangjun Kuang
46e4e5b7ac Add C++ support for streaming NeMo CTC models. (#857) 2024-05-10 16:26:43 +08:00
Fangjun Kuang
17cd3a5f01 Add C++ runtime for non-streaming faster conformer transducer from NeMo. (#854) 2024-05-10 12:15:39 +08:00
Karel Vesely
2e45d327a5 Adding temperature scaling on Joiner logits: (#789)
* Adding temperature scaling on Joiner logits:

- T hard-coded to 2.0
- so far best result NCE 0.122 (still not so high)
    - the BPE scores were rescaled with 0.2 (but then also incorrect words
      get high confidence, visually reasonable histograms are for 0.5 scale)
    - BPE->WORD score merging done by min(.) function
      (tried also prob-product, and also arithmetic, geometric, harmonic mean)

- without temperature scaling (i.e. scale 1.0), the best NCE was 0.032 (here product merging was best)

Results seem consistent with: https://arxiv.org/abs/2110.15222

Everything was tuned on a very small set of 100 sentences with 813 words and 10.2% WER, using a Czech model.

I also experimented with blank posteriors mixed into the BPE confidences,
but no NCE improvement found, so not pushing that.

Temperature scaling was also added to the greedy search confidences.

* making `temperature_scale` configurable from outside
2024-04-26 09:44:26 +08:00
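
The commit message above describes temperature scaling of the joiner logits and merging BPE token scores into a word score with min(.). A minimal sketch of that idea follows, assuming the usual formulation in which logits are divided by the temperature T before the softmax. The function names and the toy logits are hypothetical and are not taken from the sherpa-onnx sources.

```python
import math


def softmax_with_temperature(logits, temperature=2.0):
    """Return softmax(logits / temperature) as a list of probabilities.

    Assumption: "temperature scaling" means dividing the logits by T,
    so T > 1 flattens the distribution and lowers the top probability.
    """
    scaled = [x / temperature for x in logits]
    peak = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(x - peak) for x in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def word_confidence(token_confidences):
    """Merge per-token (BPE) confidences into one word score via min(.)."""
    return min(token_confidences)


# Toy logits, for illustration only.
logits = [4.0, 1.0, 0.0]
sharp = softmax_with_temperature(logits, temperature=1.0)
smooth = softmax_with_temperature(logits, temperature=2.0)
```

With T = 2.0 the top probability is lower than with T = 1.0, which is the effect the commit relies on to make confidences less overconfident. The min(.) merge makes a word only as confident as its least confident token.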
Fangjun Kuang
68b8b88b5a Add Python API for punctuation models. (#762) 2024-04-13 13:28:17 +08:00
Fangjun Kuang
34d70a259f Add Python API and Python examples for audio tagging (#753) 2024-04-11 11:12:48 +08:00
Fangjun Kuang
6fb8ceda57 Add VAD examples using ALSA for recording (#739) 2024-04-08 16:41:01 +08:00
Fangjun Kuang
db67e00c77 Add HLG decoding for streaming CTC models (#731) 2024-04-03 21:31:42 +08:00
Fangjun Kuang
0d258dd150 Support spoken language identification with whisper (#694) 2024-03-24 22:57:00 +08:00
Karel Vesely
eaec4c83c2 Configurable low_freq high_freq, dithering (#664) 2024-03-22 21:41:44 +08:00
Bhaswati Saha
fda614d0d1 beam search value as parameter in offline_recognizer.py (#673)
Co-authored-by: bhascns <bhaswati@mihup.com>
2024-03-18 18:43:05 +08:00
Fangjun Kuang
d3287f9494 Add Python ASR examples with alsa (#646) 2024-03-08 11:34:48 +08:00
Wei Kang
734bbd91dc Add Python API for keyword spotting (#576)
* Add alsa & microphone support for keyword spotting

* Add python wrapper
2024-03-01 09:31:11 +08:00
Karel Vesely
38c072dcb2 Track token scores (#571)
* add export of per-token scores (ys, lm, context)

- for best path of the modified-beam-search decoding of transducer

* refactoring JSON export of OnlineRecognitionResult, extending pybind11 API of OnlineRecognitionResult

* export per-token scores also for greedy-search (online-transducer)

- export un-scaled lm_probs (modified-beam search, online-transducer)
- polishing

* fill lm_probs/context_scores only if LM/ContextGraph is present (make Result smaller)
2024-02-29 06:28:45 +08:00
Askars
763a51486e Add missing start_time to python API (#591)
Co-authored-by: vsd-vector <askars.salimbajevs@tilde.lv>
2024-02-20 20:47:53 +08:00
Fangjun Kuang
44efff4e47 Fix CI tests for Python and JNI. (#554) 2024-01-27 13:01:54 +08:00
chiiyeh
e7b18a2139 add blank_penalty for online transducer (#548) 2024-01-26 12:12:13 +08:00
chiiyeh
466a6855c8 add hotwords docstring to offline_recognizer and online_recognizer (#546) 2024-01-25 16:54:20 +08:00
chiiyeh
3bb3849ec5 add blank_penalty for offline transducer (#542) 2024-01-25 15:00:09 +08:00
Wei Kang
b6c020901a decoder for open vocabulary keyword spotting (#505)
* various fixes to ContextGraph to support open vocabulary keywords decoder

* Add keyword spotter runtime

* Add binary

* First version works

* Minor fixes

* update text2token

* default values

* Add jni for kws

* add kws android project

* Minor fixes

* Remove unused interface

* Minor fixes

* Add workflow

* handle extra info in texts

* Minor fixes

* Add more comments

* Fix ci

* fix cpp style

* Add input box in android demo so that users can specify their keywords

* Fix cpp style

* Fix comments

* Minor fixes

* Minor fixes

* minor fixes

* Minor fixes

* Minor fixes

* Add CI

* Fix code style

* cpplint

* Fix comments

* Fix error
2024-01-20 22:52:41 +08:00
Fangjun Kuang
55266918c8 Add runtime support for wespeaker models (#516) 2024-01-09 22:06:08 +08:00
Fangjun Kuang
e475e750ac Support streaming zipformer CTC (#496)
* Support streaming zipformer CTC

* test online zipformer2 CTC

* Update doc of sherpa-onnx.cc

* Add Python APIs for streaming zipformer2 ctc

* Add Python API examples for streaming zipformer2 ctc

* Swift API for streaming zipformer2 CTC

* NodeJS API for streaming zipformer2 CTC

* Kotlin API for streaming zipformer2 CTC

* Golang API for streaming zipformer2 CTC

* C# API for streaming zipformer2 CTC

* Release v1.9.6
2023-12-22 13:46:33 +08:00
Fangjun Kuang
0e23f82691 Give an informative log for whisper on exceptions. (#473) 2023-12-08 14:33:59 +08:00
Fangjun Kuang
049fb9f451 Add Python APIs for WeNet CTC models (#428) 2023-11-16 14:20:41 +08:00
Fangjun Kuang
655e0fa836 add python API and examples for TTS (#364) 2023-10-14 14:21:53 +08:00
Peng He
4771c9275c Add lm decode for the Python API. (#353)
* Add lm decode for the Python API.

* fix style.

* Fix LogAdd: lm_log_prob should not be doubled when merging paths with the same prefix

* sort the import alphabetically
2023-10-13 11:15:16 +08:00
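
The LogAdd fix in the commit above concerns merging beam-search paths that share a prefix. The sketch below shows one reading of that message: path scores are combined with a log-domain addition, while the LM score of the shared prefix is kept once rather than summed. The dictionary layout and the function names are hypothetical and are not the actual sherpa-onnx data structures.

```python
import math


def log_add(a, b):
    """Return log(exp(a) + exp(b)) computed stably in the log domain."""
    if a < b:
        a, b = b, a
    if a == -math.inf:  # both inputs are log(0)
        return -math.inf
    return a + math.log1p(math.exp(b - a))


def merge_same_prefix(kept, other):
    """Merge two hypotheses that share the same token prefix.

    Assumption: 'log_prob' is the total path score and 'lm_log_prob' is
    the LM score of the shared prefix, so it is kept once, not added twice.
    """
    return {
        "log_prob": log_add(kept["log_prob"], other["log_prob"]),
        "lm_log_prob": kept["lm_log_prob"],
    }


a = {"log_prob": math.log(0.2), "lm_log_prob": -1.5}
b = {"log_prob": math.log(0.3), "lm_log_prob": -1.5}
merged = merge_same_prefix(a, b)
```

Here the merged path probability is 0.2 + 0.3 = 0.5 in the linear domain, and the LM score stays at -1.5 instead of becoming -3.0.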
Fangjun Kuang
407602445d Add CTC HLG decoding using OpenFst (#349) 2023-10-08 11:32:39 +08:00
Fangjun Kuang
33a5765169 Print a more user-friendly error message when using --hotwords-file. (#344) 2023-09-26 11:04:20 +08:00
Fangjun Kuang
c471423125 Add Silero VAD (#313) 2023-09-17 14:54:38 +08:00
Wei Kang
47184f9db7 Refactor hotwords, support loading hotwords from file (#296) 2023-09-14 19:33:17 +08:00
Fangjun Kuang
f709c95c5f Support multilingual whisper models (#274) 2023-08-16 00:28:52 +08:00
Fangjun Kuang
6038e2aa62 Support streaming paraformer (#263) 2023-08-14 10:32:14 +08:00
Fangjun Kuang
a4bff28e21 Support TDNN models from the yesno recipe from icefall (#262) 2023-08-12 19:50:22 +08:00
Fangjun Kuang
b094868fb8 Add non-streaming websocket server for python (#259) 2023-08-11 15:56:24 +08:00
Fangjun Kuang
79c2ce5dd4 Refactor online recognizer (#250)
* Refactor online recognizer.

Make it easier to support other streaming models.

Note that it is a breaking change for the Python API.
`sherpa_onnx.OnlineRecognizer()` used before should be
replaced by `sherpa_onnx.OnlineRecognizer.from_transducer()`.
2023-08-09 20:27:31 +08:00
Fangjun Kuang
45b9d4ab37 Support whisper models (#238) 2023-08-07 12:34:18 +08:00