Commit Graph

79 Commits

Author SHA1 Message Date
Fangjun Kuang
349d957da2 Add inverse text normalization for online ASR (#1020) 2024-06-17 18:39:23 +08:00
Fangjun Kuang
b0f7ed3ee3 Add inverse text normalization for non-streaming ASR (#1017) 2024-06-17 14:28:53 +08:00
Fangjun Kuang
1a43d1e37f Support getting word IDs for CTC HLG decoding. (#978) 2024-06-06 14:22:39 +08:00
Fangjun Kuang
fd5a0d1e00 Add C++ runtime for Tele-AI/TeleSpeech-ASR (#970) 2024-06-05 00:26:40 +08:00
Fangjun Kuang
f1cff83ef9 Add address sanitizer and undefined behavior sanitizer (#951) 2024-05-31 13:17:01 +08:00
Wei Kang
b012b78ceb Encode hotwords in C++ side (#828)
* Encode hotwords in C++ side
2024-05-20 19:41:36 +08:00
Fangjun Kuang
46e4e5b7ac Add C++ support for streaming NeMo CTC models. (#857) 2024-05-10 16:26:43 +08:00
Fangjun Kuang
17cd3a5f01 Add C++ runtime for non-streaming faster conformer transducer from NeMo. (#854) 2024-05-10 12:15:39 +08:00
Karel Vesely
2e45d327a5 Adding temperature scaling on Joiner logits: (#789)
* Adding temperature scaling on Joiner logits:

- T hard-coded to 2.0
- so far best result NCE 0.122 (still not so high)
    - the BPE scores were rescaled with 0.2 (but then also incorrect words
      get high confidence, visually reasonable histograms are for 0.5 scale)
    - BPE->WORD score merging done by min(.) function
      (tried also prob-product, and also arithmetic, geometric, harmonic mean)

- without temperature scaling (i.e. scale 1.0), the best NCE was 0.032 (here product merging was best)

Results seem consistent with: https://arxiv.org/abs/2110.15222

Everything tuned on a very-small set of 100 sentences with 813 words and 10.2% WER, a Czech model.

I also experimented with blank posteriors mixed into the BPE confidences,
but no NCE improvement found, so not pushing that.

Temperature scling added also to the Greedy search confidences.

* making `temperature_scale` configurable from outside
2024-04-26 09:44:26 +08:00
Fangjun Kuang
6b353bfb42 Add jieba for Chinese TTS models (#797) 2024-04-21 14:47:13 +08:00
Fangjun Kuang
54bc504065 Add Python API example for CED audio tagging. (#793) 2024-04-19 18:33:18 +08:00
Fangjun Kuang
c1608b3524 Support CED models (#792) 2024-04-19 15:20:37 +08:00
chiiyeh
aa2d695fd2 Add score function to speaker identification (#775) 2024-04-16 17:29:46 +08:00
Fangjun Kuang
6bf2099781 Fix code style issues (#774) 2024-04-16 09:46:15 +08:00
Manix
fb4aee83ac Adding warm up for Zipformer2 (#766)
Signed-off-by: manickavela1998@gmail.com <manickavela1998@gmail.com>
2024-04-16 09:16:55 +08:00
Fangjun Kuang
13730ecbd8 Add C API for punctuation (#768) 2024-04-14 19:02:34 +08:00
Fangjun Kuang
68b8b88b5a Add Python API for punctuation models. (#762) 2024-04-13 13:28:17 +08:00
Fangjun Kuang
34d70a259f Add Python API and Python examples for audio tagging (#753) 2024-04-11 11:12:48 +08:00
Fangjun Kuang
0d90b34e4a Support Chinese heteronyms on Android for TTS. (#742) 2024-04-08 21:36:47 +08:00
Fangjun Kuang
6b3d2b87f9 Fix releasing GIL (#741) 2024-04-08 17:22:48 +08:00
Fangjun Kuang
6fb8ceda57 Add VAD examples using ALSA for recording (#739) 2024-04-08 16:41:01 +08:00
Fangjun Kuang
a5f8fbc83f Support heteronyms in Chinese TTS (#738) 2024-04-08 11:01:30 +08:00
Fangjun Kuang
db67e00c77 Add HLG decoding for streaming CTC models (#731) 2024-04-03 21:31:42 +08:00
Fangjun Kuang
2e0bccad36 Add C API for speaker embedding extractor. (#711) 2024-03-28 18:05:40 +08:00
Leo Huang
638f48f47a Added progress for callback of tts generator (#712)
Co-authored-by: leohwang <leohwang@360converter.com>
2024-03-28 17:12:20 +08:00
Fangjun Kuang
4e040c596e Support including TTS conditionally. (#699) 2024-03-26 17:21:35 +08:00
Fangjun Kuang
0d258dd150 Support spoken language identification with whisper (#694) 2024-03-24 22:57:00 +08:00
Karel Vesely
eaec4c83c2 Configurable low_freq high_freq, dithering (#664) 2024-03-22 21:41:44 +08:00
Fangjun Kuang
3232dff2cf Support user provided data in tts callback. (#653) 2024-03-09 18:15:03 +08:00
Fangjun Kuang
d3287f9494 Add Python ASR examples with alsa (#646) 2024-03-08 11:34:48 +08:00
Wei Kang
734bbd91dc Add Python API for keyword spotting (#576)
* Add alsa & microphone support for keyword spotting

* Add python wrapper
2024-03-01 09:31:11 +08:00
Karel Vesely
38c072dcb2 Track token scores (#571)
* add export of per-token scores (ys, lm, context)

- for best path of the modified-beam-search decoding of transducer

* refactoring JSON export of OnlineRecognitionResult, extending pybind11 API of OnlineRecognitionResult

* export per-token scores also for greedy-search (online-transducer)

- export un-scaled lm_probs (modified-beam search, online-transducer)
- polishing

* fill lm_probs/context_scores only if LM/ContextGraph is present (make Result smaller)
2024-02-29 06:28:45 +08:00
Askars
763a51486e Add missing start_time to python API (#591)
Co-authored-by: vsd-vector <askars.salimbajevs@tilde.lv>
2024-02-20 20:47:53 +08:00
chiiyeh
e7b18a2139 add blank_penalty for online transducer (#548) 2024-01-26 12:12:13 +08:00
chiiyeh
3bb3849ec5 add blank_penalty for offline transducer (#542) 2024-01-25 15:00:09 +08:00
Fangjun Kuang
bbd7c7fc18 Add Android demo for speaker recognition (#536)
See pre-built Android APKs at 
https://k2-fsa.github.io/sherpa/onnx/speaker-identification/apk.html
2024-01-23 16:50:52 +08:00
Fangjun Kuang
55266918c8 Add runtime support for wespeaker models (#516) 2024-01-09 22:06:08 +08:00
Fangjun Kuang
e215d0c39a Fix Byte BPE string results for Python. (#512)
It ignores invalid UTF8 strings.
2024-01-03 16:03:24 +08:00
Fangjun Kuang
d7e10bb3f8 Replace Android system TTS engine (#508) 2023-12-31 23:02:35 +08:00
Fangjun Kuang
e475e750ac Support streaming zipformer CTC (#496)
* Support streaming zipformer CTC

* test online zipformer2 CTC

* Update doc of sherpa-onnx.cc

* Add Python APIs for streaming zipformer2 ctc

* Add Python API examples for streaming zipformer2 ctc

* Swift API for streaming zipformer2 CTC

* NodeJS API for streaming zipformer2 CTC

* Kotlin API for streaming zipformer2 CTC

* Golang API for streaming zipformer2 CTC

* C# API for streaming zipformer2 CTC

* Release v1.9.6
2023-12-22 13:46:33 +08:00
Fangjun Kuang
7634f5f034 Release Python GIL in C++ class constructor (#493) 2023-12-20 15:54:32 +08:00
Fangjun Kuang
3ae984f148 Remove the 30-second constraint from whisper. (#471) 2023-12-07 17:47:08 +08:00
Fangjun Kuang
99ff6a834c Play generated audio as it is generating. (#457) 2023-12-02 15:35:11 +08:00
Fangjun Kuang
62dc3c3e46 Use piper-phonemize to convert text to token IDs (#453) 2023-11-30 23:57:43 +08:00
Fangjun Kuang
87a47d7db4 Release GIL to support multithreading in websocket servers. (#451) 2023-11-27 13:44:03 +08:00
Fangjun Kuang
fac4f6bc7c Support streaming conformer CTC models from wenet (#427) 2023-11-16 10:35:23 +08:00
Fangjun Kuang
b83b3e3cd1 Support non-streaming WeNet CTC models. (#426) 2023-11-15 14:23:20 +08:00
Fangjun Kuang
d1a450bf82 Support text normalization via rule FST (#407) 2023-11-05 08:59:03 +08:00
Fangjun Kuang
1937717705 Add MFC TTS example on Windows (#378) 2023-10-21 00:13:07 +08:00
Fangjun Kuang
9efe69720d Support VITS VCTK models (#367)
* Support VITS VCTK models

* Release v1.8.1
2023-10-16 17:22:30 +08:00