117 Commits

Author SHA1 Message Date
Fangjun Kuang
2c2926af7d Add C++ runtime for Matcha-TTS (#1627) 2024-12-31 12:44:14 +08:00
Fangjun Kuang
2101227269 Add streaming ASR support for HarmonyOS. (#1565) 2024-11-26 18:36:56 +08:00
Fangjun Kuang
669f5ef441 Add C++ runtime and Python APIs for Moonshine models (#1473) 2024-10-26 14:34:07 +08:00
Fangjun Kuang
1d061df355 WebAssembly example for speaker diarization (#1411) 2024-10-10 22:14:45 +08:00
Fangjun Kuang
8535b1d3bb Python API for speaker diarization. (#1400) 2024-10-09 14:13:26 +08:00
Fangjun Kuang
b965f14cf0 Add Python API for clustering (#1385) 2024-09-30 11:33:15 +08:00
Fangjun Kuang
1423ddb1f0 Support specifying max speech duration for VAD. (#1348) 2024-09-14 10:57:46 +08:00
Fangjun Kuang
544857b097 Fix building (#1343) 2024-09-13 13:33:52 +08:00
Lim Yao Chong
3bffc24d64 Add Python binding for online punctuation models (#1312) 2024-09-09 10:26:53 +08:00
SilverSulfide
888f74bf3c Re-implement LM rescore for online transducer (#1231)
Co-authored-by: Martins Kronis <martins.kuznecovs@tilde.lv>
2024-09-06 10:01:25 +08:00
Robin Zhong
62c4d4ab62 Add emotion, event of SenseVoice. (#1257)
* Add emotion, event of SenseVoice.

* Fix tokens size check and update java api.

https://github.com/k2-fsa/sherpa-onnx/pull/1257
2024-08-14 15:50:13 +08:00
xsjk
1da75ee3c0 Fix typo in offline-lm-config.cc (#1229) 2024-08-07 15:38:34 +08:00
Fangjun Kuang
25f0a10468 Add C++ runtime for SenseVoice models (#1148) 2024-07-18 22:54:18 +08:00
Wei Kang
5b1fa8750f Fix hotwords OOV log (#1139) 2024-07-16 19:41:31 +08:00
Fangjun Kuang
08c758520f Add keyword spotting for C# (#1105) 2024-07-10 21:18:46 +08:00
Fangjun Kuang
dd0ff2ca06 Support onnxruntime 1.18.0 (#906) 2024-07-10 17:05:26 +08:00
Fangjun Kuang
c2cc9dec58 Add Flush to VAD so that the last segment can be detected. (#1099) 2024-07-09 16:15:56 +08:00
Manix
3e4307e2fb updating trt workspace int64 (#1094)
Signed-off-by: Manix <manickavela1998@gmail.com>
2024-07-08 20:38:16 +08:00
Manix
55decb7bee Add config for TensorRT and CUDA execution provider (#992)
Signed-off-by: manickavela1998@gmail.com <manickavela1998@gmail.com>
Signed-off-by: manickavela1998@gmail.com <manickavela.arumugam@uniphore.com>
2024-07-05 15:18:37 +08:00
Fangjun Kuang
9dd0e03568 Enable to stop TTS generation (#1041) 2024-06-22 18:18:36 +08:00
SilverSulfide
656b9fa1c8 Add Python API support for Offline LM rescoring (#1033) 2024-06-19 16:29:37 +08:00
Fangjun Kuang
349d957da2 Add inverse text normalization for online ASR (#1020) 2024-06-17 18:39:23 +08:00
Fangjun Kuang
b0f7ed3ee3 Add inverse text normalization for non-streaming ASR (#1017) 2024-06-17 14:28:53 +08:00
Fangjun Kuang
1a43d1e37f Support getting word IDs for CTC HLG decoding. (#978) 2024-06-06 14:22:39 +08:00
Fangjun Kuang
fd5a0d1e00 Add C++ runtime for Tele-AI/TeleSpeech-ASR (#970) 2024-06-05 00:26:40 +08:00
Fangjun Kuang
f1cff83ef9 Add address sanitizer and undefined behavior sanitizer (#951) 2024-05-31 13:17:01 +08:00
Fangjun Kuang
4f21aabd3c Fix CI for JavaScript and Python APIs. (#901) 2024-05-22 13:57:00 +08:00
Wei Kang
b012b78ceb Encode hotwords in C++ side (#828)
* Encode hotwords in C++ side
2024-05-20 19:41:36 +08:00
Fangjun Kuang
46e4e5b7ac Add C++ support for streaming NeMo CTC models. (#857) 2024-05-10 16:26:43 +08:00
Fangjun Kuang
17cd3a5f01 Add C++ runtime for non-streaming faster conformer transducer from NeMo. (#854) 2024-05-10 12:15:39 +08:00
Karel Vesely
2e45d327a5 Adding temperature scaling on Joiner logits: (#789)
* Adding temperature scaling on Joiner logits:

- T hard-coded to 2.0
- so far best result NCE 0.122 (still not so high)
    - the BPE scores were rescaled with 0.2 (but then also incorrect words
      get high confidence, visually reasonable histograms are for 0.5 scale)
    - BPE->WORD score merging done by min(.) function
      (tried also prob-product, and also arithmetic, geometric, harmonic mean)

- without temperature scaling (i.e. scale 1.0), the best NCE was 0.032 (here product merging was best)

Results seem consistent with: https://arxiv.org/abs/2110.15222

Everything tuned on a very small set of 100 sentences with 813 words and 10.2% WER, a Czech model.

I also experimented with blank posteriors mixed into the BPE confidences,
but no NCE improvement found, so not pushing that.

Temperature scaling added also to the Greedy search confidences.

* making `temperature_scale` configurable from outside
2024-04-26 09:44:26 +08:00
Fangjun Kuang
6b353bfb42 Add jieba for Chinese TTS models (#797) 2024-04-21 14:47:13 +08:00
Fangjun Kuang
54bc504065 Add Python API example for CED audio tagging. (#793) 2024-04-19 18:33:18 +08:00
Fangjun Kuang
c1608b3524 Support CED models (#792) 2024-04-19 15:20:37 +08:00
chiiyeh
aa2d695fd2 Add score function to speaker identification (#775) 2024-04-16 17:29:46 +08:00
Fangjun Kuang
6bf2099781 Fix code style issues (#774) 2024-04-16 09:46:15 +08:00
Manix
fb4aee83ac Adding warm up for Zipformer2 (#766)
Signed-off-by: manickavela1998@gmail.com <manickavela1998@gmail.com>
2024-04-16 09:16:55 +08:00
Fangjun Kuang
13730ecbd8 Add C API for punctuation (#768) 2024-04-14 19:02:34 +08:00
Fangjun Kuang
68b8b88b5a Add Python API for punctuation models. (#762) 2024-04-13 13:28:17 +08:00
Fangjun Kuang
34d70a259f Add Python API and Python examples for audio tagging (#753) 2024-04-11 11:12:48 +08:00
Fangjun Kuang
0d90b34e4a Support Chinese heteronyms on Android for TTS. (#742) 2024-04-08 21:36:47 +08:00
Fangjun Kuang
6b3d2b87f9 Fix releasing GIL (#741) 2024-04-08 17:22:48 +08:00
Fangjun Kuang
6fb8ceda57 Add VAD examples using ALSA for recording (#739) 2024-04-08 16:41:01 +08:00
Fangjun Kuang
a5f8fbc83f Support heteronyms in Chinese TTS (#738) 2024-04-08 11:01:30 +08:00
Fangjun Kuang
db67e00c77 Add HLG decoding for streaming CTC models (#731) 2024-04-03 21:31:42 +08:00
Fangjun Kuang
2e0bccad36 Add C API for speaker embedding extractor. (#711) 2024-03-28 18:05:40 +08:00
Leo Huang
638f48f47a Added progress for callback of tts generator (#712)
Co-authored-by: leohwang <leohwang@360converter.com>
2024-03-28 17:12:20 +08:00
Fangjun Kuang
4e040c596e Support including TTS conditionally. (#699) 2024-03-26 17:21:35 +08:00
Fangjun Kuang
0d258dd150 Support spoken language identification with whisper (#694) 2024-03-24 22:57:00 +08:00
Karel Vesely
eaec4c83c2 Configurable low_freq high_freq, dithering (#664) 2024-03-22 21:41:44 +08:00