Commit Graph

263 Commits

Author SHA1 Message Date
Ikko Eltociear Ashimine
155f22d511 Update features.h (#994) 2024-06-12 15:47:44 +08:00
Fangjun Kuang
208da78343 Limit the maximum segment length for VAD. (#990) 2024-06-12 10:49:37 +08:00
Fangjun Kuang
1a43d1e37f Support getting word IDs for CTC HLG decoding. (#978) 2024-06-06 14:22:39 +08:00
Manix
69347ffc8f Support TensorRT provider (#921)
Signed-off-by: manickavela1998@gmail.com <manickavela1998@gmail.com>
Signed-off-by: manickavela1998@gmail.com <manickavela.arumugam@uniphore.com>
2024-06-06 10:45:28 +08:00
Fangjun Kuang
7e0931c762 Fix punctuation (#976) 2024-06-05 11:23:19 +08:00
Fangjun Kuang
fd5a0d1e00 Add C++ runtime for Tele-AI/TeleSpeech-ASR (#970) 2024-06-05 00:26:40 +08:00
Fangjun Kuang
f1cff83ef9 Add address sanitizer and undefined behavior sanitizer (#951) 2024-05-31 13:17:01 +08:00
Wei Kang
a38881817c Support customized scores for hotwords (#926)
* Support customized scores for hotwords

* Skip blank lines
2024-05-31 12:34:30 +08:00
Fangjun Kuang
a689249f88 Fix building for Android (#949) 2024-05-31 10:27:29 +08:00
Fangjun Kuang
082f230dfb Fix nemo streaming transducer greedy search (#944) 2024-05-30 15:31:10 +08:00
Sangeet Sagar
3f472a9993 Add C++ runtime for *streaming* faster conformer transducer from NeMo. (#889)
Co-authored-by: sangeet2020 <15uec053@gmail.com>
2024-05-30 13:55:03 +08:00
Fangjun Kuang
50a2eaa41f Reset encoder states on endpointing for streaming transducer. (#924) 2024-05-28 17:06:17 +08:00
Fangjun Kuang
49ea59d4ff Add Flutter GUI example for VAD with a microphone. (#905) 2024-05-24 23:48:12 +08:00
Dadoou
4fc0a1dc64 Update offline-ctc-greedy-search-decoder.cc (#917)
Bug fix.
Z_O_O was decoded as ZO instead of ZOO.
To fix this, prev_id is now updated on every frame.
2024-05-24 22:31:56 +08:00
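The fix described in commit 4fc0a1dc64 above can be illustrated with a minimal Python sketch of CTC greedy collapsing. This is not the code from offline-ctc-greedy-search-decoder.cc: the function names, the integer token IDs, and the choice of 0 as the blank ID are all assumptions made for illustration, and the "buggy" variant shows only one plausible way such a bug can arise (updating prev_id only when a token is emitted).

```python
from typing import List


def ctc_collapse(frame_ids: List[int], blank_id: int = 0) -> List[int]:
    """Collapse per-frame argmax IDs using the standard CTC rule.

    prev_id is updated on every frame, so a blank between two identical
    tokens lets the second one be emitted again.
    """
    tokens: List[int] = []
    prev_id = -1  # hypothetical sentinel: no previous frame yet
    for token_id in frame_ids:
        if token_id != blank_id and token_id != prev_id:
            tokens.append(token_id)
        prev_id = token_id  # updated every time, including on blanks
    return tokens


def ctc_collapse_buggy(frame_ids: List[int], blank_id: int = 0) -> List[int]:
    """Illustrative buggy variant: prev_id changes only when a token is emitted."""
    tokens: List[int] = []
    prev_id = -1
    for token_id in frame_ids:
        if token_id != blank_id and token_id != prev_id:
            tokens.append(token_id)
            prev_id = token_id  # the blank frame never resets prev_id
    return tokens


# Hypothetical IDs: 0 = blank ("_"), 1 = "Z", 2 = "O"
Z_O_O = [1, 0, 2, 0, 2]
correct = ctc_collapse(Z_O_O)
buggy = ctc_collapse_buggy(Z_O_O)
```

With these assumed IDs, the correct variant keeps both O tokens (ZOO), while the buggy variant drops the second O because the blank in between never resets prev_id.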
Fangjun Kuang
cf83412d0a Support reading waves from NAudio. (#914) 2024-05-24 11:07:44 +08:00
Fangjun Kuang
2db777587e Fix CI tests. (#907) 2024-05-23 14:49:37 +08:00
Fangjun Kuang
81346d1172 Fix reading wave files generated by NAudio. (#903) 2024-05-22 19:56:06 +08:00
Wei Kang
b012b78ceb Encode hotwords in C++ side (#828)
* Encode hotwords in C++ side
2024-05-20 19:41:36 +08:00
Manix
740d7ae9d6 fixing bug and compiler error (#870)
Signed-off-by: manickavela1998@gmail.com <manickavela1998@gmail.com>
2024-05-13 17:44:03 +08:00
Fangjun Kuang
384f96c40f Add streaming CTC ASR APIs for node-addon-api (#867) 2024-05-13 11:58:25 +08:00
Fangjun Kuang
db85b2c1d8 Add Android APKs for NeMo CTC models. (#866) 2024-05-12 14:58:36 +08:00
Fangjun Kuang
7322f4e0a3 Fix node addon tests (#865)
* Install naudiodon2 manually.

It is needed only when using a microphone. The CI tests don't need it.
2024-05-12 12:03:43 +08:00
Fangjun Kuang
46e4e5b7ac Add C++ support for streaming NeMo CTC models. (#857) 2024-05-10 16:26:43 +08:00
yh646492956
1eb60e8711 Solve the issue of missing the last sentence with punctuation (#856)
Co-authored-by: Hao You <13182720519@sina.cn>
2024-05-10 15:41:42 +08:00
Fangjun Kuang
17cd3a5f01 Add C++ runtime for non-streaming faster conformer transducer from NeMo. (#854) 2024-05-10 12:15:39 +08:00
Fangjun Kuang
5d8c35e44e Add C++ support for non-streaming NeMo fast conformer hybrid transducer ctc (the ctc branch) (#848) 2024-05-09 15:32:22 +08:00
Fangjun Kuang
5407f880c0 Add Java and Kotlin API for punctuation models (#818) 2024-04-26 22:06:48 +08:00
Fangjun Kuang
f2d074aea9 Fix a bug for offline paraformer (#816) 2024-04-26 16:40:42 +08:00
Fangjun Kuang
612002da57 Fix C# to support Chinese tts models using jieba (#815) 2024-04-26 11:50:07 +08:00
Karel Vesely
2e45d327a5 Adding temperature scaling on Joiner logits: (#789)
* Adding temperature scaling on Joiner logits:

- T hard-coded to 2.0
- so far best result NCE 0.122 (still not so high)
    - the BPE scores were rescaled with 0.2 (but then also incorrect words
      get high confidence, visually reasonable histograms are for 0.5 scale)
    - BPE->WORD score merging done by min(.) function
      (tried also prob-product, and also arithmetic, geometric, harmonic mean)

- without temperature scaling (i.e. scale 1.0), the best NCE was 0.032 (here product merging was best)

Results seem consistent with: https://arxiv.org/abs/2110.15222

Everything was tuned on a very small set of 100 sentences with 813 words and 10.2% WER, using a Czech model.

I also experimented with blank posteriors mixed into the BPE confidences,
but no NCE improvement found, so not pushing that.

Temperature scaling was also added to the greedy search confidences.

* making `temperature_scale` configurable from outside
2024-04-26 09:44:26 +08:00
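The idea in commit 2e45d327a5 above can be sketched in a few lines of Python. This is a rough illustration, not the sherpa-onnx implementation. It assumes the usual form of temperature scaling (dividing the logits by T before the softmax), takes a token's confidence to be its softmax probability, and merges BPE confidences into a word confidence with min(.) as the commit message describes. The function names and the example values are hypothetical.

```python
import math
from typing import List


def softmax_with_temperature(logits: List[float], temperature: float = 2.0) -> List[float]:
    """Softmax over logits / temperature (assumed form of temperature scaling)."""
    scaled = [x / temperature for x in logits]
    peak = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(x - peak) for x in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def word_confidence(bpe_confidences: List[float]) -> float:
    """Merge BPE-level confidences into one word-level score using min(.)."""
    return min(bpe_confidences)


# Hypothetical joiner logits for a single decoding step
logits = [4.0, 1.0, 0.5]
sharp = softmax_with_temperature(logits, temperature=1.0)
soft = softmax_with_temperature(logits, temperature=2.0)

# Hypothetical per-BPE confidences for one word
word_score = word_confidence([0.9, 0.4, 0.7])
```

A temperature above 1 flattens the distribution, so the top probability in `soft` is lower than in `sharp`. That makes the resulting confidences less saturated near 1.0.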
Daniel Doña
fa2429920f Add function 'tolowerUnicode' in sherpa-onnx-microphone (fix #791) (#812) 2024-04-26 09:19:32 +08:00
Fangjun Kuang
c3a2e8a67c Refactor Java API (#806) 2024-04-24 18:41:48 +08:00
Fangjun Kuang
9b67a476e6 Refactor the JNI interface to make it more modular and maintainable (#802) 2024-04-24 09:48:42 +08:00
Fangjun Kuang
7f3b9ffe5d Refactor TTS Android code to support jieba for Chinese TTS models (#800) 2024-04-22 17:21:05 +08:00
Fangjun Kuang
494cb5c733 Fix the last character not being recognized for streaming paraformer models. (#799) 2024-04-22 15:10:39 +08:00
Fangjun Kuang
6b353bfb42 Add jieba for Chinese TTS models (#797) 2024-04-21 14:47:13 +08:00
Fangjun Kuang
c1608b3524 Support CED models (#792) 2024-04-19 15:20:37 +08:00
Fangjun Kuang
d97a283dbb Add Android demo for spoken language identification using Whisper multilingual models (#783) 2024-04-18 14:33:59 +08:00
chiiyeh
aa2d695fd2 Add score function to speaker identification (#775) 2024-04-16 17:29:46 +08:00
Fangjun Kuang
6bf2099781 Fix code style issues (#774) 2024-04-16 09:46:15 +08:00
Fangjun Kuang
81b7f1d529 Fix display for sherpa-onnx-microphone (#773) 2024-04-16 09:17:23 +08:00
Manix
fb4aee83ac Adding warm up for Zipformer2 (#766)
Signed-off-by: manickavela1998@gmail.com <manickavela1998@gmail.com>
2024-04-16 09:16:55 +08:00
Fangjun Kuang
5981adf454 Add Kotlin API for audio tagging (#770) 2024-04-15 13:49:35 +08:00
Fangjun Kuang
13730ecbd8 Add C API for punctuation (#768) 2024-04-14 19:02:34 +08:00
Fangjun Kuang
983df28a83 Fix a punctuation bug (#764) 2024-04-13 19:08:46 +08:00
Fangjun Kuang
329fe1aa8b Support adding punctuation to the speech recognition result (#761) 2024-04-13 12:15:57 +08:00
Manix
399d920b47 [feature] Configurable padding length in online websocket server (#755)
Signed-off-by: manickavela29 <manickavela1998@gmail.com>
2024-04-11 14:57:11 +08:00
AHN Sung Hwan
904a3cc8a9 Fix a bug in mean calculation of 'ys_probs' (#748) 2024-04-11 10:34:44 +08:00
Fangjun Kuang
042976ea6e Add C++ microphone examples for audio tagging (#749) 2024-04-10 21:00:35 +08:00
Fangjun Kuang
f20291cadc Support audio tagging using zipformer (#747) 2024-04-10 14:47:06 +08:00