432 Commits

Author SHA1 Message Date
Fangjun Kuang
363b8e4c1e Fix vad.Flush(). (#1329)
Fixes #1314
2024-09-08 17:52:53 +08:00
SilverSulfide
888f74bf3c Re-implement LM rescore for online transducer (#1231)
Co-authored-by: Martins Kronis <martins.kuznecovs@tilde.lv>
2024-09-06 10:01:25 +08:00
RGdevz
1f29e4a1a9 Throw an error instead of exiting (#1323) 2024-09-06 09:59:21 +08:00
Fangjun Kuang
3687c9f60a Reduce onnxruntime log output. (#1306)
Change the logging level from WARNING to ERROR.
2024-08-30 12:50:34 +08:00
Fangjun Kuang
ca30d83915 Avoid SherpaOnnxSpeakerEmbeddingManagerFreeBestMatches freeing null. (#1296)
Fixes #1295
2024-08-28 10:42:36 +08:00
Fangjun Kuang
537e163dd0 WebAssembly example for VAD + Non-streaming ASR (#1284) 2024-08-24 13:24:52 +08:00
Malcolm Ke Win
c61423ec5a Update wave-reader.cc (#1278)
Add the missing "#include <cstdint>".
2024-08-22 23:22:45 +08:00
Robin Zhong
d8001d6edc Update Kotlin API to better release native objects and add user-friendly APIs. (#1275) 2024-08-22 19:18:11 +08:00
Fangjun Kuang
5a2aa110b8 Text to speech API for Object Pascal. (#1273) 2024-08-20 20:52:16 +08:00
Fangjun Kuang
e34a1a2aa3 Object pascal examples for recording and playing audio with portaudio. (#1271)
The recording example can be used for speech recognition while the playing example can be used for text to speech.

The portaudio wrapper for object pascal is copied from
https://github.com/UltraStar-Deluxe/USDX/blob/master/src/lib/portaudio/portaudio.pas
2024-08-18 19:51:08 +08:00
Fangjun Kuang
f93f0ca94d Use a separate thread to initialize models for lazarus examples. (#1270)
So that the main thread is not blocked and the user interface is responsive.
2024-08-18 14:59:48 +08:00
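The idea behind f93f0ca94d (load the model on a worker thread so the main thread stays free) can be sketched as follows. This is a minimal illustration in Python rather than the Object Pascal used by the Lazarus examples; `load_model` and `ModelHolder` are hypothetical names, and the sleep only stands in for a slow model load.

```python
import threading
import time


def load_model():
    """Hypothetical stand-in for a slow model load."""
    time.sleep(0.1)
    return {"ready": True}


class ModelHolder:
    """Loads a model on a background thread and signals when it is ready."""

    def __init__(self):
        self.model = None
        self.done = threading.Event()

    def start_loading(self):
        # daemon=True so a stuck load cannot keep the process alive.
        worker = threading.Thread(target=self._run, daemon=True)
        worker.start()
        return worker

    def _run(self):
        self.model = load_model()
        self.done.set()


holder = ModelHolder()
holder.start_loading()
# The main thread continues here; a GUI would keep handling events
# and poll holder.done (or receive a callback) instead of blocking.
holder.done.wait(timeout=5)
print(holder.model)
```

In a real GUI the final `wait` would be replaced by a non-blocking check, since waiting on the main thread would defeat the purpose.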
Fangjun Kuang
88809753ab Release v1.10.22 (#1267) 2024-08-16 22:40:49 +08:00
Fangjun Kuang
9dcea49dba Fix looking up OOVs in lexicon.txt for MeloTTS models. (#1266)
If an English word does not exist in the lexicon, we split
it into characters. For instance, if the word TTS does not
exist in lexicon.txt, we split it into 3 characters T, T, and S.
2024-08-16 22:10:03 +08:00
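The fallback described in 9dcea49dba can be sketched roughly as below. This is not the project's C++ code: the function name, the toy lexicon, its token values, the lowercasing step, and the choice to skip characters that are themselves missing are all assumptions made for illustration.

```python
def lookup_tokens(word, lexicon):
    """Return tokens for `word`; if the word is missing, look up each character."""
    key = word.lower()  # assumption: lexicon keys are lowercase
    if key in lexicon:
        return list(lexicon[key])
    tokens = []
    for ch in key:
        # Assumption: characters that are also missing are skipped.
        tokens.extend(lexicon.get(ch, []))
    return tokens


# Hypothetical toy lexicon; real lexicon.txt entries differ.
toy_lexicon = {
    "hello": ["HH", "AH", "L", "OW"],
    "t": ["T", "IY"],
    "s": ["EH", "S"],
}

print(lookup_tokens("hello", toy_lexicon))
print(lookup_tokens("TTS", toy_lexicon))
```

With this toy lexicon, "TTS" is not found as a whole word, so it is resolved as the three characters T, T, and S.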
Ikko Eltociear Ashimine
a3e98750e9 chore: update online-stream.h (#1264)
Fix typos.
2024-08-16 15:17:15 +08:00
Fangjun Kuang
fbe35ba736 Add Lazarus example for generating subtitles using Silero VAD with non-streaming ASR (#1251) 2024-08-15 22:19:45 +08:00
Fangjun Kuang
ca729faebf Support reading multi-channel wave files with 8/16/32-bit encoded samples (#1258) 2024-08-15 14:54:43 +08:00
Robin Zhong
62c4d4ab62 Add emotion, event of SenseVoice. (#1257)
* Fix tokens size check and update java api.

https://github.com/k2-fsa/sherpa-onnx/pull/1257
2024-08-14 15:50:13 +08:00
ivan provalov
9f06b059d7 Update offline-recognizer.cc (#1253)
Add a setConfig method to the JNI so that a config can be set on a previously initialized offline-recognizer.
2024-08-13 23:04:51 +08:00
Fangjun Kuang
619279b162 Pascal API for VAD (#1249) 2024-08-13 16:16:51 +08:00
Fangjun Kuang
a7dc6c2c16 Pascal API for non-streaming ASR (#1247) 2024-08-12 23:33:35 +08:00
Fangjun Kuang
5791b695ea Pascal API for streaming ASR (#1246) 2024-08-12 19:55:51 +08:00
Fangjun Kuang
65f1c0fab2 Add Pascal API for reading wave files (#1243) 2024-08-11 22:43:42 +08:00
Fangjun Kuang
94e256244d Add blank penalty for various language bindings. (#1234) 2024-08-08 10:43:31 +08:00
Parth Khiera
ba4cb6169f feat: addition of blank_penalty config in online_recognizer (#1232) 2024-08-08 09:10:17 +08:00
Fangjun Kuang
8a5f5c1999 Fix python two pass ASR examples (#1230) 2024-08-07 18:35:38 +08:00
xsjk
1da75ee3c0 Fix typo in offline-lm-config.cc (#1229) 2024-08-07 15:38:34 +08:00
Fangjun Kuang
375c055ff8 Fix style issues for online punctuation source files (#1225) 2024-08-06 17:43:24 +08:00
jianyou
1414e4dc61 Add online punctuation and casing prediction model for English language (#1224) 2024-08-06 17:33:38 +08:00
Fangjun Kuang
9caa488019 Fix setting SenseVoice language. (#1214) 2024-08-04 19:02:23 +08:00
Fangjun Kuang
d5f486878d Remove libonnxruntime_providers_cuda.so as a dependency. (#1210) 2024-08-03 16:25:23 +08:00
Fangjun Kuang
35c1b4a7a9 Add ReazonSpeech Japanese pre-trained model (#1203) 2024-08-02 10:21:24 +08:00
Fangjun Kuang
53484fcd9b Fix reading non-standard wav files. (#1199) 2024-08-01 17:48:04 +08:00
Fangjun Kuang
86b4c9f535 Fix splitting sentences for MeloTTS (#1186) 2024-07-29 17:04:45 +08:00
Fangjun Kuang
1f8e575133 Add TTS example for Java API. (#1176)
It plays the generated audio while the rest is still being generated.
2024-07-28 12:07:19 +08:00
Fangjun Kuang
4e6aeff07e Refactor C API to prefix each API with SherpaOnnx. (#1171) 2024-07-26 18:47:02 +08:00
Fangjun Kuang
994c3e7c96 Add VAD + Non-streaming ASR example for JavaScript API. (#1170) 2024-07-26 12:42:08 +08:00
Fangjun Kuang
299f1a852b Fix style issues reported by clang-tidy (#1167) 2024-07-23 09:26:36 +08:00
thewh1teagle
d32a46169f feat: add directml support (#1153) 2024-07-22 23:50:48 +08:00
Fangjun Kuang
dd300b1de5 Add Java and Kotlin API for sense voice (#1164) 2024-07-22 14:08:40 +08:00
Fangjun Kuang
1a471595a5 Fix Android build (#1161) 2024-07-22 09:27:30 +08:00
Fangjun Kuang
ffdb23a8ec Add dart API for SenseVoice (#1159) 2024-07-21 21:48:12 +08:00
Fangjun Kuang
25f0a10468 Add C++ runtime for SenseVoice models (#1148) 2024-07-18 22:54:18 +08:00
Wei Kang
5b1fa8750f Fix hotwords OOV log (#1139) 2024-07-16 19:41:31 +08:00
Fangjun Kuang
960eb7529e Add C++ runtime for MeloTTS (#1138) 2024-07-16 15:55:02 +08:00
Fangjun Kuang
fa07bbc176 Add APK for small paraformer (#1133) 2024-07-15 19:44:36 +08:00
Manickavela
11cfd33b10 encoder only trt ep for transducer (#1130) 2024-07-15 14:52:33 +08:00
ivan provalov
de04b3b9bf Allow modify model config at decode time for ASR (#1124) 2024-07-13 22:30:47 +08:00
Fangjun Kuang
b5093e27f9 Fix publishing apks to huggingface (#1121)
Save APKs for each release in a separate directory.

Hugging Face requires that a directory contain no more than 1000 files.

Since we have so many TTS models, and each model needs APKs built for 4 different ABIs,
we work around this constraint by placing the APKs of different releases in separate directories.
2024-07-13 16:14:00 +08:00
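The arithmetic behind the workaround in b5093e27f9 can be illustrated with a short sketch. Only the 1000-file limit and the count of 4 ABIs come from the commit message; the ABI names, the `apks/<release>/` layout, the file naming, and the release tag are hypothetical.

```python
from pathlib import PurePosixPath

MAX_FILES_PER_DIR = 1000  # limit stated in the commit message
# Assumed ABI names; the commit only says there are 4 ABIs.
ABIS = ["arm64-v8a", "armeabi-v7a", "x86_64", "x86"]


def apk_path(release, model, abi):
    """Hypothetical layout: one directory per release."""
    return PurePosixPath("apks") / release / f"{model}-{abi}.apk"


def fits_in_one_dir(num_models):
    """True if one APK per model per ABI stays within the directory limit."""
    return num_models * len(ABIS) <= MAX_FILES_PER_DIR


print(apk_path("v0.0.1", "demo-model", "arm64-v8a"))
print(fits_in_one_dir(250), fits_in_one_dir(251))
```

With 4 APKs per model, a single directory holds at most 250 models per release, and keeping each release in its own directory stops the count from accumulating across releases.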
Fangjun Kuang
117cd7bb8c Support whisper large/large-v1/large-v2/large-v3 and distil-large-v2 (#1114) 2024-07-12 23:47:39 +08:00
thewh1teagle
c0eaf86dbd feat: find best embedding matches (#1102) 2024-07-11 09:38:06 +08:00