602 Commits

Author SHA1 Message Date
Fangjun Kuang
f1b311ee4f Handle audio files less than 10s long for speaker diarization. (#1412)
If the input audio file is less than 10 seconds long, there is only 
one chunk, and there is no need to compute embeddings or 
do clustering.

We can use the segmentation result from the speaker segmentation 
model directly.
2024-10-11 10:27:16 +08:00
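The shortcut described in the commit above can be sketched roughly as follows. This is only an illustration, not the project's code: the 10-second chunk length comes from the commit message, while non-overlapping chunks and the names `count_chunks`, `diarize`, `segment_chunk`, and `embed_and_cluster` are hypothetical stand-ins chosen for the example.

```python
import math
from typing import Callable, List, Tuple

# (start_sec, end_sec, speaker_id); a hypothetical result type for this sketch
Segment = Tuple[float, float, int]

CHUNK_SECONDS = 10.0  # chunk length mentioned in the commit message


def count_chunks(duration_sec: float, chunk_sec: float = CHUNK_SECONDS) -> int:
    """Number of chunks needed to cover the audio (assumes no overlap)."""
    return max(1, math.ceil(duration_sec / chunk_sec))


def diarize(
    duration_sec: float,
    segment_chunk: Callable[[int], List[Segment]],
    embed_and_cluster: Callable[[List[List[Segment]]], List[Segment]],
) -> List[Segment]:
    """Skip embedding and clustering when the audio fits in one chunk."""
    n = count_chunks(duration_sec)
    if n == 1:
        # Only one chunk: use the segmentation result directly.
        return segment_chunk(0)
    # Several chunks: speakers must be matched across chunks,
    # which is what the embedding + clustering step is for.
    per_chunk = [segment_chunk(i) for i in range(n)]
    return embed_and_cluster(per_chunk)
```

With an 8-second input, `diarize` returns whatever `segment_chunk(0)` produces and never calls `embed_and_cluster`.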
Fangjun Kuang
1d061df355 WebAssembly example for speaker diarization (#1411) 2024-10-10 22:14:45 +08:00
Fangjun Kuang
d468527f62 C API for speaker diarization (#1402) 2024-10-09 17:10:03 +08:00
Fangjun Kuang
8535b1d3bb Python API for speaker diarization. (#1400) 2024-10-09 14:13:26 +08:00
Fangjun Kuang
59407edcad C++ API for speaker diarization (#1396) 2024-10-09 12:01:20 +08:00
Fangjun Kuang
70165cb42d Speaker diarization example with onnxruntime Python API (#1395) 2024-10-06 16:37:29 +08:00
Askars
5f50cbf65a context_state is not set correctly when previous context is passed after reset (#1393)
Co-authored-by: vsd-vector <askars.salimbajevs@tilde.lv>
2024-10-03 16:42:09 +08:00
Fangjun Kuang
b965f14cf0 Add Python API for clustering (#1385) 2024-09-30 11:33:15 +08:00
Fangjun Kuang
70568c2df7 Support Agglomerative clustering. (#1384)
We use the open-source implementation from
https://github.com/cdalitz/hclust-cpp
2024-09-29 23:44:29 +08:00
Fangjun Kuang
11f0cb7e1c Support Parakeet models from NeMo (#1381) 2024-09-27 17:12:00 +08:00
lxiao336
06b61ccad8 Allow more online models to load tokens file from the memory (#1352)
Co-authored-by: xiao <shawl336@6163.com>
2024-09-20 16:38:41 +08:00
Fangjun Kuang
576a3aa90d Add non-streaming ONNX models for Russian ASR (#1358) 2024-09-18 13:43:49 +08:00
Fangjun Kuang
e7ffcbd677 Add APIs about max speech duration in VAD for various programming languages (#1349) 2024-09-14 12:30:13 +08:00
Fangjun Kuang
1423ddb1f0 Support specifying max speech duration for VAD. (#1348) 2024-09-14 10:57:46 +08:00
Fangjun Kuang
544857b097 Fix building (#1343) 2024-09-13 13:33:52 +08:00
lxiao336
65cfa7548a re-pull-request allow tokens and hotwords to be loaded from buffered string directly (#1339)
Co-authored-by: xiao <shawl336@163.com>
2024-09-13 09:58:17 +08:00
Fangjun Kuang
6b6e7635ed Fix computing features for CED audio tagging models. (#1341)
See also
https://github.com/RicherMans/CED/blob/main/onnx_inference_with_kaldi.py
2024-09-12 19:38:18 +08:00
Askars
fa20ae1552 Preserve previous result as context for next segment (#1335)
Co-authored-by: vsd-vector <askars.salimbajevs@tilde.lv>
2024-09-11 10:44:13 +08:00
Fangjun Kuang
ba7f1a7439 Fix building (#1331) 2024-09-09 10:29:31 +08:00
Lim Yao Chong
3bffc24d64 Add Python binding for online punctuation models (#1312) 2024-09-09 10:26:53 +08:00
Fangjun Kuang
363b8e4c1e Fix vad.Flush(). (#1329)
Fixes #1314
2024-09-08 17:52:53 +08:00
SilverSulfide
888f74bf3c Re-implement LM rescore for online transducer (#1231)
Co-authored-by: Martins Kronis <martins.kuznecovs@tilde.lv>
2024-09-06 10:01:25 +08:00
RGdevz
1f29e4a1a9 throw error instead of exit (#1323) 2024-09-06 09:59:21 +08:00
Fangjun Kuang
3687c9f60a Reduce onnxruntime log output. (#1306)
Change the logging level from WARNING to ERROR.
2024-08-30 12:50:34 +08:00
Fangjun Kuang
ca30d83915 Avoid SherpaOnnxSpeakerEmbeddingManagerFreeBestMatches freeing null. (#1296)
Fixes #1295
2024-08-28 10:42:36 +08:00
Fangjun Kuang
537e163dd0 WebAssembly example for VAD + Non-streaming ASR (#1284) 2024-08-24 13:24:52 +08:00
Malcolm Ke Win
c61423ec5a Update wave-reader.cc (#1278)
* Update wave-reader.cc

missing "#include <cstdint>"
2024-08-22 23:22:45 +08:00
Robin Zhong
d8001d6edc Update Kotlin API to better release native objects and add user-friendly APIs. (#1275) 2024-08-22 19:18:11 +08:00
Fangjun Kuang
5a2aa110b8 Text to speech API for Object Pascal. (#1273) 2024-08-20 20:52:16 +08:00
Fangjun Kuang
e34a1a2aa3 Object pascal examples for recording and playing audio with portaudio. (#1271)
The recording example can be used for speech recognition while the playing example can be used for text to speech.

The portaudio wrapper for object pascal is copied from
https://github.com/UltraStar-Deluxe/USDX/blob/master/src/lib/portaudio/portaudio.pas
2024-08-18 19:51:08 +08:00
Fangjun Kuang
f93f0ca94d Use a separate thread to initialize models for lazarus examples. (#1270)
So that the main thread is not blocked and the user interface is responsive.
2024-08-18 14:59:48 +08:00
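The idea in the commit above (load the model off the main thread so the user interface stays responsive) can be illustrated with a small sketch. It is written in Python rather than Object Pascal, and `load_model` and `ModelHolder` are hypothetical names invented for this example; the sleep merely stands in for slow model loading.

```python
import threading
import time


def load_model() -> dict:
    """Stand-in for a slow model initialization (hypothetical)."""
    time.sleep(0.1)
    return {"ready": True}


class ModelHolder:
    """Loads a model on a background thread and signals when it is done."""

    def __init__(self) -> None:
        self.model = None
        self._done = threading.Event()

    def start(self) -> threading.Thread:
        # The caller (e.g. a UI thread) returns immediately after this.
        worker = threading.Thread(target=self._run, daemon=True)
        worker.start()
        return worker

    def _run(self) -> None:
        self.model = load_model()
        self._done.set()

    def is_ready(self) -> bool:
        # Non-blocking check that a UI timer could poll.
        return self._done.is_set()

    def wait(self, timeout: float = None) -> bool:
        # Blocking wait; returns False if the timeout expires first.
        return self._done.wait(timeout)
```

A user interface would typically call `start()` once and poll `is_ready()` from a timer instead of calling `wait()`, so that its event loop is never blocked.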
Fangjun Kuang
88809753ab Release v1.10.22 (#1267) 2024-08-16 22:40:49 +08:00
Fangjun Kuang
9dcea49dba Fix looking up OOVs in lexicon.txt for MeloTTS models. (#1266)
If an English word does not exist in the lexicon, we split
it into characters. For instance, if the word TTS does not
exist in lexicon.txt, we split it into 3 characters T, T, and S.
2024-08-16 22:10:03 +08:00
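The fallback described in the commit above can be sketched as follows. This is a minimal illustration under stated assumptions: the lexicon is modelled as a plain dictionary, lookups are case-sensitive, and `split_for_lookup` is a hypothetical helper name, not a function from the project.

```python
from typing import Dict, List


def split_for_lookup(word: str, lexicon: Dict[str, List[str]]) -> List[str]:
    """Return the units to look up for `word`.

    If the word is in the lexicon it is kept whole; otherwise it is
    split into single characters, as the commit message describes.
    """
    if word in lexicon:
        return [word]
    return list(word)
```

For a lexicon without an entry for `TTS`, `split_for_lookup("TTS", lexicon)` yields the three characters `T`, `T`, and `S`, matching the example in the commit message.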
Ikko Eltociear Ashimine
a3e98750e9 chore: update online-stream.h (#1264)
Fix typos.
2024-08-16 15:17:15 +08:00
Fangjun Kuang
fbe35ba736 Add Lazarus example for generating subtitles using Silero VAD with non-streaming ASR (#1251) 2024-08-15 22:19:45 +08:00
Fangjun Kuang
ca729faebf Support reading multi-channel wave files with 8/16/32-bit encoded samples (#1258) 2024-08-15 14:54:43 +08:00
Robin Zhong
62c4d4ab62 Add emotion, event of SenseVoice. (#1257)
* Add emotion, event of SenseVoice.

* Fix tokens size check and update java api.

https://github.com/k2-fsa/sherpa-onnx/pull/1257
2024-08-14 15:50:13 +08:00
ivan provalov
9f06b059d7 Update offline-recognizer.cc (#1253)
Adding setConfig method to JNI to support setting a config on the previously initialized offline-recognizer.
2024-08-13 23:04:51 +08:00
Fangjun Kuang
619279b162 Pascal API for VAD (#1249) 2024-08-13 16:16:51 +08:00
Fangjun Kuang
a7dc6c2c16 Pascal API for non-streaming ASR (#1247) 2024-08-12 23:33:35 +08:00
Fangjun Kuang
5791b695ea Pascal API for streaming ASR (#1246) 2024-08-12 19:55:51 +08:00
Fangjun Kuang
65f1c0fab2 Add Pascal API for reading wave files (#1243) 2024-08-11 22:43:42 +08:00
Fangjun Kuang
94e256244d Add blank penalty for various language bindings. (#1234) 2024-08-08 10:43:31 +08:00
Parth Khiera
ba4cb6169f feat: addition of blank_penalty config in online_recognizer (#1232) 2024-08-08 09:10:17 +08:00
Fangjun Kuang
8a5f5c1999 Fix python two pass ASR examples (#1230) 2024-08-07 18:35:38 +08:00
xsjk
1da75ee3c0 Fix typo in offline-lm-config.cc (#1229) 2024-08-07 15:38:34 +08:00
Fangjun Kuang
375c055ff8 Fix style issues for online punctuation source files (#1225) 2024-08-06 17:43:24 +08:00
jianyou
1414e4dc61 Add online punctuation and casing prediction model for English language (#1224) 2024-08-06 17:33:38 +08:00
Fangjun Kuang
9caa488019 Fix setting SenseVoice language. (#1214) 2024-08-04 19:02:23 +08:00
Fangjun Kuang
d5f486878d Remove libonnxruntime_providers_cuda.so as a dependency. (#1210) 2024-08-03 16:25:23 +08:00