Fangjun Kuang
d468527f62
C API for speaker diarization ( #1402 )
2024-10-09 17:10:03 +08:00
Fangjun Kuang
8535b1d3bb
Python API for speaker diarization. ( #1400 )
2024-10-09 14:13:26 +08:00
Fangjun Kuang
59407edcad
C++ API for speaker diarization ( #1396 )
2024-10-09 12:01:20 +08:00
Fangjun Kuang
70165cb42d
Speaker diarization example with onnxruntime Python API ( #1395 )
2024-10-06 16:37:29 +08:00
Askars
5f50cbf65a
context_state is not set correctly when previous context is passed after reset ( #1393 )
...
Co-authored-by: vsd-vector <askars.salimbajevs@tilde.lv>
2024-10-03 16:42:09 +08:00
Fangjun Kuang
b965f14cf0
Add Python API for clustering ( #1385 )
2024-09-30 11:33:15 +08:00
Fangjun Kuang
70568c2df7
Support Agglomerative clustering. ( #1384 )
...
We use the open-source implementation from
https://github.com/cdalitz/hclust-cpp
2024-09-29 23:44:29 +08:00
Fangjun Kuang
11f0cb7e1c
Support Parakeet models from NeMo ( #1381 )
2024-09-27 17:12:00 +08:00
lxiao336
06b61ccad8
Allow more online models to load tokens file from the memory ( #1352 )
...
Co-authored-by: xiao <shawl336@6163.com>
2024-09-20 16:38:41 +08:00
Fangjun Kuang
576a3aa90d
Add non-streaming ONNX models for Russian ASR ( #1358 )
2024-09-18 13:43:49 +08:00
Fangjun Kuang
e7ffcbd677
Add APIs about max speech duration in VAD for various programming languages ( #1349 )
2024-09-14 12:30:13 +08:00
Fangjun Kuang
1423ddb1f0
Support specifying max speech duration for VAD. ( #1348 )
2024-09-14 10:57:46 +08:00
Fangjun Kuang
544857b097
Fix building ( #1343 )
2024-09-13 13:33:52 +08:00
lxiao336
65cfa7548a
Re-pull-request: allow tokens and hotwords to be loaded from buffered string directly ( #1339 )
...
Co-authored-by: xiao <shawl336@163.com>
2024-09-13 09:58:17 +08:00
Fangjun Kuang
6b6e7635ed
Fix computing features for CED audio tagging models. ( #1341 )
...
See also
https://github.com/RicherMans/CED/blob/main/onnx_inference_with_kaldi.py
2024-09-12 19:38:18 +08:00
Askars
fa20ae1552
Preserve previous result as context for next segment ( #1335 )
...
Co-authored-by: vsd-vector <askars.salimbajevs@tilde.lv>
2024-09-11 10:44:13 +08:00
Fangjun Kuang
ba7f1a7439
Fix building ( #1331 )
2024-09-09 10:29:31 +08:00
Lim Yao Chong
3bffc24d64
Add Python binding for online punctuation models ( #1312 )
2024-09-09 10:26:53 +08:00
Fangjun Kuang
363b8e4c1e
Fix vad.Flush(). ( #1329 )
...
Fixes #1314
2024-09-08 17:52:53 +08:00
SilverSulfide
888f74bf3c
Re-implement LM rescore for online transducer ( #1231 )
...
Co-authored-by: Martins Kronis <martins.kuznecovs@tilde.lv>
2024-09-06 10:01:25 +08:00
RGdevz
1f29e4a1a9
Throw error instead of exit ( #1323 )
2024-09-06 09:59:21 +08:00
Fangjun Kuang
3687c9f60a
Reduce onnxruntime log output. ( #1306 )
...
Change the logging level from WARNING to ERROR.
2024-08-30 12:50:34 +08:00
Fangjun Kuang
ca30d83915
Avoid SherpaOnnxSpeakerEmbeddingManagerFreeBestMatches freeing null. ( #1296 )
...
Fixes #1295
2024-08-28 10:42:36 +08:00
Fangjun Kuang
537e163dd0
WebAssembly example for VAD + Non-streaming ASR ( #1284 )
2024-08-24 13:24:52 +08:00
Malcolm Ke Win
c61423ec5a
Update wave-reader.cc ( #1278 )
...
* Update wave-reader.cc
missing "#include <cstdint>"
2024-08-22 23:22:45 +08:00
Robin Zhong
d8001d6edc
Update Kotlin API to better release native objects and add user-friendly APIs. ( #1275 )
2024-08-22 19:18:11 +08:00
Fangjun Kuang
5a2aa110b8
Text to speech API for Object Pascal. ( #1273 )
2024-08-20 20:52:16 +08:00
Fangjun Kuang
e34a1a2aa3
Object Pascal examples for recording and playing audio with portaudio. ( #1271 )
...
The recording example can be used for speech recognition while the playing example can be used for text to speech.
The portaudio wrapper for Object Pascal is copied from
https://github.com/UltraStar-Deluxe/USDX/blob/master/src/lib/portaudio/portaudio.pas
2024-08-18 19:51:08 +08:00
Fangjun Kuang
f93f0ca94d
Use a separate thread to initialize models for Lazarus examples. ( #1270 )
...
So that the main thread is not blocked and the user interface is responsive.
2024-08-18 14:59:48 +08:00
Fangjun Kuang
88809753ab
Release v1.10.22 ( #1267 )
2024-08-16 22:40:49 +08:00
Fangjun Kuang
9dcea49dba
Fix looking up OOVs in lexicon.txt for MeloTTS models. ( #1266 )
...
If an English word does not exist in the lexicon, we split
it into characters. For instance, if the word TTS does not
exist in lexicon.txt, we split it into 3 characters T, T, and S.
2024-08-16 22:10:03 +08:00
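The out-of-vocabulary fallback described in the MeloTTS lexicon fix above can be illustrated with a minimal sketch. This is not the project's code: the function name `split_oov` and the use of a plain Python set to stand in for lexicon.txt are assumptions made for illustration, and details such as case handling are not stated in the commit message.

```python
from typing import List, Set


def split_oov(word: str, lexicon: Set[str]) -> List[str]:
    """Return the units to look up for `word` (hypothetical helper).

    If the word is present in the lexicon, it is kept whole.
    Otherwise it is split into its individual characters, as the
    commit message describes for English words missing from lexicon.txt.
    """
    if word in lexicon:
        return [word]
    # Out-of-vocabulary word: fall back to one unit per character.
    return list(word)


# Assumed toy lexicon; the real lexicon.txt maps words to tokens.
toy_lexicon = {"hello", "T", "S"}

known = split_oov("hello", toy_lexicon)
unknown = split_oov("TTS", toy_lexicon)
```

With this toy lexicon, "hello" is kept as a single unit while "TTS" is split into the three characters T, T, and S, matching the example in the commit message.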
Ikko Eltociear Ashimine
a3e98750e9
chore: update online-stream.h ( #1264 )
...
Fix typos.
2024-08-16 15:17:15 +08:00
Fangjun Kuang
fbe35ba736
Add Lazarus example for generating subtitles using Silero VAD with non-streaming ASR ( #1251 )
2024-08-15 22:19:45 +08:00
Fangjun Kuang
ca729faebf
Support reading multi-channel wave files with 8/16/32-bit encoded samples ( #1258 )
2024-08-15 14:54:43 +08:00
Robin Zhong
62c4d4ab62
Add emotion, event of SenseVoice. ( #1257 )
...
* Add emotion, event of SenseVoice.
* Fix tokens size check and update Java API.
https://github.com/k2-fsa/sherpa-onnx/pull/1257
2024-08-14 15:50:13 +08:00
ivan provalov
9f06b059d7
Update offline-recognizer.cc ( #1253 )
...
Adding setConfig method to JNI to support setting a config on the previously initialized offline-recognizer.
2024-08-13 23:04:51 +08:00
Fangjun Kuang
619279b162
Pascal API for VAD ( #1249 )
2024-08-13 16:16:51 +08:00
Fangjun Kuang
a7dc6c2c16
Pascal API for non-streaming ASR ( #1247 )
2024-08-12 23:33:35 +08:00
Fangjun Kuang
5791b695ea
Pascal API for streaming ASR ( #1246 )
2024-08-12 19:55:51 +08:00
Fangjun Kuang
65f1c0fab2
Add Pascal API for reading wave files ( #1243 )
2024-08-11 22:43:42 +08:00
Fangjun Kuang
94e256244d
Add blank penalty for various language bindings. ( #1234 )
2024-08-08 10:43:31 +08:00
Parth Khiera
ba4cb6169f
feat: addition of blank_penalty config in online_recognizer ( #1232 )
2024-08-08 09:10:17 +08:00
Fangjun Kuang
8a5f5c1999
Fix Python two-pass ASR examples ( #1230 )
2024-08-07 18:35:38 +08:00
xsjk
1da75ee3c0
Fix typo in offline-lm-config.cc ( #1229 )
2024-08-07 15:38:34 +08:00
Fangjun Kuang
375c055ff8
Fix style issues for online punctuation source files ( #1225 )
2024-08-06 17:43:24 +08:00
jianyou
1414e4dc61
Add online punctuation and casing prediction model for English language ( #1224 )
2024-08-06 17:33:38 +08:00
Fangjun Kuang
9caa488019
Fix setting SenseVoice language. ( #1214 )
2024-08-04 19:02:23 +08:00
Fangjun Kuang
d5f486878d
Remove libonnxruntime_providers_cuda.so as a dependency. ( #1210 )
2024-08-03 16:25:23 +08:00
Fangjun Kuang
35c1b4a7a9
Add ReazonSpeech Japanese pre-trained model ( #1203 )
2024-08-02 10:21:24 +08:00
Fangjun Kuang
53484fcd9b
Fix reading non-standard wav files. ( #1199 )
2024-08-01 17:48:04 +08:00