Fangjun Kuang
707cf792c5
Add GigaAM NeMo transducer model for Russian ASR ( #1467 )
2024-10-25 15:20:13 +08:00
Fangjun Kuang
b41f6d2c94
Support GigaAM CTC models for Russian ASR ( #1464 )
...
See also https://github.com/salute-developers/GigaAM
2024-10-25 10:55:16 +08:00
Fangjun Kuang
a5295aad10
Handle NaN embeddings in speaker diarization. ( #1461 )
...
See also https://github.com/thewh1teagle/sherpa-rs/issues/33
2024-10-24 14:03:09 +08:00
Fangjun Kuang
b3e05f6dc4
Fix style issues ( #1458 )
2024-10-24 11:15:08 +08:00
Fangjun Kuang
ceb69ebd94
Add C++ API for non-streaming ASR ( #1456 )
2024-10-23 16:40:12 +08:00
Zazzle516
4783c8f590
Fix "log10" compile error by including the <cmath> header ( #1438 )
2024-10-17 14:50:04 +08:00
Fangjun Kuang
94b26ff07c
Android JNI support for speaker diarization ( #1421 )
2024-10-12 13:03:48 +08:00
Fangjun Kuang
1ed803adc1
Dart API for speaker diarization ( #1418 )
2024-10-11 21:17:41 +08:00
Fangjun Kuang
2d412b1190
Kotlin API for speaker diarization ( #1415 )
2024-10-11 14:41:53 +08:00
Fangjun Kuang
f1b311ee4f
Handle audio files less than 10s long for speaker diarization. ( #1412 )
...
If the input audio file is less than 10 seconds long, there is only
one chunk, and there is no need to compute embeddings or
do clustering.
We can use the segmentation result from the speaker segmentation
model directly.
2024-10-11 10:27:16 +08:00
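The short-audio shortcut described in #1412 can be sketched roughly as follows. This is a minimal illustration, not the sherpa-onnx implementation: `diarize`, `segment_chunk`, and `relabel_globally` are hypothetical names, the 10-second chunk length comes from the commit message, and the non-overlapping chunking in the long-audio path is a simplifying assumption.

```python
from typing import Callable, List, Tuple

# (start_sec, end_sec, speaker_label); a hypothetical result type for this sketch
Segment = Tuple[float, float, int]


def diarize(
    duration_sec: float,
    segment_chunk: Callable[[float, float], List[Segment]],
    relabel_globally: Callable[[List[List[Segment]]], List[Segment]],
    chunk_sec: float = 10.0,
) -> List[Segment]:
    """Return speaker segments for an audio file of the given duration.

    segment_chunk stands in for the speaker segmentation model.
    relabel_globally stands in for the embedding + clustering stage that
    maps per-chunk speaker labels to global ones.
    """
    if duration_sec < chunk_sec:
        # Only one chunk: its local labels are already global, so the
        # embedding and clustering stage is skipped entirely.
        return segment_chunk(0.0, duration_sec)

    # Assumption: plain non-overlapping chunks, for illustration only.
    per_chunk: List[List[Segment]] = []
    start = 0.0
    while start < duration_sec:
        end = min(start + chunk_sec, duration_sec)
        per_chunk.append(segment_chunk(start, end))
        start = end
    return relabel_globally(per_chunk)
```

With a 4-second input, `relabel_globally` is never called and the segmentation output is returned unchanged.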
Fangjun Kuang
1d061df355
WebAssembly example for speaker diarization ( #1411 )
2024-10-10 22:14:45 +08:00
Fangjun Kuang
d468527f62
C API for speaker diarization ( #1402 )
2024-10-09 17:10:03 +08:00
Fangjun Kuang
8535b1d3bb
Python API for speaker diarization. ( #1400 )
2024-10-09 14:13:26 +08:00
Fangjun Kuang
59407edcad
C++ API for speaker diarization ( #1396 )
2024-10-09 12:01:20 +08:00
Fangjun Kuang
70165cb42d
Speaker diarization example with onnxruntime Python API ( #1395 )
2024-10-06 16:37:29 +08:00
Askars
5f50cbf65a
context_state is not set correctly when previous context is passed after reset ( #1393 )
...
Co-authored-by: vsd-vector <askars.salimbajevs@tilde.lv>
2024-10-03 16:42:09 +08:00
Fangjun Kuang
b965f14cf0
Add Python API for clustering ( #1385 )
2024-09-30 11:33:15 +08:00
Fangjun Kuang
70568c2df7
Support Agglomerative clustering. ( #1384 )
...
We use the open-source implementation from
https://github.com/cdalitz/hclust-cpp
2024-09-29 23:44:29 +08:00
Fangjun Kuang
11f0cb7e1c
Support Parakeet models from NeMo ( #1381 )
2024-09-27 17:12:00 +08:00
lxiao336
06b61ccad8
Allow more online models to load tokens file from the memory ( #1352 )
...
Co-authored-by: xiao <shawl336@6163.com>
2024-09-20 16:38:41 +08:00
Fangjun Kuang
1423ddb1f0
Support specifying max speech duration for VAD. ( #1348 )
2024-09-14 10:57:46 +08:00
Fangjun Kuang
544857b097
Fix building ( #1343 )
2024-09-13 13:33:52 +08:00
lxiao336
65cfa7548a
Re-submitted pull request: allow tokens and hotwords to be loaded directly from a buffered string ( #1339 )
...
Co-authored-by: xiao <shawl336@163.com>
2024-09-13 09:58:17 +08:00
Fangjun Kuang
6b6e7635ed
Fix computing features for CED audio tagging models. ( #1341 )
...
See also
https://github.com/RicherMans/CED/blob/main/onnx_inference_with_kaldi.py
2024-09-12 19:38:18 +08:00
Askars
fa20ae1552
Preserve previous result as context for next segment ( #1335 )
...
Co-authored-by: vsd-vector <askars.salimbajevs@tilde.lv>
2024-09-11 10:44:13 +08:00
Fangjun Kuang
ba7f1a7439
Fix building ( #1331 )
2024-09-09 10:29:31 +08:00
Fangjun Kuang
363b8e4c1e
Fix vad.Flush(). ( #1329 )
...
Fixes #1314
2024-09-08 17:52:53 +08:00
SilverSulfide
888f74bf3c
Re-implement LM rescore for online transducer ( #1231 )
...
Co-authored-by: Martins Kronis <martins.kuznecovs@tilde.lv>
2024-09-06 10:01:25 +08:00
Fangjun Kuang
3687c9f60a
Reduce onnxruntime log output. ( #1306 )
...
Change the logging level from WARNING to ERROR.
2024-08-30 12:50:34 +08:00
Malcolm Ke Win
c61423ec5a
Update wave-reader.cc ( #1278 )
...
* Update wave-reader.cc: add the missing "#include <cstdint>"
2024-08-22 23:22:45 +08:00
Fangjun Kuang
f93f0ca94d
Use a separate thread to initialize models for Lazarus examples. ( #1270 )
...
So that the main thread is not blocked and the user interface is responsive.
2024-08-18 14:59:48 +08:00
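The idea behind #1270, loading a model on a worker thread so the UI thread stays free, can be illustrated with a small sketch. The actual examples are written in Pascal for Lazarus; the Python version below swaps in `threading` purely for illustration, and `App`, `start_loading`, and `loader` are hypothetical names.

```python
import threading
from typing import Any, Callable, Optional


class App:
    """Toy application object; stands in for a GUI form (hypothetical)."""

    def __init__(self) -> None:
        self.model: Optional[Any] = None
        self.ready = threading.Event()  # set once the model is usable

    def start_loading(self, loader: Callable[[], Any]) -> threading.Thread:
        """Run the (possibly slow) loader on a background thread."""

        def worker() -> None:
            self.model = loader()
            self.ready.set()

        thread = threading.Thread(target=worker, daemon=True)
        thread.start()
        return thread  # the caller returns to its event loop immediately
```

The main thread can poll `app.ready.is_set()` from its event loop, or enable the relevant buttons once the event is set, instead of blocking on the load.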
Fangjun Kuang
9dcea49dba
Fix looking up OOVs in lexicon.txt for MeloTTS models. ( #1266 )
...
If an English word does not exist in the lexicon, we split
it into characters. For instance, if the word TTS does not
exist in lexicon.txt, we split it into 3 characters T, T, and S.
2024-08-16 22:10:03 +08:00
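The out-of-vocabulary fallback described in #1266 amounts to the following. This is a minimal sketch under stated assumptions: `split_oov` is a hypothetical helper, the lexicon is modeled as a plain dictionary, and case handling is left out because the commit message does not describe it.

```python
from typing import Dict, List


def split_oov(word: str, lexicon: Dict[str, List[str]]) -> List[str]:
    """Return the lexicon keys to look up for an English word.

    A word present in the lexicon is kept whole; otherwise it is split
    into its individual characters.
    """
    if word in lexicon:
        return [word]
    return list(word)


# Hypothetical lexicon contents, for illustration only.
lexicon = {"hello": ["HH", "AH", "L", "OW"]}
print(split_oov("hello", lexicon))  # ['hello']
print(split_oov("TTS", lexicon))    # ['T', 'T', 'S']
```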
Ikko Eltociear Ashimine
a3e98750e9
chore: update online-stream.h ( #1264 )
...
Fix typos.
2024-08-16 15:17:15 +08:00
Fangjun Kuang
fbe35ba736
Add Lazarus example for generating subtitles using Silero VAD with non-streaming ASR ( #1251 )
2024-08-15 22:19:45 +08:00
Fangjun Kuang
ca729faebf
Support reading multi-channel wave files with 8/16/32-bit encoded samples ( #1258 )
2024-08-15 14:54:43 +08:00
Robin Zhong
62c4d4ab62
Add emotion, event of SenseVoice. ( #1257 )
...
* Add emotion, event of SenseVoice.
* Fix tokens size check and update java api.
https://github.com/k2-fsa/sherpa-onnx/pull/1257
2024-08-14 15:50:13 +08:00
Fangjun Kuang
619279b162
Pascal API for VAD ( #1249 )
2024-08-13 16:16:51 +08:00
Fangjun Kuang
8a5f5c1999
Fix Python two-pass ASR examples ( #1230 )
2024-08-07 18:35:38 +08:00
Fangjun Kuang
375c055ff8
Fix style issues for online punctuation source files ( #1225 )
2024-08-06 17:43:24 +08:00
jianyou
1414e4dc61
Add online punctuation and casing prediction model for English language ( #1224 )
2024-08-06 17:33:38 +08:00
Fangjun Kuang
9caa488019
Fix setting SenseVoice language. ( #1214 )
2024-08-04 19:02:23 +08:00
Fangjun Kuang
d5f486878d
Remove libonnxruntime_providers_cuda.so as a dependency. ( #1210 )
2024-08-03 16:25:23 +08:00
Fangjun Kuang
53484fcd9b
Fix reading non-standard wav files. ( #1199 )
2024-08-01 17:48:04 +08:00
Fangjun Kuang
86b4c9f535
Fix splitting sentences for MeloTTS ( #1186 )
2024-07-29 17:04:45 +08:00
Fangjun Kuang
994c3e7c96
Add VAD + Non-streaming ASR example for JavaScript API. ( #1170 )
2024-07-26 12:42:08 +08:00
Fangjun Kuang
299f1a852b
Fix style issues reported by clang-tidy ( #1167 )
2024-07-23 09:26:36 +08:00
thewh1teagle
d32a46169f
feat: add DirectML support ( #1153 )
2024-07-22 23:50:48 +08:00
Fangjun Kuang
1a471595a5
Fix Android build ( #1161 )
2024-07-22 09:27:30 +08:00
Fangjun Kuang
ffdb23a8ec
Add Dart API for SenseVoice ( #1159 )
2024-07-21 21:48:12 +08:00
Fangjun Kuang
25f0a10468
Add C++ runtime for SenseVoice models ( #1148 )
2024-07-18 22:54:18 +08:00