425 Commits

Author SHA1 Message Date
Fangjun Kuang
d468527f62 C API for speaker diarization (#1402) 2024-10-09 17:10:03 +08:00
Fangjun Kuang
8535b1d3bb Python API for speaker diarization. (#1400) 2024-10-09 14:13:26 +08:00
Fangjun Kuang
59407edcad C++ API for speaker diarization (#1396) 2024-10-09 12:01:20 +08:00
Fangjun Kuang
70165cb42d Speaker diarization example with onnxruntime Python API (#1395) 2024-10-06 16:37:29 +08:00
Askars
5f50cbf65a context_state is not set correctly when previous context is passed after reset (#1393)
Co-authored-by: vsd-vector <askars.salimbajevs@tilde.lv>
2024-10-03 16:42:09 +08:00
Fangjun Kuang
b965f14cf0 Add Python API for clustering (#1385) 2024-09-30 11:33:15 +08:00
Fangjun Kuang
70568c2df7 Support Agglomerative clustering. (#1384)
We use the open-source implementation from
https://github.com/cdalitz/hclust-cpp
2024-09-29 23:44:29 +08:00
Fangjun Kuang
11f0cb7e1c Support Parakeet models from NeMo (#1381) 2024-09-27 17:12:00 +08:00
lxiao336
06b61ccad8 Allow more online models to load tokens file from the memory (#1352)
Co-authored-by: xiao <shawl336@6163.com>
2024-09-20 16:38:41 +08:00
Fangjun Kuang
1423ddb1f0 Support specifying max speech duration for VAD. (#1348) 2024-09-14 10:57:46 +08:00
Fangjun Kuang
544857b097 Fix building (#1343) 2024-09-13 13:33:52 +08:00
lxiao336
65cfa7548a re-pull-request allow tokens and hotwords to be loaded from buffered string directly (#1339)
Co-authored-by: xiao <shawl336@163.com>
2024-09-13 09:58:17 +08:00
Fangjun Kuang
6b6e7635ed Fix computing features for CED audio tagging models. (#1341)
See also
https://github.com/RicherMans/CED/blob/main/onnx_inference_with_kaldi.py
2024-09-12 19:38:18 +08:00
Askars
fa20ae1552 Preserve previous result as context for next segment (#1335)
Co-authored-by: vsd-vector <askars.salimbajevs@tilde.lv>
2024-09-11 10:44:13 +08:00
Fangjun Kuang
ba7f1a7439 Fix building (#1331) 2024-09-09 10:29:31 +08:00
Fangjun Kuang
363b8e4c1e Fix vad.Flush(). (#1329)
Fixes #1314
2024-09-08 17:52:53 +08:00
SilverSulfide
888f74bf3c Re-implement LM rescore for online transducer (#1231)
Co-authored-by: Martins Kronis <martins.kuznecovs@tilde.lv>
2024-09-06 10:01:25 +08:00
Fangjun Kuang
3687c9f60a Reduce onnxruntime log output. (#1306)
Change the logging level from WARNING to ERROR.
2024-08-30 12:50:34 +08:00
Malcolm Ke Win
c61423ec5a Update wave-reader.cc (#1278)
Add the missing "#include <cstdint>".
2024-08-22 23:22:45 +08:00
Fangjun Kuang
f93f0ca94d Use a separate thread to initialize models for lazarus examples. (#1270)
So that the main thread is not blocked and the user interface is responsive.
2024-08-18 14:59:48 +08:00
Fangjun Kuang
9dcea49dba Fix looking up OOVs in lexicon.txt for MeloTTS models. (#1266)
If an English word does not exist in the lexicon, we split
it into characters. For instance, if the word TTS does not
exist in lexicon.txt, we split it into 3 characters T, T, and S.
2024-08-16 22:10:03 +08:00
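The fallback described in the commit above can be illustrated with a minimal Python sketch. It is not the sherpa-onnx implementation (which is C++): the function name `lookup_tokens`, the dictionary-based lexicon, the lower-casing of words, and the toy phoneme entries are all assumptions made for illustration.

```python
from typing import Dict, List


def lookup_tokens(word: str, lexicon: Dict[str, List[str]]) -> List[str]:
    """Return the tokens for `word`, splitting it into characters if it is an OOV.

    Assumption: the lexicon maps lower-cased words (and single letters) to
    token lists. The real lexicon.txt layout may differ.
    """
    key = word.lower()
    if key in lexicon:
        return list(lexicon[key])

    # OOV fallback: look up each character separately, e.g. TTS -> T, T, S.
    tokens: List[str] = []
    for ch in key:
        # Assumption: characters missing from the lexicon are silently skipped.
        tokens.extend(lexicon.get(ch, []))
    return tokens


# Hypothetical toy lexicon; the entries are illustrative only.
toy_lexicon = {
    "hello": ["HH", "AH", "L", "OW"],
    "t": ["T", "IY"],
    "s": ["EH", "S"],
}

print(lookup_tokens("hello", toy_lexicon))  # found directly in the lexicon
print(lookup_tokens("TTS", toy_lexicon))    # not found, split into T, T, S
```

With this toy lexicon, `TTS` is not an entry, so its tokens are the concatenation of the tokens for `t`, `t`, and `s`.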
Ikko Eltociear Ashimine
a3e98750e9 chore: update online-stream.h (#1264)
Fix typos.
2024-08-16 15:17:15 +08:00
Fangjun Kuang
fbe35ba736 Add Lazarus example for generating subtitles using Silero VAD with non-streaming ASR (#1251) 2024-08-15 22:19:45 +08:00
Fangjun Kuang
ca729faebf Support reading multi-channel wave files with 8/16/32-bit encoded samples (#1258) 2024-08-15 14:54:43 +08:00
Robin Zhong
62c4d4ab62 Add emotion, event of SenseVoice. (#1257)
* Add emotion, event of SenseVoice.

* Fix tokens size check and update java api.

https://github.com/k2-fsa/sherpa-onnx/pull/1257
2024-08-14 15:50:13 +08:00
Fangjun Kuang
619279b162 Pascal API for VAD (#1249) 2024-08-13 16:16:51 +08:00
Fangjun Kuang
8a5f5c1999 Fix python two pass ASR examples (#1230) 2024-08-07 18:35:38 +08:00
Fangjun Kuang
375c055ff8 Fix style issues for online punctuation source files (#1225) 2024-08-06 17:43:24 +08:00
jianyou
1414e4dc61 Add online punctuation and casing prediction model for English language (#1224) 2024-08-06 17:33:38 +08:00
Fangjun Kuang
9caa488019 Fix setting SenseVoice language. (#1214) 2024-08-04 19:02:23 +08:00
Fangjun Kuang
d5f486878d Remove libonnxruntime_providers_cuda.so as a dependency. (#1210) 2024-08-03 16:25:23 +08:00
Fangjun Kuang
53484fcd9b Fix reading non-standard wav files. (#1199) 2024-08-01 17:48:04 +08:00
Fangjun Kuang
86b4c9f535 Fix splitting sentences for MeloTTS (#1186) 2024-07-29 17:04:45 +08:00
Fangjun Kuang
994c3e7c96 Add VAD + Non-streaming ASR example for JavaScript API. (#1170) 2024-07-26 12:42:08 +08:00
Fangjun Kuang
299f1a852b Fix style issues reported by clang-tidy (#1167) 2024-07-23 09:26:36 +08:00
thewh1teagle
d32a46169f feat: add directml support (#1153) 2024-07-22 23:50:48 +08:00
Fangjun Kuang
1a471595a5 Fix Android build (#1161) 2024-07-22 09:27:30 +08:00
Fangjun Kuang
ffdb23a8ec Add dart API for SenseVoice (#1159) 2024-07-21 21:48:12 +08:00
Fangjun Kuang
25f0a10468 Add C++ runtime for SenseVoice models (#1148) 2024-07-18 22:54:18 +08:00
Wei Kang
5b1fa8750f Fix hotwords OOV log (#1139) 2024-07-16 19:41:31 +08:00
Fangjun Kuang
960eb7529e Add C++ runtime for MeloTTS (#1138) 2024-07-16 15:55:02 +08:00
Manickavela
11cfd33b10 encoder only trt ep for transducer (#1130) 2024-07-15 14:52:33 +08:00
ivan provalov
de04b3b9bf Allow modify model config at decode time for ASR (#1124) 2024-07-13 22:30:47 +08:00
Fangjun Kuang
b5093e27f9 Fix publishing apks to huggingface (#1121)
Save APKs for each release in a separate directory.

Huggingface requires that each directory cannot contain more than 1000 files.

Since we have so many TTS models, and for each model we need to build APKs for 4 different ABIs,
we work around Huggingface's constraint by placing the APKs of different releases into separate directories.
2024-07-13 16:14:00 +08:00
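The workaround described in the commit above can be sketched as follows. This is only an illustration, not the project's publishing script: the ABI names, the `tts/<version>/` layout, the file-name pattern, and the helper `apk_paths` are all hypothetical. Only the limit of 1000 files per directory and the count of 4 ABIs come from the commit message.

```python
from pathlib import PurePosixPath
from typing import List

MAX_FILES_PER_DIR = 1000  # limit stated in the commit message

# Assumption: these are the 4 ABIs; the commit message only gives the count.
ABIS = ["arm64-v8a", "armeabi-v7a", "x86_64", "x86"]


def apk_paths(version: str, models: List[str], root: str = "tts") -> List[str]:
    """Build upload paths that place every APK of one release in its own directory.

    Hypothetical layout: <root>/<version>/<model>-<abi>-<version>.apk
    """
    release_dir = PurePosixPath(root) / version
    paths = [
        str(release_dir / f"{model}-{abi}-{version}.apk")
        for model in models
        for abi in ABIS
    ]
    # Each release directory must stay within the per-directory limit.
    if len(paths) > MAX_FILES_PER_DIR:
        raise ValueError(f"{release_dir} would contain {len(paths)} files")
    return paths


# Hypothetical version number and model names.
paths = apk_paths("1.10.0", ["model-a", "model-b"])
for p in paths:
    print(p)
```

Because the directory name includes the release version, the file count per directory depends only on the number of models times the number of ABIs, and no longer grows with each new release.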
Fangjun Kuang
117cd7bb8c Support whisper large/large-v1/large-v2/large-v3 and distil-large-v2 (#1114) 2024-07-12 23:47:39 +08:00
thewh1teagle
c0eaf86dbd feat: find best embedding matches (#1102) 2024-07-11 09:38:06 +08:00
Fangjun Kuang
c2cc9dec58 Add Flush to VAD so that the last segment can be detected. (#1099) 2024-07-09 16:15:56 +08:00
Manix
3e4307e2fb updating trt workspace int64 (#1094)
Signed-off-by: Manix <manickavela1998@gmail.com>
2024-07-08 20:38:16 +08:00
Manix
d6fbecd947 parse option int64_t (#1089)
Signed-off-by: Manix <manickavela1998@gmail.com>
2024-07-08 15:37:30 +08:00
Fangjun Kuang
a25075101c Build sherpa-onnx as a single shared library (#1078)
When `-D BUILD_SHARED_LIBS=ON` is passed to `cmake`, it builds a single shared library.

Specifically, 

- For C APIs, it builds `libsherpa-onnx-c-api.so`
- For Python APIs, it builds `_sherpa_onnx.cpython-xx-xx.so`
- For Kotlin and Java APIs, it builds `libsherpa-onnx-jni.so`

There is no `libsherpa-onnx-core.so` any longer.

Note it affects only shared libraries.
2024-07-06 16:41:54 +08:00