Commit Graph

483 Commits

Author SHA1 Message Date
Fangjun Kuang
dc3287f3a8 Add HarmonyOS support for text-to-speech. (#1584) 2024-12-01 21:43:34 +08:00
Fangjun Kuang
315d8e2a47 Publish sherpa_onnx.har for HarmonyOS (#1572) 2024-11-28 17:30:16 +08:00
Fangjun Kuang
109fb799ca fix building for Android (#1568) 2024-11-27 10:36:16 +08:00
Fangjun Kuang
2101227269 Add streaming ASR support for HarmonyOS. (#1565) 2024-11-26 18:36:56 +08:00
Fangjun Kuang
298b6b6fda Add non-streaming ASR support for HarmonyOS. (#1564) 2024-11-26 16:38:35 +08:00
Fangjun Kuang
31d6206fde HarmonyOS support for VAD. (#1561) 2024-11-24 16:29:24 +08:00
Fangjun Kuang
f97daed408 Fixes #1512 (#1522) 2024-11-08 21:07:36 +08:00
Fangjun Kuang
4eeb336f59 Export the English TTS model from MeloTTS (#1509) 2024-11-04 07:54:19 +08:00
Fangjun Kuang
6ee8c99c5d Fix building (#1508) 2024-11-03 19:47:04 +08:00
Fangjun Kuang
9ab89c33bc Support building GPU-capable sherpa-onnx on Linux aarch64. (#1500)
Thanks to @Peakyxh for providing pre-built onnxruntime libraries 
with CUDA support for Linux aarch64.

Tested on Jetson Nano B01
2024-11-01 11:16:28 +08:00
Fangjun Kuang
a3c89aa0d8 Add two-pass ASR Android APKs for Moonshine models. (#1499) 2024-10-31 17:54:16 +08:00
Fangjun Kuang
9fa3bc40d7 Fix reading tokens.txt on Windows. (#1497) 2024-10-30 12:13:11 +08:00
Fangjun Kuang
356da3b54c Publish pre-built macos xcframework (#1490) 2024-10-29 12:26:26 +08:00
Fangjun Kuang
cdd8e1bbcb Add Pascal API for Moonshine models (#1482) 2024-10-27 12:21:16 +08:00
Fangjun Kuang
2ca2985d04 Add C and C++ API for Moonshine models (#1476) 2024-10-26 23:24:46 +08:00
Fangjun Kuang
bd4b223920 Add Kotlin and Java API for Moonshine models (#1474) 2024-10-26 22:30:29 +08:00
Fangjun Kuang
669f5ef441 Add C++ runtime and Python APIs for Moonshine models (#1473) 2024-10-26 14:34:07 +08:00
Fangjun Kuang
707cf792c5 Add GigaAM NeMo transducer model for Russian ASR (#1467) 2024-10-25 15:20:13 +08:00
Fangjun Kuang
b41f6d2c94 Support GigaAM CTC models for Russian ASR (#1464)
See also https://github.com/salute-developers/GigaAM
2024-10-25 10:55:16 +08:00
Fangjun Kuang
a5295aad10 Handle NaN embeddings in speaker diarization. (#1461)
See also https://github.com/thewh1teagle/sherpa-rs/issues/33
2024-10-24 14:03:09 +08:00
Fangjun Kuang
b3e05f6dc4 Fix style issues (#1458) 2024-10-24 11:15:08 +08:00
Fangjun Kuang
ceb69ebd94 Add C++ API for non-streaming ASR (#1456) 2024-10-23 16:40:12 +08:00
Fangjun Kuang
effd5ef2be Add C++ API for streaming ASR. (#1455)
It is a wrapper around the C API.
2024-10-23 12:07:43 +08:00
YeyuchenBa
bcaa91ed36 update java for hotword jar (#1444)
---------

Co-authored-by: root <1552138571@qq.com>
2024-10-18 18:07:51 +08:00
Zazzle516
4783c8f590 fix "log10" compile error by importing the cmath lib (#1438) 2024-10-17 14:50:04 +08:00
Fangjun Kuang
5a22f74b2b Android demo for speaker diarization (#1423) 2024-10-13 14:02:57 +08:00
Fangjun Kuang
94b26ff07c Android JNI support for speaker diarization (#1421) 2024-10-12 13:03:48 +08:00
Fangjun Kuang
5e273c5be4 Pascal API for speaker diarization (#1420) 2024-10-12 12:28:38 +08:00
Fangjun Kuang
1ed803adc1 Dart API for speaker diarization (#1418) 2024-10-11 21:17:41 +08:00
Fangjun Kuang
1851ff6337 Java API for speaker diarization (#1416) 2024-10-11 16:51:40 +08:00
Fangjun Kuang
2d412b1190 Kotlin API for speaker diarization (#1415) 2024-10-11 14:41:53 +08:00
Fangjun Kuang
f1b311ee4f Handle audio files less than 10s long for speaker diarization. (#1412)
If the input audio file is less than 10 seconds long, there is only 
one chunk, and there is no need to compute embeddings or 
do clustering.

We can use the segmentation result from the speaker segmentation 
model directly.
2024-10-11 10:27:16 +08:00
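
The short-audio shortcut described in the commit above can be sketched roughly as follows. This is a minimal illustration, not the sherpa-onnx implementation: the names `diarize`, `segment_chunk`, and `cluster_chunks` are hypothetical, and the non-overlapping 10-second chunking is a simplifying assumption. Only the 10-second threshold and the idea of skipping embeddings and clustering for a single chunk come from the commit message.

```python
from typing import Callable, List, Tuple

# (start_sec, end_sec, speaker_id); hypothetical representation for this sketch
Segment = Tuple[float, float, int]

CHUNK_SECONDS = 10.0  # chunk length mentioned in the commit message


def diarize(
    duration_sec: float,
    segment_chunk: Callable[[float, float], List[Segment]],
    cluster_chunks: Callable[[List[List[Segment]]], List[Segment]],
) -> List[Segment]:
    """Return speaker segments for an audio file of the given duration.

    segment_chunk(start, end) stands in for the speaker segmentation model.
    cluster_chunks(chunks) stands in for embedding extraction plus clustering,
    which maps per-chunk speaker ids to global speaker ids.
    Both callables are assumptions made for illustration.
    """
    if duration_sec < CHUNK_SECONDS:
        # Only one chunk exists, so its local speaker ids are already
        # consistent across the whole file: no embeddings, no clustering.
        return segment_chunk(0.0, duration_sec)

    # Longer audio: segment each chunk, then reconcile speakers across chunks.
    # Non-overlapping chunks are a simplification made for this sketch.
    chunks: List[List[Segment]] = []
    start = 0.0
    while start < duration_sec:
        end = min(start + CHUNK_SECONDS, duration_sec)
        chunks.append(segment_chunk(start, end))
        start = end
    return cluster_chunks(chunks)
```

Passing the two stages in as callables keeps the sketch independent of any particular model and makes it easy to check that the clustering stage is never invoked for short input.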
Fangjun Kuang
1d061df355 WebAssembly example for speaker diarization (#1411) 2024-10-10 22:14:45 +08:00
Fangjun Kuang
d468527f62 C API for speaker diarization (#1402) 2024-10-09 17:10:03 +08:00
Fangjun Kuang
8535b1d3bb Python API for speaker diarization. (#1400) 2024-10-09 14:13:26 +08:00
Fangjun Kuang
59407edcad C++ API for speaker diarization (#1396) 2024-10-09 12:01:20 +08:00
Fangjun Kuang
70165cb42d Speaker diarization example with onnxruntime Python API (#1395) 2024-10-06 16:37:29 +08:00
Askars
5f50cbf65a context_state is not set correctly when previous context is passed after reset (#1393)
Co-authored-by: vsd-vector <askars.salimbajevs@tilde.lv>
2024-10-03 16:42:09 +08:00
Fangjun Kuang
b965f14cf0 Add Python API for clustering (#1385) 2024-09-30 11:33:15 +08:00
Fangjun Kuang
70568c2df7 Support Agglomerative clustering. (#1384)
We use the open-source implementation from
https://github.com/cdalitz/hclust-cpp
2024-09-29 23:44:29 +08:00
Fangjun Kuang
11f0cb7e1c Support Parakeet models from NeMo (#1381) 2024-09-27 17:12:00 +08:00
lxiao336
06b61ccad8 Allow more online models to load tokens file from the memory (#1352)
Co-authored-by: xiao <shawl336@6163.com>
2024-09-20 16:38:41 +08:00
Fangjun Kuang
576a3aa90d Add non-streaming ONNX models for Russian ASR (#1358) 2024-09-18 13:43:49 +08:00
Fangjun Kuang
e7ffcbd677 Add APIs about max speech duration in VAD for various programming languages (#1349) 2024-09-14 12:30:13 +08:00
Fangjun Kuang
1423ddb1f0 Support specifying max speech duration for VAD. (#1348) 2024-09-14 10:57:46 +08:00
Fangjun Kuang
544857b097 Fix building (#1343) 2024-09-13 13:33:52 +08:00
lxiao336
65cfa7548a re-pull-request allow tokens and hotwords to be loaded from buffered string directly (#1339)
Co-authored-by: xiao <shawl336@163.com>
2024-09-13 09:58:17 +08:00
Fangjun Kuang
6b6e7635ed Fix computing features for CED audio tagging models. (#1341)
See also
https://github.com/RicherMans/CED/blob/main/onnx_inference_with_kaldi.py
2024-09-12 19:38:18 +08:00
Askars
fa20ae1552 Preserve previous result as context for next segment (#1335)
Co-authored-by: vsd-vector <askars.salimbajevs@tilde.lv>
2024-09-11 10:44:13 +08:00
Fangjun Kuang
ba7f1a7439 Fix building (#1331) 2024-09-09 10:29:31 +08:00