enginex-mr_series-sherpa-onnx

EngineX-Iluvatar/enginex-mr_series-sherpa-onnx

Archived

Author	SHA1	Message	Date
Fangjun Kuang	d00d1c6298	Fix GitHub actions. (#1642 )	2024-12-24 11:34:35 +08:00
Fangjun Kuang	b76cd9033a	Support decoding with byte-level BPE (bbpe) models. (#1633 )	2024-12-20 19:21:32 +08:00
Fangjun Kuang	1bae4085ca	Add speaker diarization API for HarmonyOS. (#1609 )	2024-12-10 16:03:03 +08:00
Fangjun Kuang	314545f938	Add speaker identification APIs for HarmonyOS (#1607 ) * Add speaker embedding extractor API for HarmonyOS * Add ArkTS API for speaker identification	2024-12-09 19:23:18 +08:00
Fangjun Kuang	a743a4400f	Add on-device real-time ASR demo for HarmonyOS (#1606 )	2024-12-09 16:40:15 +08:00
Fangjun Kuang	74a8735f7a	Add on-device text-to-speech (TTS) demo for HarmonyOS (#1590 )	2024-12-04 14:27:12 +08:00
Fangjun Kuang	dc3287f3a8	Add HarmonyOS support for text-to-speech. (#1584 )	2024-12-01 21:43:34 +08:00
Fangjun Kuang	109fb799ca	fix building for Android (#1568 )	2024-11-27 10:36:16 +08:00
Fangjun Kuang	2101227269	Add streaming ASR support for HarmonyOS. (#1565 )	2024-11-26 18:36:56 +08:00
Fangjun Kuang	298b6b6fda	Add non-streaming ASR support for HarmonyOS. (#1564 )	2024-11-26 16:38:35 +08:00
Fangjun Kuang	31d6206fde	HarmonyOS support for VAD. (#1561 )	2024-11-24 16:29:24 +08:00
Fangjun Kuang	f97daed408	Fixes #1512 (#1522 )	2024-11-08 21:07:36 +08:00
Fangjun Kuang	4eeb336f59	Export the English TTS model from MeloTTS (#1509 )	2024-11-04 07:54:19 +08:00
Fangjun Kuang	6ee8c99c5d	Fix building (#1508 )	2024-11-03 19:47:04 +08:00
Fangjun Kuang	9ab89c33bc	Support building GPU-capable sherpa-onnx on Linux aarch64. (#1500 ) Thanks to @Peakyxh for providing pre-built onnxruntime libraries with CUDA support for Linux aarch64. Tested on Jetson nano b01	2024-11-01 11:16:28 +08:00
Fangjun Kuang	9fa3bc40d7	Fix reading tokens.txt on Windows. (#1497 )	2024-10-30 12:13:11 +08:00
Fangjun Kuang	669f5ef441	Add C++ runtime and Python APIs for Moonshine models (#1473 )	2024-10-26 14:34:07 +08:00
Fangjun Kuang	707cf792c5	Add GigaAM NeMo transducer model for Russian ASR (#1467 )	2024-10-25 15:20:13 +08:00
Fangjun Kuang	b41f6d2c94	Support GigaAM CTC models for Russian ASR (#1464 ) See also https://github.com/salute-developers/GigaAM	2024-10-25 10:55:16 +08:00
Fangjun Kuang	a5295aad10	Handle NaN embeddings in speaker diarization. (#1461 ) See also https://github.com/thewh1teagle/sherpa-rs/issues/33	2024-10-24 14:03:09 +08:00
Fangjun Kuang	b3e05f6dc4	Fix style issues (#1458 )	2024-10-24 11:15:08 +08:00
Fangjun Kuang	ceb69ebd94	Add C++ API for non-streaming ASR (#1456 )	2024-10-23 16:40:12 +08:00
Zazzle516	4783c8f590	Fix "log10" compile error by importing CMATH lib (#1438 )	2024-10-17 14:50:04 +08:00
Fangjun Kuang	94b26ff07c	Android JNI support for speaker diarization (#1421 )	2024-10-12 13:03:48 +08:00
Fangjun Kuang	1ed803adc1	Dart API for speaker diarization (#1418 )	2024-10-11 21:17:41 +08:00
Fangjun Kuang	2d412b1190	Kotlin API for speaker diarization (#1415 )	2024-10-11 14:41:53 +08:00
Fangjun Kuang	f1b311ee4f	Handle audio files less than 10s long for speaker diarization. (#1412 ) If the input audio file is less than 10 seconds long, there is only one chunk, and there is no need to compute embeddings or do clustering. We can use the segmentation result from the speaker segmentation model directly.	2024-10-11 10:27:16 +08:00
Fangjun Kuang	1d061df355	WebAssembly example for speaker diarization (#1411 )	2024-10-10 22:14:45 +08:00
Fangjun Kuang	d468527f62	C API for speaker diarization (#1402 )	2024-10-09 17:10:03 +08:00
Fangjun Kuang	8535b1d3bb	Python API for speaker diarization. (#1400 )	2024-10-09 14:13:26 +08:00
Fangjun Kuang	59407edcad	C++ API for speaker diarization (#1396 )	2024-10-09 12:01:20 +08:00
Fangjun Kuang	70165cb42d	Speaker diarization example with onnxruntime Python API (#1395 )	2024-10-06 16:37:29 +08:00
Askars	5f50cbf65a	context_state is not set correctly when previous context is passed after reset (#1393 ) Co-authored-by: vsd-vector <askars.salimbajevs@tilde.lv>	2024-10-03 16:42:09 +08:00
Fangjun Kuang	b965f14cf0	Add Python API for clustering (#1385 )	2024-09-30 11:33:15 +08:00
Fangjun Kuang	70568c2df7	Support Agglomerative clustering. (#1384 ) We use the open-source implementation from https://github.com/cdalitz/hclust-cpp	2024-09-29 23:44:29 +08:00
Fangjun Kuang	11f0cb7e1c	Support Parakeet models from NeMo (#1381 )	2024-09-27 17:12:00 +08:00
lxiao336	06b61ccad8	Allow more online models to load tokens file from the memory (#1352 ) Co-authored-by: xiao <shawl336@6163.com>	2024-09-20 16:38:41 +08:00
Fangjun Kuang	1423ddb1f0	Support specifying max speech duration for VAD. (#1348 )	2024-09-14 10:57:46 +08:00
Fangjun Kuang	544857b097	Fix building (#1343 )	2024-09-13 13:33:52 +08:00
lxiao336	65cfa7548a	re-pull-request allow tokens and hotwords to be loaded from buffered string directly (#1339 ) Co-authored-by: xiao <shawl336@163.com>	2024-09-13 09:58:17 +08:00
Fangjun Kuang	6b6e7635ed	Fix computing features for CED audio tagging models. (#1341 ) See also https://github.com/RicherMans/CED/blob/main/onnx_inference_with_kaldi.py	2024-09-12 19:38:18 +08:00
Askars	fa20ae1552	Preserve previous result as context for next segment (#1335 ) Co-authored-by: vsd-vector <askars.salimbajevs@tilde.lv>	2024-09-11 10:44:13 +08:00
Fangjun Kuang	ba7f1a7439	Fix building (#1331 )	2024-09-09 10:29:31 +08:00
Fangjun Kuang	363b8e4c1e	Fix vad.Flush(). (#1329 ) Fixes #1314	2024-09-08 17:52:53 +08:00
SilverSulfide	888f74bf3c	Re-implement LM rescore for online transducer (#1231 ) Co-authored-by: Martins Kronis <martins.kuznecovs@tilde.lv>	2024-09-06 10:01:25 +08:00
Fangjun Kuang	3687c9f60a	Reduce onnxruntime log output. (#1306 ) Change the logging level from WARNING to ERROR.	2024-08-30 12:50:34 +08:00
Malcolm Ke Win	c61423ec5a	Update wave-reader.cc (#1278 ) * Update wave-reader.cc missing "#include <cstdint>"	2024-08-22 23:22:45 +08:00
Fangjun Kuang	f93f0ca94d	Use a separate thread to initialize models for lazarus examples. (#1270 ) So that the main thread is not blocked and the user interface is responsive.	2024-08-18 14:59:48 +08:00
Fangjun Kuang	9dcea49dba	Fix looking up OOVs in lexicon.txt for MeloTTS models. (#1266 ) If an English word does not exist in the lexicon, we split it into characters. For instance, if the word TTS does not exist in lexicon.txt, we split it into 3 characters T, T, and S.	2024-08-16 22:10:03 +08:00
Ikko Eltociear Ashimine	a3e98750e9	chore: update online-stream.h (#1264 ) Fix typos.	2024-08-16 15:17:15 +08:00


353 Commits