enginex-mr_series-sherpa-onnx

EngineX-Iluvatar/enginex-mr_series-sherpa-onnx

Archived

Author	SHA1	Message	Date
Fangjun Kuang	2e0bccad36	Add C API for speaker embedding extractor. (#711)	2024-03-28 18:05:40 +08:00
Leo Huang	638f48f47a	Added progress for callback of tts generator (#712) Co-authored-by: leohwang <leohwang@360converter.com>	2024-03-28 17:12:20 +08:00
Fangjun Kuang	a042f44076	Add Golang API for spoken language identification. (#709)	2024-03-27 19:40:25 +08:00
Fangjun Kuang	4e040c596e	Support including TTS conditionally. (#699)	2024-03-26 17:21:35 +08:00
Fangjun Kuang	d364610605	Use a single thread when loading models (#703)	2024-03-26 13:35:33 +08:00
Fangjun Kuang	0d258dd150	Support spoken language identification with whisper (#694)	2024-03-24 22:57:00 +08:00
Fangjun Kuang	1952772654	Add timestamps and tokens for .Net's online models. (#690)	2024-03-23 18:51:56 +08:00
Karel Vesely	eaec4c83c2	Configurable low_freq high_freq, dithering (#664)	2024-03-22 21:41:44 +08:00
Fangjun Kuang	c8770aec20	Add nuget package for Windows x86 (#683)	2024-03-21 14:57:01 +08:00
Fangjun Kuang	acf0975153	Support whisper language/task in various language bindings. (#679)	2024-03-20 16:43:35 +08:00
Lovemefan	009ed2cd30	add WebAssembly for Kws (#648)	2024-03-11 21:02:31 +08:00
Fangjun Kuang	d3287f9494	Add Python ASR examples with alsa (#646)	2024-03-08 11:34:48 +08:00
Wei Kang	e9e8d755d9	Fix detection at the tail when using hotwords in streaming model (#638)	2024-03-08 10:04:33 +08:00
Fangjun Kuang	bdf9243940	Allow to not use pre-installed onnxruntime libs. (#636)	2024-03-06 14:40:23 +08:00
Fangjun Kuang	d56964371c	Support VITS models from icefall. (#625)	2024-03-01 19:48:38 +08:00
Fangjun Kuang	e2397cd1a4	Support Android NNAPI. (#622)	2024-03-01 16:39:48 +08:00
Wei Kang	734bbd91dc	Add Python API for keyword spotting (#576) * Add alsa & microphone support for keyword spotting * Add python wrapper	2024-03-01 09:31:11 +08:00
Karel Vesely	38c072dcb2	Track token scores (#571) * add export of per-token scores (ys, lm, context) - for best path of the modified-beam-search decoding of transducer * refactoring JSON export of OnlineRecognitionResult, extending pybind11 API of OnlineRecognitionResult * export per-token scores also for greedy-search (online-transducer) - export un-scaled lm_probs (modified-beam search, online-transducer) - polishing * fill lm_probs/context_scores only if LM/ContextGraph is present (make Result smaller)	2024-02-29 06:28:45 +08:00
Fangjun Kuang	0cb6d1b474	support using xnnpack as execution provider (#612)	2024-02-28 17:32:48 +08:00
Fangjun Kuang	87a7030c08	Support using alsa to access the microphone with non-streaming ASR models (#517)	2024-02-26 21:17:26 +08:00
Fangjun Kuang	67acd34dcd	Use alsa to read microphone in speaker identification demo. (#605)	2024-02-23 19:27:51 +08:00
Fangjun Kuang	16ba7e274a	Add WebAssembly for ASR (#604)	2024-02-23 17:39:11 +08:00
Fangjun Kuang	099a0ccae3	Link the math lib. (#592)	2024-02-21 15:36:54 +08:00
Fangjun Kuang	3d2c7fad74	Increase the right chunk size of streaming paraformer to 3 (#588)	2024-02-20 09:44:40 +08:00
Fangjun Kuang	d771762868	Support WebAssembly for text-to-speech (#577)	2024-02-08 23:39:12 +08:00
Fangjun Kuang	0b18ccfbb2	C++ API demo for speaker identification with portaudio. (#561)	2024-01-30 11:21:43 +08:00
Fangjun Kuang	fa2af5dc69	Add TTS demo for C# API (#557)	2024-01-28 23:29:39 +08:00
Karel Vesely	3f2a17ef47	Fixes issue #535, fix hex 1-char tokens in ASR output. (#550) - Avoid output like: `[' K', '<0x64>', '<0x79>', 'ť', ' a', '<0x75>', 'to', 'bu', '<0x73>', '<0x75>', ... ]` with regular 500 BPE units. - Don't rewrite 1-char tokens in range [0x20 (space) .. 0x7E (tilde)]	2024-01-26 19:23:20 +08:00
chiiyeh	e7b18a2139	add blank_penalty for online transducer (#548)	2024-01-26 12:12:13 +08:00
chiiyeh	3bb3849ec5	add blank_penalty for offline transducer (#542)	2024-01-25 15:00:09 +08:00
Fangjun Kuang	bbd7c7fc18	Add Android demo for speaker recognition (#536) See pre-built Android APKs at https://k2-fsa.github.io/sherpa/onnx/speaker-identification/apk.html	2024-01-23 16:50:52 +08:00
Wei Kang	b6c020901a	decoder for open vocabulary keyword spotting (#505) * various fixes to ContextGraph to support open vocabulary keywords decoder * Add keyword spotter runtime * Add binary * First version works * Minor fixes * update text2token * default values * Add jni for kws * add kws android project * Minor fixes * Remove unused interface * Minor fixes * Add workflow * handle extra info in texts * Minor fixes * Add more comments * Fix ci * fix cpp style * Add input box in android demo so that users can specify their keywords * Fix cpp style * Fix comments * Minor fixes * Minor fixes * minor fixes * Minor fixes * Minor fixes * Add CI * Fix code style * cpplint * Fix comments * Fix error	2024-01-20 22:52:41 +08:00
Fangjun Kuang	2024e96639	Add C++ runtime for speaker verification models from NeMo (#527)	2024-01-13 21:42:09 +08:00
Fangjun Kuang	afc81ec122	Add C++ runtime for models from 3d-speaker (#523)	2024-01-11 19:10:30 +08:00
Fangjun Kuang	07e2b9a36d	Support exporting models to onnx from 3D-Speaker (#522)	2024-01-10 21:09:45 +08:00
Fangjun Kuang	55266918c8	Add runtime support for wespeaker models (#516)	2024-01-09 22:06:08 +08:00
Fangjun Kuang	0be71a31f5	Use high_freq -400 in computing fbank features. (#515) Fixes #514	2024-01-04 12:39:06 +08:00
Fangjun Kuang	e215d0c39a	Fix Byte BPE string results for Python. (#512) It ignores invalid UTF8 strings.	2024-01-03 16:03:24 +08:00
Fangjun Kuang	d7e10bb3f8	Replace Android system TTS engine (#508)	2023-12-31 23:02:35 +08:00
Fangjun Kuang	e475e750ac	Support streaming zipformer CTC (#496) * Support streaming zipformer CTC * test online zipformer2 CTC * Update doc of sherpa-onnx.cc * Add Python APIs for streaming zipformer2 ctc * Add Python API examples for streaming zipformer2 ctc * Swift API for streaming zipformer2 CTC * NodeJS API for streaming zipformer2 CTC * Kotlin API for streaming zipformer2 CTC * Golang API for streaming zipformer2 CTC * C# API for streaming zipformer2 CTC * Release v1.9.6	2023-12-22 13:46:33 +08:00
Fangjun Kuang	03ff9db56e	Keep multiple threads from calling into espeak-ng at the same time (#489)	2023-12-15 17:44:33 +08:00
Fangjun Kuang	ad72e7afc3	Print informative error messages for sherpa-onnx-alsa on errors. (#486)	2023-12-15 11:10:39 +08:00
Fangjun Kuang	b18812ceff	Play generated audio using alsa for TTS (#482)	2023-12-13 22:28:03 +08:00
Fangjun Kuang	0e23f82691	Give an informative log for whisper on exceptions. (#473)	2023-12-08 14:33:59 +08:00
Fangjun Kuang	868c339e5e	Support distil-small.en whisper (#472)	2023-12-08 11:59:20 +08:00
Fangjun Kuang	3ae984f148	Remove the 30-second constraint from whisper. (#471)	2023-12-07 17:47:08 +08:00
Fangjun Kuang	d34161413d	Support Ukrainian VITS models from coqui-ai/TTS (#469)	2023-12-06 19:37:11 +08:00
Fangjun Kuang	23cf92daf7	Use espeak-ng for coqui-ai/TTS VITS English models. (#466)	2023-12-06 11:00:38 +08:00
Fangjun Kuang	86b4be5260	Break text into sentences for tts. (#460) This is for models that are not using piper-phonemize as their front-end.	2023-12-03 11:50:25 +08:00
Fangjun Kuang	99ff6a834c	Play generated audio as it is generating. (#457)	2023-12-02 15:35:11 +08:00

209 Commits