216 Commits

Author SHA1 Message Date
AHN Sung Hwan
904a3cc8a9 Fix a bug in mean calculation of 'ys_probs' (#748) 2024-04-11 10:34:44 +08:00
Fangjun Kuang
042976ea6e Add C++ microphone examples for audio tagging (#749) 2024-04-10 21:00:35 +08:00
Fangjun Kuang
f20291cadc Support audio tagging using zipformer (#747) 2024-04-10 14:47:06 +08:00
Fangjun Kuang
0d90b34e4a Support Chinese heteronyms on Android for TTS. (#742) 2024-04-08 21:36:47 +08:00
Fangjun Kuang
6fb8ceda57 Add VAD examples using ALSA for recording (#739) 2024-04-08 16:41:01 +08:00
Fangjun Kuang
a5f8fbc83f Support heteronyms in Chinese TTS (#738) 2024-04-08 11:01:30 +08:00
Fangjun Kuang
db67e00c77 Add HLG decoding for streaming CTC models (#731) 2024-04-03 21:31:42 +08:00
Fangjun Kuang
2e0bccad36 Add C API for speaker embedding extractor. (#711) 2024-03-28 18:05:40 +08:00
Leo Huang
638f48f47a Added progress for callback of tts generator (#712)
Co-authored-by: leohwang <leohwang@360converter.com>
2024-03-28 17:12:20 +08:00
Fangjun Kuang
a042f44076 Add Golang API for spoken language identification. (#709) 2024-03-27 19:40:25 +08:00
Fangjun Kuang
4e040c596e Support including TTS conditionally. (#699) 2024-03-26 17:21:35 +08:00
Fangjun Kuang
d364610605 Use a single thread when loading models (#703) 2024-03-26 13:35:33 +08:00
Fangjun Kuang
0d258dd150 Support spoken language identification with whisper (#694) 2024-03-24 22:57:00 +08:00
Fangjun Kuang
1952772654 Add timestamps and tokens for .Net's online models. (#690) 2024-03-23 18:51:56 +08:00
Karel Vesely
eaec4c83c2 Configurable low_freq high_freq, dithering (#664) 2024-03-22 21:41:44 +08:00
Fangjun Kuang
c8770aec20 Add nuget package for Windows x86 (#683) 2024-03-21 14:57:01 +08:00
Fangjun Kuang
acf0975153 Support whisper language/task in various language bindings. (#679) 2024-03-20 16:43:35 +08:00
Lovemefan
009ed2cd30 add WebAssembly for Kws (#648) 2024-03-11 21:02:31 +08:00
Fangjun Kuang
d3287f9494 Add Python ASR examples with alsa (#646) 2024-03-08 11:34:48 +08:00
Wei Kang
e9e8d755d9 Fix detection at the tail when using hotwords in streaming model (#638) 2024-03-08 10:04:33 +08:00
Fangjun Kuang
bdf9243940 Allow to not use pre-installed onnxruntime libs. (#636) 2024-03-06 14:40:23 +08:00
Fangjun Kuang
d56964371c Support VITS models from icefall. (#625) 2024-03-01 19:48:38 +08:00
Fangjun Kuang
e2397cd1a4 Support Android NNAPI. (#622) 2024-03-01 16:39:48 +08:00
Wei Kang
734bbd91dc Add Python API for keyword spotting (#576)
* Add alsa & microphone support for keyword spotting

* Add python wrapper
2024-03-01 09:31:11 +08:00
Karel Vesely
38c072dcb2 Track token scores (#571)
* add export of per-token scores (ys, lm, context)

- for best path of the modified-beam-search decoding of transducer

* refactoring JSON export of OnlineRecognitionResult, extending pybind11 API of OnlineRecognitionResult

* export per-token scores also for greedy-search (online-transducer)

- export un-scaled lm_probs (modified-beam search, online-transducer)
- polishing

* fill lm_probs/context_scores only if LM/ContextGraph is present (make Result smaller)
2024-02-29 06:28:45 +08:00
Fangjun Kuang
0cb6d1b474 support using xnnpack as execution provider (#612) 2024-02-28 17:32:48 +08:00
Fangjun Kuang
87a7030c08 Support using alsa to access the microphone with non-streaming ASR models (#517) 2024-02-26 21:17:26 +08:00
Fangjun Kuang
67acd34dcd Use alsa to read microphone in speaker identification demo. (#605) 2024-02-23 19:27:51 +08:00
Fangjun Kuang
16ba7e274a Add WebAssembly for ASR (#604) 2024-02-23 17:39:11 +08:00
Fangjun Kuang
099a0ccae3 Link the math lib. (#592) 2024-02-21 15:36:54 +08:00
Fangjun Kuang
3d2c7fad74 Increase the right chunk size of streaming paraformer to 3 (#588) 2024-02-20 09:44:40 +08:00
Fangjun Kuang
d771762868 Support WebAssembly for text-to-speech (#577) 2024-02-08 23:39:12 +08:00
Fangjun Kuang
0b18ccfbb2 C++ API demo for speaker identification with portaudio. (#561) 2024-01-30 11:21:43 +08:00
Fangjun Kuang
fa2af5dc69 Add TTS demo for C# API (#557) 2024-01-28 23:29:39 +08:00
Karel Vesely
3f2a17ef47 Fixes issue #535, fix hexa 1-char tokens in ASR output. (#550)
- Avoid output like : `[' K', '<0x64>', '<0x79>', 'ť', ' a', '<0x75>',
  'to', 'bu', '<0x73>', '<0x75>', ... ]` with regular 500 BPE units.
- Don't rewrite 1-char tokens in range [ 0x20 (space) .. 0x7E (tilde) ]
2024-01-26 19:23:20 +08:00
chiiyeh
e7b18a2139 add blank_penalty for online transducer (#548) 2024-01-26 12:12:13 +08:00
chiiyeh
3bb3849ec5 add blank_penalty for offline transducer (#542) 2024-01-25 15:00:09 +08:00
Fangjun Kuang
bbd7c7fc18 Add Android demo for speaker recognition (#536)
See pre-built Android APKs at 
https://k2-fsa.github.io/sherpa/onnx/speaker-identification/apk.html
2024-01-23 16:50:52 +08:00
Wei Kang
b6c020901a decoder for open vocabulary keyword spotting (#505)
* various fixes to ContextGraph to support open vocabulary keywords decoder

* Add keyword spotter runtime

* Add binary

* First version works

* Minor fixes

* update text2token

* default values

* Add jni for kws

* add kws android project

* Minor fixes

* Remove unused interface

* Minor fixes

* Add workflow

* handle extra info in texts

* Minor fixes

* Add more comments

* Fix ci

* fix cpp style

* Add input box in android demo so that users can specify their keywords

* Fix cpp style

* Fix comments

* Minor fixes

* Minor fixes

* minor fixes

* Minor fixes

* Minor fixes

* Add CI

* Fix code style

* cpplint

* Fix comments

* Fix error
2024-01-20 22:52:41 +08:00
Fangjun Kuang
2024e96639 Add C++ runtime for speaker verification models from NeMo (#527) 2024-01-13 21:42:09 +08:00
Fangjun Kuang
afc81ec122 Add C++ runtime for models from 3d-speaker (#523) 2024-01-11 19:10:30 +08:00
Fangjun Kuang
07e2b9a36d Support exporting models to onnx from 3D-Speaker (#522) 2024-01-10 21:09:45 +08:00
Fangjun Kuang
55266918c8 Add runtime support for wespeaker models (#516) 2024-01-09 22:06:08 +08:00
Fangjun Kuang
0be71a31f5 Use high_freq -400 in computing fbank features. (#515)
Fixes #514
2024-01-04 12:39:06 +08:00
Fangjun Kuang
e215d0c39a Fix Byte BPE string results for Python. (#512)
It ignores invalid UTF8 strings.
2024-01-03 16:03:24 +08:00
Fangjun Kuang
d7e10bb3f8 Replace Android system TTS engine (#508) 2023-12-31 23:02:35 +08:00
Fangjun Kuang
e475e750ac Support streaming zipformer CTC (#496)
* Support streaming zipformer CTC

* test online zipformer2 CTC

* Update doc of sherpa-onnx.cc

* Add Python APIs for streaming zipformer2 ctc

* Add Python API examples for streaming zipformer2 ctc

* Swift API for streaming zipformer2 CTC

* NodeJS API for streaming zipformer2 CTC

* Kotlin API for streaming zipformer2 CTC

* Golang API for streaming zipformer2 CTC

* C# API for streaming zipformer2 CTC

* Release v1.9.6
2023-12-22 13:46:33 +08:00
Fangjun Kuang
03ff9db56e Keep multiple threads from calling into espeak-ng at the same time (#489) 2023-12-15 17:44:33 +08:00
Fangjun Kuang
ad72e7afc3 Print informative error messages for sherpa-onnx-alsa on errors. (#486) 2023-12-15 11:10:39 +08:00
Fangjun Kuang
b18812ceff Play generated audio using alsa for TTS (#482) 2023-12-13 22:28:03 +08:00