Fangjun Kuang
a5f8fbc83f
Support heteronyms in Chinese TTS ( #738 )
2024-04-08 11:01:30 +08:00
Fangjun Kuang
c1c0f5bafd
return timestamps for WebAssembly ( #737 )
2024-04-05 20:24:27 +08:00
Fangjun Kuang
dbff2eaadb
Add C API for streaming HLG decoding ( #734 )
2024-04-05 10:31:20 +08:00
Fangjun Kuang
db67e00c77
Add HLG decoding for streaming CTC models ( #731 )
2024-04-03 21:31:42 +08:00
Fangjun Kuang
2e0bccad36
Add C API for speaker embedding extractor. ( #711 )
2024-03-28 18:05:40 +08:00
Leo Huang
638f48f47a
Added progress for callback of tts generator ( #712 )
...
Co-authored-by: leohwang <leohwang@360converter.com>
2024-03-28 17:12:20 +08:00
longshiming
de655e838e
delete incorrect logs ( #714 )
...
Co-authored-by: longshiming <longshiming@greesoft.com>
2024-03-28 10:49:45 +08:00
Fangjun Kuang
a042f44076
Add Golang API for spoken language identification. ( #709 )
2024-03-27 19:40:25 +08:00
Fangjun Kuang
69c7880c4d
Add Golang API for VAD ( #708 )
2024-03-27 12:09:39 +08:00
Fangjun Kuang
4e040c596e
Support including TTS conditionally. ( #699 )
2024-03-26 17:21:35 +08:00
Fangjun Kuang
d364610605
Use a single thread when loading models ( #703 )
2024-03-26 13:35:33 +08:00
Fangjun Kuang
ab7cff2513
Add C API for spoken language identification. ( #695 )
2024-03-25 15:16:47 +08:00
Fangjun Kuang
0d258dd150
Support spoken language identification with whisper ( #694 )
2024-03-24 22:57:00 +08:00
Fangjun Kuang
1952772654
Add timestamps and tokens for .Net's online models. ( #690 )
2024-03-23 18:51:56 +08:00
Karel Vesely
eaec4c83c2
Configurable low_freq high_freq, dithering ( #664 )
2024-03-22 21:41:44 +08:00
Fangjun Kuang
c8770aec20
Add nuget package for Windows x86 ( #683 )
2024-03-21 14:57:01 +08:00
Fangjun Kuang
acf0975153
Support whisper language/task in various language bindings. ( #679 )
2024-03-20 16:43:35 +08:00
Viggo
842d04d7ae
support whisper language ( #678 )
2024-03-20 10:16:22 +08:00
Bhaswati Saha
fda614d0d1
beam search value as parameter in offline_recognizer.py ( #673 )
...
Co-authored-by: bhascns <bhaswati@mihup.com>
2024-03-18 18:43:05 +08:00
Lovemefan
009ed2cd30
add WebAssembly for Kws ( #648 )
2024-03-11 21:02:31 +08:00
xinhecuican
f43139e803
c++ api for keyword spotter ( #642 )
2024-03-11 10:23:46 +08:00
Fangjun Kuang
3232dff2cf
Support user provided data in tts callback. ( #653 )
2024-03-09 18:15:03 +08:00
GaryLaurenceauAva
ac43c2d7b6
Expose 'language' 'task' 'tailPaddings' in OfflineWhisperModelConfig ( #643 )
...
Co-authored-by: Gary <gary.laurenceau@gmail.com>
2024-03-08 19:52:30 +08:00
Fangjun Kuang
d3287f9494
Add Python ASR examples with alsa ( #646 )
2024-03-08 11:34:48 +08:00
Wei Kang
e9e8d755d9
Fix detetion at the tail when using hotwords in streaming model ( #638 )
2024-03-08 10:04:33 +08:00
Fangjun Kuang
bdf9243940
Allow to not use pre-installed onnxruntime libs. ( #636 )
2024-03-06 14:40:23 +08:00
Fangjun Kuang
ed06ced16f
Add WebAssembly for NodeJS. ( #628 )
2024-03-03 20:00:36 +08:00
Fangjun Kuang
d56964371c
Support VITS models from icefall. ( #625 )
2024-03-01 19:48:38 +08:00
Fangjun Kuang
e2397cd1a4
Support Android NNAPI. ( #622 )
2024-03-01 16:39:48 +08:00
Wei Kang
734bbd91dc
Add Python API for keyword spotting ( #576 )
...
* Add alsa & microphone support for keyword spotting
* Add python wrapper
2024-03-01 09:31:11 +08:00
Karel Vesely
38c072dcb2
Track token scores ( #571 )
...
* add export of per-token scores (ys, lm, context)
- for best path of the modified-beam-search decoding of transducer
* refactoring JSON export of OnlineRecognitionResult, extending pybind11 API of OnlineRecognitionResult
* export per-token scores also for greedy-search (online-transducer)
- export un-scaled lm_probs (modified-beam search, online-transducer)
- polishing
* fill lm_probs/context_scores only if LM/ContextGraph is present (make Result smaller)
2024-02-29 06:28:45 +08:00
Fangjun Kuang
0cb6d1b474
support using xnnpack as execution provider ( #612 )
2024-02-28 17:32:48 +08:00
Fangjun Kuang
87a7030c08
Support using alsa to access the microphone with non-streaming ASR models ( #517 )
2024-02-26 21:17:26 +08:00
Fangjun Kuang
fb04366179
Fix #608 ( #610 )
...
Fix java tests.
2024-02-26 13:49:37 +08:00
Fangjun Kuang
67acd34dcd
Use alsa to read microphone in speaker identification demo. ( #605 )
2024-02-23 19:27:51 +08:00
Fangjun Kuang
16ba7e274a
Add WebAssembly for ASR ( #604 )
2024-02-23 17:39:11 +08:00
Fangjun Kuang
099a0ccae3
Link the math lib. ( #592 )
2024-02-21 15:36:54 +08:00
Askars
763a51486e
Add missing start_time to python API ( #591 )
...
Co-authored-by: vsd-vector <askars.salimbajevs@tilde.lv>
2024-02-20 20:47:53 +08:00
Fangjun Kuang
3d2c7fad74
Increase the right chunk size of streaming paraformer to 3 ( #588 )
2024-02-20 09:44:40 +08:00
Fangjun Kuang
d771762868
Support WebAssembly for text-to-speech ( #577 )
2024-02-08 23:39:12 +08:00
ductranminh
665b869f03
Add context biasing for mobile ( #568 )
2024-02-01 21:33:22 +08:00
Fangjun Kuang
0b18ccfbb2
C++ API demo for speaker identification with portaudio. ( #561 )
2024-01-30 11:21:43 +08:00
Fangjun Kuang
fa2af5dc69
Add TTS demo for C# API ( #557 )
2024-01-28 23:29:39 +08:00
Fangjun Kuang
44efff4e47
Fix CI tests for Python and JNI. ( #554 )
2024-01-27 13:01:54 +08:00
Karel Vesely
3f2a17ef47
Fixes issue #535 , fix hexa 1-char tokens in ASR output. ( #550 )
...
- Avoid output like : `[' K', '<0x64>', '<0x79>', 'ť', ' a', '<0x75>',
'to', 'bu', '<0x73>', '<0x75>', ... ]` with regular 500 BPE units.
- Don't rewrite 1-char tokens in range [ 0x20 (space) .. 0x7E (tilde) ]
2024-01-26 19:23:20 +08:00
chiiyeh
e7b18a2139
add blank_penalty for online transducer ( #548 )
2024-01-26 12:12:13 +08:00
chiiyeh
466a6855c8
add hotwords docstring to offline_recognizer and online_recognizer ( #546 )
2024-01-25 16:54:20 +08:00
chiiyeh
3bb3849ec5
add blank_penalty for offline transducer ( #542 )
2024-01-25 15:00:09 +08:00
Fangjun Kuang
bbd7c7fc18
Add Android demo for speaker recognition ( #536 )
...
See pre-built Android APKs at
https://k2-fsa.github.io/sherpa/onnx/speaker-identification/apk.html
2024-01-23 16:50:52 +08:00
Wei Kang
b6c020901a
decoder for open vocabulary keyword spotting ( #505 )
...
* various fixes to ContextGraph to support open vocabulary keywords decoder
* Add keyword spotter runtime
* Add binary
* First version works
* Minor fixes
* update text2token
* default values
* Add jni for kws
* add kws android project
* Minor fixes
* Remove unused interface
* Minor fixes
* Add workflow
* handle extra info in texts
* Minor fixes
* Add more comments
* Fix ci
* fix cpp style
* Add input box in android demo so that users can specify their keywords
* Fix cpp style
* Fix comments
* Minor fixes
* Minor fixes
* minor fixes
* Minor fixes
* Minor fixes
* Add CI
* Fix code style
* cpplint
* Fix comments
* Fix error
2024-01-20 22:52:41 +08:00