Fangjun Kuang
4e040c596e
Support including TTS conditionally. (#699)
2024-03-26 17:21:35 +08:00
Fangjun Kuang
0d258dd150
Support spoken language identification with whisper (#694)
2024-03-24 22:57:00 +08:00
Karel Vesely
eaec4c83c2
Configurable low_freq high_freq, dithering (#664)
2024-03-22 21:41:44 +08:00
Bhaswati Saha
fda614d0d1
beam search value as parameter in offline_recognizer.py (#673)
Co-authored-by: bhascns <bhaswati@mihup.com>
2024-03-18 18:43:05 +08:00
Fangjun Kuang
3232dff2cf
Support user provided data in tts callback. (#653)
2024-03-09 18:15:03 +08:00
Fangjun Kuang
d3287f9494
Add Python ASR examples with alsa (#646)
2024-03-08 11:34:48 +08:00
Wei Kang
734bbd91dc
Add Python API for keyword spotting (#576)
* Add alsa & microphone support for keyword spotting
* Add python wrapper
2024-03-01 09:31:11 +08:00
Karel Vesely
38c072dcb2
Track token scores (#571)
* add export of per-token scores (ys, lm, context)
- for best path of the modified-beam-search decoding of transducer
* refactoring JSON export of OnlineRecognitionResult, extending pybind11 API of OnlineRecognitionResult
* export per-token scores also for greedy-search (online-transducer)
- export un-scaled lm_probs (modified-beam search, online-transducer)
- polishing
* fill lm_probs/context_scores only if LM/ContextGraph is present (make Result smaller)
2024-02-29 06:28:45 +08:00
Askars
763a51486e
Add missing start_time to python API (#591)
Co-authored-by: vsd-vector <askars.salimbajevs@tilde.lv>
2024-02-20 20:47:53 +08:00
Fangjun Kuang
44efff4e47
Fix CI tests for Python and JNI. (#554)
2024-01-27 13:01:54 +08:00
chiiyeh
e7b18a2139
add blank_penalty for online transducer (#548)
2024-01-26 12:12:13 +08:00
chiiyeh
466a6855c8
add hotwords docstring to offline_recognizer and online_recognizer (#546)
2024-01-25 16:54:20 +08:00
chiiyeh
3bb3849ec5
add blank_penalty for offline transducer (#542)
2024-01-25 15:00:09 +08:00
Fangjun Kuang
bbd7c7fc18
Add Android demo for speaker recognition (#536)
See pre-built Android APKs at
https://k2-fsa.github.io/sherpa/onnx/speaker-identification/apk.html
2024-01-23 16:50:52 +08:00
Wei Kang
b6c020901a
decoder for open vocabulary keyword spotting (#505)
* various fixes to ContextGraph to support open vocabulary keywords decoder
* Add keyword spotter runtime
* Add binary
* First version works
* Minor fixes
* update text2token
* default values
* Add jni for kws
* add kws android project
* Minor fixes
* Remove unused interface
* Minor fixes
* Add workflow
* handle extra info in texts
* Minor fixes
* Add more comments
* Fix ci
* fix cpp style
* Add input box in android demo so that users can specify their keywords
* Fix cpp style
* Fix comments
* Minor fixes
* Minor fixes
* minor fixes
* Minor fixes
* Minor fixes
* Add CI
* Fix code style
* cpplint
* Fix comments
* Fix error
2024-01-20 22:52:41 +08:00
Fangjun Kuang
2024e96639
Add C++ runtime for speaker verification models from NeMo (#527)
2024-01-13 21:42:09 +08:00
Fangjun Kuang
afc81ec122
Add C++ runtime for models from 3d-speaker (#523)
2024-01-11 19:10:30 +08:00
Fangjun Kuang
55266918c8
Add runtime support for wespeaker models (#516)
2024-01-09 22:06:08 +08:00
Fangjun Kuang
e215d0c39a
Fix Byte BPE string results for Python. (#512)
It ignores invalid UTF8 strings.
2024-01-03 16:03:24 +08:00
Fangjun Kuang
d7e10bb3f8
Replace Android system TTS engine (#508)
2023-12-31 23:02:35 +08:00
Fangjun Kuang
e475e750ac
Support streaming zipformer CTC (#496)
* Support streaming zipformer CTC
* test online zipformer2 CTC
* Update doc of sherpa-onnx.cc
* Add Python APIs for streaming zipformer2 ctc
* Add Python API examples for streaming zipformer2 ctc
* Swift API for streaming zipformer2 CTC
* NodeJS API for streaming zipformer2 CTC
* Kotlin API for streaming zipformer2 CTC
* Golang API for streaming zipformer2 CTC
* C# API for streaming zipformer2 CTC
* Release v1.9.6
2023-12-22 13:46:33 +08:00
Fangjun Kuang
7634f5f034
Release Python GIL in C++ class constructor (#493)
2023-12-20 15:54:32 +08:00
Fangjun Kuang
0e23f82691
Give an informative log for whisper on exceptions. (#473)
2023-12-08 14:33:59 +08:00
Fangjun Kuang
3ae984f148
Remove the 30-second constraint from whisper. (#471)
2023-12-07 17:47:08 +08:00
Fangjun Kuang
99ff6a834c
Play generated audio as it is generating. (#457)
2023-12-02 15:35:11 +08:00
Fangjun Kuang
62dc3c3e46
Use piper-phonemize to convert text to token IDs (#453)
2023-11-30 23:57:43 +08:00
Fangjun Kuang
87a47d7db4
Release GIL to support multithreading in websocket servers. (#451)
2023-11-27 13:44:03 +08:00
Fangjun Kuang
049fb9f451
Add Python APIs for WeNet CTC models (#428)
2023-11-16 14:20:41 +08:00
Fangjun Kuang
fac4f6bc7c
Support streaming conformer CTC models from wenet (#427)
2023-11-16 10:35:23 +08:00
Fangjun Kuang
b83b3e3cd1
Support non-streaming WeNet CTC models. (#426)
2023-11-15 14:23:20 +08:00
Fangjun Kuang
d1a450bf82
Support text normalization via rule FST (#407)
2023-11-05 08:59:03 +08:00
Fangjun Kuang
1937717705
Add MFC TTS example on Windows (#378)
2023-10-21 00:13:07 +08:00
Fangjun Kuang
9efe69720d
Support VITS VCTK models (#367)
* Support VITS VCTK models
* Release v1.8.1
2023-10-16 17:22:30 +08:00
Fangjun Kuang
655e0fa836
add python API and examples for TTS (#364)
2023-10-14 14:21:53 +08:00
Peng He
4771c9275c
Add lm decode for the Python API. (#353)
* Add lm decode for the Python API.
* fix style.
* Fix LogAdd,
Shouldn't double lm_log_prob when merging the same prefix path
* sort the import alphabetically
2023-10-13 11:15:16 +08:00
Fangjun Kuang
407602445d
Add CTC HLG decoding using OpenFst (#349)
2023-10-08 11:32:39 +08:00
Fangjun Kuang
33a5765169
Print a more user-friendly error message when using --hotwords-file. (#344)
2023-09-26 11:04:20 +08:00
Fangjun Kuang
c471423125
Add Silero VAD (#313)
2023-09-17 14:54:38 +08:00
Wei Kang
47184f9db7
Refactor hotwords, support loading hotwords from file (#296)
2023-09-14 19:33:17 +08:00
Fangjun Kuang
f709c95c5f
Support multilingual whisper models (#274)
2023-08-16 00:28:52 +08:00
Fangjun Kuang
6038e2aa62
Support streaming paraformer (#263)
2023-08-14 10:32:14 +08:00
Fangjun Kuang
a4bff28e21
Support TDNN models from the yesno recipe from icefall (#262)
2023-08-12 19:50:22 +08:00
Fangjun Kuang
b094868fb8
Add non-streaming websocket server for python (#259)
2023-08-11 15:56:24 +08:00
Fangjun Kuang
79c2ce5dd4
Refactor online recognizer (#250)
* Refactor online recognizer.
Make it easier to support other streaming models.
Note that it is a breaking change for the Python API.
`sherpa_onnx.OnlineRecognizer()` used before should be
replaced by `sherpa_onnx.OnlineRecognizer.from_transducer()`.
2023-08-09 20:27:31 +08:00
Fangjun Kuang
45b9d4ab37
Support whisper models (#238)
2023-08-07 12:34:18 +08:00
Wilson Wongso
5a6b55c5a7
Reduce model initialization time for online speech recognition (#215)
* Reduce model initialization time for online speech recognition
* Fixed Styling
---------
Co-authored-by: w11wo <wilsowong961@gmail.com>
2023-07-14 21:20:10 +08:00
Fangjun Kuang
f3206c49dc
Reduce model initialization time for offline speech recognition (#213)
2023-07-14 18:07:27 +08:00
Fangjun Kuang
bebc1f1398
Use static libraries for MFC examples (#210)
2023-07-13 14:52:43 +08:00
Fangjun Kuang
5cd72ba3aa
Fix setting context lists. (#207)
2023-07-12 09:18:56 +08:00
Wilson Wongso
b2364b0374
Implemented tokens and timestamps in Python API (#205)
2023-07-12 09:12:31 +08:00