Fangjun Kuang
eee5d8a15c
Add node-addon-api for VAD ( #864 )
2024-05-11 20:58:23 +08:00
Fangjun Kuang
677bc1da3e
Add Speaker ID demo for C# ( #862 )
2024-05-11 13:27:33 +08:00
Fangjun Kuang
a88b3bac21
Fix Python TTS examples for models using jieba. ( #861 )
2024-05-11 09:21:51 +08:00
Fangjun Kuang
65f5161456
Add more streaming ASR methods for node-addon-api ( #860 )
2024-05-10 18:21:05 +08:00
Fangjun Kuang
46e4e5b7ac
Add C++ support for streaming NeMo CTC models. ( #857 )
2024-05-10 16:26:43 +08:00
yh646492956
1eb60e8711
Solve the issue of missing the last sentence with punctuation ( #856 )
Co-authored-by: Hao You <13182720519@sina.cn>
2024-05-10 15:41:42 +08:00
Fangjun Kuang
17cd3a5f01
Add C++ runtime for non-streaming faster conformer transducer from NeMo. ( #854 )
2024-05-10 12:15:39 +08:00
Fangjun Kuang
5d8c35e44e
Add C++ support for non-streaming NeMo fast conformer hybrid transducer ctc (the ctc branch) ( #848 )
2024-05-09 15:32:22 +08:00
Fangjun Kuang
5ed3ec1c04
Export non-streaming NeMo faster conformer hybrid transducer and ctc to sherpa-onnx ( #847 )
2024-05-09 13:59:47 +08:00
Fangjun Kuang
68b25abf27
Export NeMo FastConformer Hybrid Transducer Large Streaming to ONNX ( #844 )
2024-05-08 19:07:49 +08:00
Fangjun Kuang
a9f936e92b
Export NeMo FastConformer Hybrid Transducer-CTC Large Streaming to ONNX. ( #843 )
2024-05-08 12:33:46 +08:00
Fangjun Kuang
dbaa26ff4b
Publish node-addon-api npm package for linux arm64 ( #841 )
2024-05-07 23:05:40 +08:00
Fangjun Kuang
d2e86b0415
Add links to pre-built APKs and pre-trained models to README. ( #840 )
2024-05-07 12:28:42 +08:00
Fangjun Kuang
37a4135dd7
Publish npm package with node-addon-api for Windows ( #838 )
2024-05-06 16:21:29 +08:00
Fangjun Kuang
e1bb928805
Upload two more 3d-speaker models ( #837 )
2024-05-06 12:23:49 +08:00
chiiyeh
9c8255fdb2
Update 3dspeaker/export-onnx.py ( #836 )
Update to match the changes in infer_sv.py at 3D-speaker.
Added 2 more supported models and "zh_en" language.
2024-05-06 12:10:35 +08:00
Fangjun Kuang
4f758e6cd3
Publish node-addon-api wrapper for sherpa-onnx as npm packages ( #829 )
2024-05-04 13:27:39 +08:00
Fangjun Kuang
2f9553d838
Begin to add node-addon-api for sherpa-onnx ( #826 )
2024-05-03 14:47:40 +08:00
Fangjun Kuang
fcd6024200
Fix typos in JNI TTS ( #824 )
2024-05-01 14:14:24 +08:00
Fangjun Kuang
cff207623e
Add Java API for speaker identification ( #822 )
2024-04-29 21:23:56 +08:00
Fangjun Kuang
88202f05bb
Add Java API for audio tagging ( #820 )
2024-04-28 22:26:04 +08:00
Fangjun Kuang
5407f880c0
Add Java and Kotlin API for punctuation models ( #818 )
2024-04-26 22:06:48 +08:00
Fangjun Kuang
db25986240
Add Java API for spoken language identification with whisper multilingual models ( #817 )
2024-04-26 19:05:39 +08:00
Fangjun Kuang
f2d074aea9
Fix a bug for offline paraformer ( #816 )
2024-04-26 16:40:42 +08:00
Fangjun Kuang
612002da57
Fix C# to support Chinese tts models using jieba ( #815 )
2024-04-26 11:50:07 +08:00
Fangjun Kuang
c693676d20
Fix building wheels for macOS ( #814 )
2024-04-26 10:05:39 +08:00
Karel Vesely
2e45d327a5
Adding temperature scaling on Joiner logits: ( #789 )
* Adding temperature scaling on Joiner logits:
- T hard-coded to 2.0
- so far best result NCE 0.122 (still not so high)
- the BPE scores were rescaled with 0.2 (but then also incorrect words
get high confidence, visually reasonable histograms are for 0.5 scale)
- BPE->WORD score merging done by min(.) function
(tried also prob-product, and also arithmetic, geometric, harmonic mean)
- without temperature scaling (i.e. scale 1.0), the best NCE was 0.032 (here product merging was best)
Results seem consistent with: https://arxiv.org/abs/2110.15222
Everything tuned on a very small set of 100 sentences with 813 words and 10.2% WER, a Czech model.
I also experimented with blank posteriors mixed into the BPE confidences,
but no NCE improvement found, so not pushing that.
Temperature scaling also added to the greedy search confidences.
* making `temperature_scale` configurable from outside
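For readers unfamiliar with the technique, the following is a minimal sketch of temperature scaling followed by min-merging of token confidences into a word confidence. It is not the code from this commit: it assumes the conventional form in which logits are divided by T before the softmax, and the function names `softmax_with_temperature` and `word_confidence` are hypothetical, chosen only for illustration.

```python
import math


def softmax_with_temperature(logits, temperature=2.0):
    """Softmax over logits divided by `temperature` (assumed convention).

    A temperature above 1.0 flattens the distribution, which lowers the
    peak probability used as a confidence score. A temperature of 1.0
    leaves the ordinary softmax unchanged.
    """
    if temperature <= 0:
        raise ValueError("temperature must be positive")
    scaled = [x / temperature for x in logits]
    peak = max(scaled)  # subtract the maximum for numerical stability
    exps = [math.exp(s - peak) for s in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def word_confidence(token_confidences):
    """Merge per-token (BPE) confidences into one word score via min()."""
    return min(token_confidences)


if __name__ == "__main__":
    # Hypothetical logits for one decoding step.
    logits = [4.0, 1.0, 0.5]
    print(max(softmax_with_temperature(logits, temperature=1.0)))
    print(max(softmax_with_temperature(logits, temperature=2.0)))
    # Hypothetical confidences of the BPE tokens that make up one word.
    print(word_confidence([0.9, 0.4, 0.7]))
```

Taking the minimum means that a word is only as confident as its least confident token. The other merging rules mentioned above (product, arithmetic, geometric, or harmonic mean) could be swapped into `word_confidence` in the same way.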
2024-04-26 09:44:26 +08:00
Fangjun Kuang
15772d2150
Add Java API for text-to-speech ( #811 )
2024-04-26 09:26:39 +08:00
Daniel Doña
fa2429920f
Add function 'tolowerUnicode' in sherpa-onnx-microphone ( fix #791 ) ( #812 )
2024-04-26 09:19:32 +08:00
Fangjun Kuang
f7b3735621
Add CTC HLG decoding for JNI ( #810 )
2024-04-25 17:20:02 +08:00
Fangjun Kuang
6686c7d3e6
Add dict_dir arg to c api to support Chinese TTS models using jieba ( #809 )
2024-04-25 12:28:31 +08:00
Fangjun Kuang
83cd533f67
Add Java API for non-streaming ASR ( #807 )
2024-04-24 21:03:26 +08:00
Fangjun Kuang
c3a2e8a67c
Refactor Java API ( #806 )
2024-04-24 18:41:48 +08:00
Fangjun Kuang
c7691650d7
Fix CI tests ( #804 )
2024-04-24 13:01:06 +08:00
Fangjun Kuang
9b67a476e6
Refactor the JNI interface to make it more modular and maintainable ( #802 )
2024-04-24 09:48:42 +08:00
布宝
dc5af04830
Resume interrupted wget downloads ( #801 )
2024-04-22 20:19:08 +08:00
Fangjun Kuang
7f3b9ffe5d
Refactor TTS Android code to support jieba for Chinese TTS models ( #800 )
2024-04-22 17:21:05 +08:00
Fangjun Kuang
494cb5c733
Fix the last character not being recognized for streaming paraformer models. ( #799 )
2024-04-22 15:10:39 +08:00
Fangjun Kuang
9a68b92ce6
Increase CED's max frame length to 3000 ( #798 )
so that it can process waves for up to 30 seconds.
2024-04-22 10:18:47 +08:00
Fangjun Kuang
6b353bfb42
Add jieba for Chinese TTS models ( #797 )
2024-04-21 14:47:13 +08:00
Fangjun Kuang
2e0ee0e8c8
fix a typo in building language ID apk ( #795 )
2024-04-19 20:16:48 +08:00
Fangjun Kuang
37831fe89c
Release v1.9.22 ( #794 )
2024-04-19 18:37:47 +08:00
Fangjun Kuang
54bc504065
Add Python API example for CED audio tagging. ( #793 )
2024-04-19 18:33:18 +08:00
Fangjun Kuang
c1608b3524
Support CED models ( #792 )
2024-04-19 15:20:37 +08:00
Fangjun Kuang
d97a283dbb
Add Android demo for spoken language identification using Whisper multilingual models ( #783 )
2024-04-18 14:33:59 +08:00
Fangjun Kuang
3a43049ba1
Add JNI support for spoken language identification ( #782 )
2024-04-17 19:27:15 +08:00
Fangjun Kuang
69440e481f
Add WearOS demo for audio tagging ( #777 )
2024-04-17 12:22:17 +08:00
Fangjun Kuang
bcd9e48150
Add Android demo for audio tagging ( #776 )
See https://k2-fsa.github.io/sherpa/onnx/audio-tagging/apk.html
2024-04-16 20:47:16 +08:00
chiiyeh
aa2d695fd2
Add score function to speaker identification ( #775 )
2024-04-16 17:29:46 +08:00
Fangjun Kuang
6bf2099781
Fix code style issues ( #774 )
2024-04-16 09:46:15 +08:00