Fangjun Kuang
b2f0249420
Add recording permission for iOS App. ( #900 )
2024-05-22 09:39:55 +08:00
Fangjun Kuang
4593ab49d1
Add Flutter example for speaker identification ( #894 )
2024-05-21 20:38:52 +08:00
Fangjun Kuang
b445956675
Fix CI tests. ( #898 )
2024-05-21 20:37:29 +08:00
Fangjun Kuang
fdcae56a14
Fix Go tests ( #897 )
2024-05-21 11:50:13 +08:00
Wei Kang
b012b78ceb
Encode hotwords in C++ side ( #828 )
...
* Encode hotwords in C++ side
2024-05-20 19:41:36 +08:00
Fangjun Kuang
8af2af8466
Add tail_paddings to Whisper C API. ( #886 )
2024-05-17 09:20:07 +08:00
Fangjun Kuang
65635b09d8
Fix a typo in jni ( #885 )
2024-05-16 14:31:45 +08:00
Fangjun Kuang
a421f8c1df
Fix Java API examples ( #883 )
2024-05-16 12:16:17 +08:00
linziguan
d2745698c5
Support building JNI on Windows ( #881 )
2024-05-16 06:25:53 +08:00
Fangjun Kuang
c2dcdabab1
Fix sherpa-onnx-node-version in node examples ( #879 )
2024-05-15 14:32:30 +08:00
Fangjun Kuang
03c956a317
Add keyword spotting API for node-addon-api ( #877 )
2024-05-14 20:26:48 +08:00
Fangjun Kuang
75630b986b
Support adding punctuations to text for node-addon-api ( #876 )
2024-05-14 19:28:56 +08:00
Fangjun Kuang
d19f50b799
Add audio tagging APIs for node-addon-api ( #875 )
2024-05-14 17:32:30 +08:00
Fangjun Kuang
388e6a98fc
Add speaker identification APIs for node-addon-api ( #874 )
2024-05-14 13:28:50 +08:00
Fangjun Kuang
0895b64850
Refactor node-addon-api to remove duplicate. ( #873 )
2024-05-14 10:08:11 +08:00
Fangjun Kuang
939fdd942c
Add spoken language identification for node-addon-api ( #872 )
2024-05-13 20:26:11 +08:00
Fangjun Kuang
031134b4d4
Add TTS for node-addon-api ( #871 )
2024-05-13 19:24:09 +08:00
Manix
740d7ae9d6
fixing bug and compiler error ( #870 )
...
Signed-off-by: manickavela1998@gmail.com <manickavela1998@gmail.com>
2024-05-13 17:44:03 +08:00
Fangjun Kuang
697b960768
Add non-streaming ASR APIs for node-addon-api ( #868 )
2024-05-13 16:03:34 +08:00
Fangjun Kuang
384f96c40f
Add streaming CTC ASR APIs for node-addon-api ( #867 )
2024-05-13 11:58:25 +08:00
Fangjun Kuang
db85b2c1d8
Add Android APKs for NeMo CTC models. ( #866 )
2024-05-12 14:58:36 +08:00
Fangjun Kuang
7322f4e0a3
Fix node addon tests ( #865 )
...
* Install naudiodon2 manually.
It is needed only when using a microphone. The CI tests don't need it.
2024-05-12 12:03:43 +08:00
Fangjun Kuang
eee5d8a15c
Add node-addon-api for VAD ( #864 )
2024-05-11 20:58:23 +08:00
Fangjun Kuang
677bc1da3e
Add Speaker ID demo for C# ( #862 )
2024-05-11 13:27:33 +08:00
Fangjun Kuang
a88b3bac21
Fix Python TTS examples for models using jieba. ( #861 )
2024-05-11 09:21:51 +08:00
Fangjun Kuang
65f5161456
Add more streaming ASR methods for node-addon-api ( #860 )
2024-05-10 18:21:05 +08:00
Fangjun Kuang
46e4e5b7ac
Add C++ support for streaming NeMo CTC models. ( #857 )
2024-05-10 16:26:43 +08:00
yh646492956
1eb60e8711
Solve the issue of missing the last sentence with punctuation ( #856 )
...
Co-authored-by: Hao You <13182720519@sina.cn>
2024-05-10 15:41:42 +08:00
Fangjun Kuang
17cd3a5f01
Add C++ runtime for non-streaming faster conformer transducer from NeMo. ( #854 )
2024-05-10 12:15:39 +08:00
Fangjun Kuang
5d8c35e44e
Add C++ support for non-streaming NeMo fast conformer hybrid transducer ctc (the ctc branch) ( #848 )
2024-05-09 15:32:22 +08:00
Fangjun Kuang
5ed3ec1c04
Export non-streaming NeMo faster conformer hybrid transducer and ctc to sherpa-onnx ( #847 )
2024-05-09 13:59:47 +08:00
Fangjun Kuang
68b25abf27
Export NeMo FastConformer Hybrid Transducer Large Streaming to ONNX ( #844 )
2024-05-08 19:07:49 +08:00
Fangjun Kuang
a9f936e92b
Export NeMo FastConformer Hybrid Transducer-CTC Large Streaming to ONNX. ( #843 )
2024-05-08 12:33:46 +08:00
Fangjun Kuang
dbaa26ff4b
Publish node-addon-api npm package for linux arm64 ( #841 )
2024-05-07 23:05:40 +08:00
Fangjun Kuang
d2e86b0415
Add links to pre-built APKs and pre-trained models to README. ( #840 )
2024-05-07 12:28:42 +08:00
Fangjun Kuang
37a4135dd7
Publish npm package with node-addon-api for Windows ( #838 )
2024-05-06 16:21:29 +08:00
Fangjun Kuang
e1bb928805
Upload two more 3d-speaker models ( #837 )
2024-05-06 12:23:49 +08:00
chiiyeh
9c8255fdb2
Update 3dspeaker/export-onnx.py ( #836 )
...
Update to match the changes in infer_sv.py at 3D-speaker.
Added 2 more supported models and "zh_en" language.
2024-05-06 12:10:35 +08:00
Fangjun Kuang
4f758e6cd3
Publish node-addon-api wrapper for sherpa-onnx as npm packages ( #829 )
2024-05-04 13:27:39 +08:00
Fangjun Kuang
2f9553d838
Begin to add node-addon-api for sherpa-onnx ( #826 )
2024-05-03 14:47:40 +08:00
Fangjun Kuang
fcd6024200
Fix typos in JNI TTS ( #824 )
2024-05-01 14:14:24 +08:00
Fangjun Kuang
cff207623e
Add Java API for speaker identification ( #822 )
2024-04-29 21:23:56 +08:00
Fangjun Kuang
88202f05bb
Add Java API for audio tagging ( #820 )
2024-04-28 22:26:04 +08:00
Fangjun Kuang
5407f880c0
Add Java and Kotlin API for punctuation models ( #818 )
2024-04-26 22:06:48 +08:00
Fangjun Kuang
db25986240
Add Java API for spoken language identification with whisper multilingual models ( #817 )
2024-04-26 19:05:39 +08:00
Fangjun Kuang
f2d074aea9
Fix a bug for offline paraformer ( #816 )
2024-04-26 16:40:42 +08:00
Fangjun Kuang
612002da57
Fix C# to support Chinese tts models using jieba ( #815 )
2024-04-26 11:50:07 +08:00
Fangjun Kuang
c693676d20
Fix building wheels for macOS ( #814 )
2024-04-26 10:05:39 +08:00
Karel Vesely
2e45d327a5
Adding temperature scaling on Joiner logits: ( #789 )
...
* Adding temperature scaling on Joiner logits:
- T hard-coded to 2.0
- so far best result NCE 0.122 (still not so high)
- the BPE scores were rescaled with 0.2 (but then also incorrect words
get high confidence, visually reasonable histograms are for 0.5 scale)
- BPE->WORD score merging done by min(.) function
(tried also prob-product, and also arithmetic, geometric, harmonic mean)
- without temperature scaling (i.e. scale 1.0), the best NCE was 0.032 (here product merging was best)
Results seem consistent with: https://arxiv.org/abs/2110.15222
Everything was tuned on a very small set of 100 sentences with 813 words and 10.2% WER, using a Czech model.
I also experimented with blank posteriors mixed into the BPE confidences,
but no NCE improvement found, so not pushing that.
Temperature scaling was also added to the Greedy search confidences.
* making `temperature_scale` configurable from outside
2024-04-26 09:44:26 +08:00
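The two ideas in the commit message above (temperature scaling of the joiner logits and min(.) merging of BPE scores into a word score) can be illustrated with a minimal sketch. This is not the sherpa-onnx implementation: the function names are hypothetical, and it assumes the conventional definition in which logits are divided by T before the softmax. T = 2.0 is taken from the commit message.

```python
import math


def softmax(logits, temperature=1.0):
    """Softmax over logits divided by `temperature` (assumed convention)."""
    scaled = [x / temperature for x in logits]
    m = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(x - m) for x in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def token_confidence(logits, token_id, temperature=2.0):
    """Confidence of one BPE token: its probability after temperature scaling."""
    return softmax(logits, temperature)[token_id]


def merge_word_confidence(token_confidences):
    """Merge BPE token confidences into a word confidence with min(.)."""
    return min(token_confidences)


# Hypothetical joiner logits for a single decoding step.
logits = [4.0, 1.0, 0.0]
sharp = token_confidence(logits, 0, temperature=1.0)  # no scaling
flat = token_confidence(logits, 0, temperature=2.0)   # T = 2.0

# Hypothetical confidences of the BPE tokens that make up one word.
word_conf = merge_word_confidence([0.9, 0.6, 0.8])
```

With T > 1 the distribution is flattened, so the top token's confidence drops and over-confident scores are pulled toward the middle of the range. Taking the minimum means a word is only as confident as its least confident BPE piece.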
Fangjun Kuang
15772d2150
Add Java API for text-to-speech ( #811 )
2024-04-26 09:26:39 +08:00