Speaker diarization example with onnxruntime Python API (#1395)

2024-10-06 16:37:29 +08:00
parent 5f50cbf65a
commit 70165cb42d
6 changed files with 719 additions and 1 deletions
--- a/scripts/pyannote/segmentation/README.md
+++ b/scripts/pyannote/segmentation/README.md
@@ -0,0 +1,44 @@
+# File description
+
+Please download test wave files from
+https://github.com/k2-fsa/sherpa-onnx/releases/tag/speaker-segmentation-models
+
+## 0-two-speakers-zh.wav
+
+This file is from
+https://www.modelscope.cn/models/iic/speech_campplus_speaker-diarization_common/file/view/master?fileName=examples%252F2speakers_example.wav&status=0
+
+Note that we have renamed it from `2speakers_example.wav` to `0-two-speakers-zh.wav`.
+
+## 1-two-speakers-en.wav
+
+This file is from
+https://github.com/pengzhendong/pyannote-onnx/blob/master/data/test_16k.wav
+and it contains speech from two speakers.
+
+Note that we have renamed it from `test_16k.wav` to `1-two-speakers-en.wav`.
+
+
+## 2-two-speakers-en.wav
+This file is from
+https://huggingface.co/spaces/Xenova/whisper-speaker-diarization
+
+Note that the original file is `./fcf059e3-689f-47ec-a000-bdace87f0113.mp4`.
+We use the following command to convert it to `2-two-speakers-en.wav`:
+
+```bash
+ffmpeg -i ./fcf059e3-689f-47ec-a000-bdace87f0113.mp4 -ac 1 -ar 16000 ./2-two-speakers-en.wav
+```
+
+## 3-two-speakers-en.wav
+
+This file is from
+https://aws.amazon.com/blogs/machine-learning/deploy-a-hugging-face-pyannote-speaker-diarization-model-on-amazon-sagemaker-as-an-asynchronous-endpoint/
+
+Note that the original file is `ML16091-Audio.mp3`. We use the following
+command to convert it to `3-two-speakers-en.wav`:
+
+
+```bash
+sox ML16091-Audio.mp3 3-two-speakers-en.wav
+```