Fix paraformer English word split

2026-02-09 13:48:45 +08:00
parent 15b838d17d
commit 718a5bd24d
2 changed files with 33 additions and 16 deletions
--- a/README.md
+++ b/README.md
@@ -7,6 +7,22 @@ docker build -f ./Dockerfile.funasr-mr100 -t <your_image> .
 The base image corex:4.3.0 can be obtained by contacting technical support at the vendor of the Iluvatar CoreX (天数智芯) Zhikai 100 (智铠100).

 ## Usage
+
+### Testing the ASR service with FastAPI
+For example:
+```shell
+docker run -it --rm --name iluvatar_test_asr -p 23333:1111 \
+    --privileged \
+    -v /lib/modules:/lib/modules \
+    -v /dev:/dev \
+    -v /usr/src:/usr/src \
+    -v /mnt/gpfs/leaderboard/modelHubXC/iic/SenseVoiceSmall:/model \
+    -e CUDA_VISIBLE_DEVICES=0 \
+    --entrypoint python3 <IMAGE_NAME> main.py \
+    --port 1111 --model_dir /model --model_type sensevoice
+```
+
+
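 ### Quick image test
 FunASR must be tested inside a container started from the image built above. Test steps:
 1. This project ships with sample test data: the audio file is `lei-jun-test.wav`, and the reference transcript is `lei-jun.txt`. You need to prepare the path to an ASR model; this example assumes the SenseVoiceSmall model has already been downloaded to /model/SenseVoiceSmall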
--- a/fastapi_funasr.py
+++ b/fastapi_funasr.py
@@ -201,7 +201,8 @@ def test_funasr(audio_file, lang):
                )
                text = res[0]["text"]
                # The paraformer model emits text one character at a time; the many spaces in between would distort the 1-CER result
-                text = text.replace(" ", "")
+                if lang == "zh":
+                    text = text.replace(" ", "")
            elif model_type == "conformer":
                res = model.generate(
                    input=segment_path,
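
Note on the paraformer change: the sketch below isolates the new behavior so it can be checked without loading a model. It is a minimal illustration, not code from this repository; the helper name `normalize_paraformer_text` and the sample strings are hypothetical, and only the `lang == "zh"` condition and the `replace(" ", "")` call come from the hunk above.

```python
def normalize_paraformer_text(text: str, lang: str) -> str:
    """Hypothetical helper mirroring the patched branch in test_funasr."""
    # Chinese output arrives as space-separated characters, so the spaces
    # carry no meaning and are removed before scoring.
    if lang == "zh":
        return text.replace(" ", "")
    # For any other language the spaces are word boundaries and must be kept.
    return text


if __name__ == "__main__":
    print(normalize_paraformer_text("你 好 世 界", "zh"))
    print(normalize_paraformer_text("hello world", "en"))
```

Before this commit the replacement ran unconditionally, so English output lost its word boundaries. One limitation that appears to remain: when `lang` is `"zh"`, any English words embedded in the Chinese transcript are still joined together.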