update README

2025-09-10 11:06:39 +08:00
parent 1df95ad2f6
commit 598213f466
11 changed files with 135 additions and 53 deletions
--- a/README.md
+++ b/README.md
@@ -0,0 +1,125 @@
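 # 天数智芯 (Iluvatar CoreX) 智铠100 Speech Synthesis
 This model test framework adapts Kokoro, F5-TTS, GPT-SoVITS, and other models to the 智铠100 accelerator card to convert text into speech.
 GPT-SoVITS is an advanced AI system that combines voice conversion and text-to-speech, built on GPT and SoVITS.
 Kokoro is a lightweight, high-performance text-to-speech (TTS) model developed and open-sourced by the hexgrad team.
 F5-TTS, released by a team at Shanghai Jiao Tong University, is a text-to-speech (TTS) model based on a Diffusion Transformer and ConvNeXt V2.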
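 <!-- ## How the GPT-SoVITS test service works
 The service uses the TTS and TTS_Config classes built into the GPT-SoVITS framework.
 TTS_Config holds all the configuration parameters GPT-SoVITS needs to run; TTS is a high-level wrapper around the model that ties together model loading, text processing, and speech generation.
 At startup, the model paths (e.g. bert_base_path, vits_weights_path), the device (device: "cuda"), the precision (is_half), and other parameters are used to build a TTS_Config instance, which is passed to TTS to initialize the pipeline. The service then calls the model through the TTS class to synthesize speech and returns the generated audio chunks to the client.
 The service also overrides the forward methods of PyTorch's Conv1d and ConvTranspose1d, possibly to improve how efficiently the model runs on the target CUDA device.
 ## How the F5-TTS test service works
 The service calls F5-TTS through the infer_batch_process function (from f5_tts.infer.utils_infer), which encapsulates the core inference details. The core logic is: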
 ```python
 for gen_audio, gen_sr in infer_batch_process(
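    (audio, sr),  # reference audio data and sample rate
    ref_text,     # reference text
    gen_text_batches,  # target text, split into chunks
    ema_model,    # loaded F5-TTS model
    vocoder,      # vocoder
    device=device,
    streaming=True,  # streaming output (return audio while it is being generated)
    chunk_size=int(24e6),
 ):
    yield gen_audio.tobytes()  # return the generated audio as a byte stream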
 ```
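 The generated audio is returned to the client through StreamingResponse as a WAV-format byte stream, which enables real-time speech output.
 ## Request example for the GPT-SoVITS and F5-TTS test services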
 ```python
 import requests
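 # Base address of the service; change the IP and port to match your deployment (the port is 80)
 sut_url = "http://localhost:80"
 url = f"{sut_url}/generate"
 # Build the request data (multipart/form-data)
 data = {
    "ref_text": "这是参考音频对应的文本",  # reference text
    "text": "这是需要合成语音的目标文本",    # target text
    "lang": "zh"                          # language
 }
 # Send the request and save the result
 with open("/path/to/reference.wav", "rb") as ref_audio:  # reference audio file
    response = requests.post(url, files={"ref_audio": ref_audio}, data=data, stream=True)
 if response.status_code == 200:
    with open("output.wav", "wb") as f:
        for chunk in response.iter_content(chunk_size=1024):
            if chunk:
                f.write(chunk)
    print("Synthesis succeeded; audio saved to output.wav")
 else:
    print(f"Request failed, status code: {response.status_code}")
 # The service extracts the voice style from the reference audio (ref_audio) and reference text (ref_text), then synthesizes the target text (text) in the same style.
 # The response is streamed audio, so the client must receive and save it as a stream (as above, with stream=True and chunked reads).
 # To check whether the service is available, send a GET request to http://localhost:80/health or http://localhost:80/ready; a healthy service returns {"status": "ok"}.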
 ```
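 ## How the Kokoro test service works
 The service uses the KModel and KPipeline classes built into the kokoro module.
 KModel wraps the kokoro model and is responsible for loading the model architecture and weights.
 KPipeline is the core pipeline class that connects text input to model inference. It handles language-specific text processing (e.g. English phoneme conversion, Chinese pinyin processing), calls KModel (the kokoro model) to generate speech, and associates the model with language processing rules (e.g. speed adjustment).
 A language identifier (lang_code), a model instance (model), and other parameters are passed to KPipeline to select the language-specific processing logic. The kokoro model is loaded by KModel and invoked through KPipeline, which generates speech in the language of the input text and returns the result as streaming audio.
 ## Request example for the Kokoro test service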
 ```python
 import requests
 # Server address (change to match your deployment)
 url = "http://localhost:80/tts"
 # SSML input (Chinese and English are supported; set the language with xml:lang)
 ssml = """
 <speak>
  <voice xml:lang="zh">你好，这是 Kokoro 模型生成的中文语音。</voice>
 </speak>
 """
 # English example: <voice xml:lang="en">Hello, this is English speech generated by Kokoro.</voice>
 # Send the POST request
 response = requests.post(
    url,
    data=ssml,
    headers={"Content-Type": "text/plain"},  # send the SSML directly as plain text
    stream=True  # enable streaming response
 )
 # Save the audio to a file (PCM format; play it with a player that accepts raw PCM, or convert it to WAV)
 if response.status_code == 200:
    with open("output.pcm", "wb") as f:
        for chunk in response.iter_content(chunk_size=1024):
            if chunk:
                f.write(chunk)
    print("Audio saved: output.pcm")
 else:
    print(f"Request failed: {response.status_code}, {response.text}")
 ``` -->
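The Kokoro request example saves the response as raw PCM, which carries no header describing the sample format. The following is a minimal sketch of wrapping such a file in a WAV container with the standard library. The helper name `pcm_to_wav` is hypothetical, and the defaults (24000 Hz, mono, 16-bit samples) are assumptions that must be checked against what the service actually emits.

```python
import wave


def pcm_to_wav(pcm_bytes, wav_path, sample_rate=24000, channels=1, sample_width=2):
    """Write raw PCM bytes to wav_path with a WAV header.

    sample_rate, channels and sample_width (bytes per sample) are assumed
    defaults, not values confirmed by the service.
    """
    with wave.open(wav_path, "wb") as wav_file:
        wav_file.setnchannels(channels)
        wav_file.setsampwidth(sample_width)
        wav_file.setframerate(sample_rate)
        wav_file.writeframes(pcm_bytes)
    return wav_path
```

For example, after reading `output.pcm` into a bytes object, `pcm_to_wav(pcm_bytes, "output.wav")` produces a file that ordinary players can open, provided the assumed format matches.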
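 ## How to use the speech synthesis model test framework
 The code implements a speech synthesis HTTP service that receives text and returns the synthesized audio. The service is packaged as a Docker image, and the sut container in the k8s cluster sends requests to it.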
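Before a client sends synthesis requests, it can wait for the service to come up. The sketch below polls a health endpoint until it answers `{"status": "ok"}`, the response this README describes for `/health` and `/ready`. The function names `health_ok` and `wait_until_ready` and the timeout values are illustrative assumptions, not part of the framework.

```python
import json
import time
import urllib.request


def health_ok(url, timeout_s=5.0):
    """Return True if a GET on url answers HTTP 200 with {"status": "ok"}."""
    try:
        with urllib.request.urlopen(url, timeout=timeout_s) as resp:
            return resp.status == 200 and json.load(resp).get("status") == "ok"
    except (OSError, ValueError, AttributeError):
        # Connection refused, HTTP error, timeout, or an unexpected body.
        return False


def wait_until_ready(url, timeout_s=60.0, interval_s=2.0, probe=health_ok):
    """Call probe(url) repeatedly until it succeeds or timeout_s elapses."""
    deadline = time.monotonic() + timeout_s
    while True:
        if probe(url):
            return True
        if time.monotonic() >= deadline:
            return False
        time.sleep(interval_s)
```

A caller might run `wait_until_ready("http://localhost:80/health")` and only start sending requests once it returns `True`. The `probe` parameter exists so the retry loop can be exercised without a running service.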
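 ## Test results for speech synthesis models on the 智铠100 series
 Several speech synthesis models were adapted to the 智铠100 series. For each model, the same text was synthesized on an Nvidia A100 and on a 智铠100 accelerator card, and the run time was recorded.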
 | Model      | Model type             | Adaptation status | 智铠100 run time (s) | Nvidia A100 run time (s) |
 | ---------- | ---------------------- | ----------------- | -------------------- | ------------------------ |
 | kokoro     | StyleTTS 2, ISTFTNet   | Success           | 2.1                  | 5.4                      |
 | f5-TTS     | DiT, ConvNeXt V2       | Success           | 6.5                  | 5.4                      |
 | gpt-sovits | VITS                   | Success           | 17.7                 | 20.5                     |
 | matcha     | OT-CFM, Transformer    | Success           | 3.7                  | 3.2                      |
 | piper      | -                      | Success           | 0.3                  | 1.7                      |
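To make the comparison easier to read, the run times can be turned into ratios. The sketch below copies the numbers from the table and divides the A100 time by the 智铠100 time, so a value above 1 means the 智铠100 finished sooner. The `time_call` helper shows one way a single synthesis call could be timed; it and the dictionary keys are illustrative assumptions, not the script used to produce the table.

```python
import time

# Run times in seconds, copied from the table above.
# "zhikai100" is only a dictionary key standing for the 智铠100 card.
RUN_TIMES = {
    "kokoro": {"zhikai100": 2.1, "a100": 5.4},
    "f5-TTS": {"zhikai100": 6.5, "a100": 5.4},
    "gpt-sovits": {"zhikai100": 17.7, "a100": 20.5},
    "matcha": {"zhikai100": 3.7, "a100": 3.2},
    "piper": {"zhikai100": 0.3, "a100": 1.7},
}


def time_call(synthesize, text):
    """Return the wall-clock seconds taken by one synthesize(text) call."""
    start = time.perf_counter()
    synthesize(text)
    return time.perf_counter() - start


def a100_over_zhikai(times):
    """Return A100 time divided by 智铠100 time per model, rounded to 2 places."""
    return {
        name: round(t["a100"] / t["zhikai100"], 2) for name, t in times.items()
    }


if __name__ == "__main__":
    for name, ratio in a100_over_zhikai(RUN_TIMES).items():
        print(f"{name}: A100 time / 智铠100 time = {ratio}")
```

By this measure kokoro, gpt-sovits, and piper ran faster on the 智铠100 in this test, while f5-TTS and matcha ran faster on the A100.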
--- a/mr_v100-f5-tts/Dockerfile_f5
+++ b/mr_v100-f5-tts/Dockerfile_f5
@@ -2,10 +2,8 @@ FROM git.modelhub.org.cn:9443/enginex-iluvatar/mr100_corex:4.3.0
 WORKDIR /workspace
 COPY . /workspace/
-RUN pip install -r requirements_f5.txt -c constraints_f5.txt -i https://nexus.4pd.io/repository/pypi-all/simple 
+RUN pip install -r requirements_f5.txt -c constraints_f5.txt 
-RUN cd F5-TTS && pip install -e . -c ../constraints_f5.txt -i https://nexus.4pd.io/repository/pypi-all/simple 
+RUN cd F5-TTS && pip install -e . -c ../constraints_f5.txt 
-
+ENTRYPOINT ["/bin/bash", "launch_f5.sh"]
 #ENTRYPOINT ["/bin/bash", "launch_f5.sh"]
 ENTRYPOINT ["/bin/bash", "launch.sh"]
--- a/mr_v100-f5-tts/launch.sh
+++ b/mr_v100-f5-tts/launch.sh
@@ -1,3 +0,0 @@
 #!/bin/bash
 python3 f5_server.py
--- a/mr_v100-gpt-sovits/Dockerfile_gsv
+++ b/mr_v100-gpt-sovits/Dockerfile_gsv
@@ -13,7 +13,5 @@ COPY constraints_gsv.txt /workspace/
 RUN pip install -r GPT-SoVITS/extra-req.txt --no-deps \
    && pip install -r GPT-SoVITS/requirements.txt -c constraints_gsv.txt 
-#COPY launch_gsv.sh /workspace/
+COPY launch_gsv.sh /workspace/
-#ENTRYPOINT ["/bin/bash", "launch_gsv.sh"]
+ENTRYPOINT ["/bin/bash", "launch_gsv.sh"]
 COPY launch.sh /workspace/
 ENTRYPOINT ["/bin/bash", "launch.sh"]
--- a/mr_v100-gpt-sovits/launch.sh
+++ b/mr_v100-gpt-sovits/launch.sh
@@ -1,17 +0,0 @@
 #!/bin/bash
 redis-server --daemonize yes
 if [ -z "$MODEL_DIR" ]; then
  export MODEL_DIR="/models/GPT-SoVITS"
 fi
 if [ -z "$NLTK_DATA" ]; then
  export NLTK_DATA="/models/GPT-SoVITS/nltk_data"
 fi
 if [ -z "$bert_path" ]; then
  export bert_path="${MODEL_DIR}/chinese-roberta-wwm-ext-large"
 fi
 cd GPT-SoVITS && python3 gsv_server.py
--- a/mr_v100-kokoro/Dockerfile_kokoro
+++ b/mr_v100-kokoro/Dockerfile_kokoro
@@ -7,11 +7,9 @@ RUN apt-get update && \
    rm -rf /var/lib/apt/lists/*
 COPY requirements_kokoro.txt constraints_kokoro.txt kokoro_server.py en_core_web_sm-3.8.0.tar.gz /workspace/
-RUN pip install -r requirements_kokoro.txt -c constraints_kokoro.txt -i https://nexus.4pd.io/repository/pypi-all/simple 
+RUN pip install -r requirements_kokoro.txt -c constraints_kokoro.txt 
 RUN pip install en_core_web_sm-3.8.0.tar.gz
-#COPY launch_kokoro.sh /workspace/
+COPY launch_kokoro.sh /workspace/
-#ENTRYPOINT ["/bin/bash", "launch_kokoro.sh"]
+ENTRYPOINT ["/bin/bash", "launch_kokoro.sh"]
 COPY launch.sh /workspace/
 ENTRYPOINT ["/bin/bash", "launch.sh"]
--- a/mr_v100-kokoro/launch.sh
+++ b/mr_v100-kokoro/launch.sh
@@ -1,4 +0,0 @@
 #!/bin/bash
 python3 kokoro_server.py
--- a/mr_v100-matcha/Dockerfile_matcha
+++ b/mr_v100-matcha/Dockerfile_matcha
@@ -11,10 +11,7 @@ COPY requirements_matcha.txt constraints_matcha.txt matcha_server.py launch_matc
 RUN pip install -r requirements_matcha.txt -c constraints_matcha.txt 
 RUN pip install matcha-tts -c constraints_matcha.txt
-#ENTRYPOINT ["/bin/bash", "launch_matcha.sh"]
+ENTRYPOINT ["/bin/bash", "launch_matcha.sh"]
 COPY launch.sh /workspace/
 ENTRYPOINT ["/bin/bash", "launch.sh"]
--- a/mr_v100-matcha/launch.sh
+++ b/mr_v100-matcha/launch.sh
@@ -1,4 +0,0 @@
 #!/bin/bash
 python3 matcha_server.py
--- a/mr_v100-piper/Dockerfile_piper
+++ b/mr_v100-piper/Dockerfile_piper
@@ -9,7 +9,4 @@ RUN pip install -r requirements_piper.txt -c constraints_piper.txt
 ENV PYTHONPATH=/workspace/piper/src/python:$PYTHONPATH
 RUN cd /workspace/piper/src/python && pip install -e . && ./build_monotonic_align.sh
-#ENTRYPOINT ["/bin/bash", "launch_piper.sh"]
+ENTRYPOINT ["/bin/bash", "launch_piper.sh"]
 COPY launch.sh /workspace/
 ENTRYPOINT ["/bin/bash", "launch.sh"]
--- a/mr_v100-piper/launch.sh
+++ b/mr_v100-piper/launch.sh
@@ -1,3 +0,0 @@
 #!/bin/bash
 python3 piper_server.py