init ascend tts
ascend_910-piper/piper/notebooks/lng/0.txt (new file, 28 lines)
@@ -0,0 +1,28 @@
[language info]
Name=
code=
version=
author=
Copyright=

[Strings]
Interface openned. Write your texts, configure the different synthesis options or download all the voices you want. Enjoy!=
Model failed to download!=
No downloaded voice packages!=
You have not loaded any model from the list!=
Select voice package=
Load it!=
Select speaker=
Rate scale=
Phoneme noise scale=
Phoneme stressing scale=
Enter your text here=
Text to synthesize=
Synthesize=
Auto-play=
Click here to synthesize the text.=
Exit=
Closes this GUI.=
audio history=
The Use GPU checkbox is checked, but you don't have a GPU runtime.=
The Use GPU checkbox is unchecked, however you are using a GPU runtime environment. We recommend you check the checkbox to use GPU to take advantage of it.=
Invalid link or ID!=
ascend_910-piper/piper/notebooks/lng/es.lang (new file, 28 lines)
@@ -0,0 +1,28 @@
[language info]
Name=Español
code=Es
version=0.1
author=Mateo Cedillo
Copyright=© 2018-2023 MT Programs, todos los derechos Reservados

[Strings]
Interface openned. Write your texts, configure the different synthesis options or download all the voices you want. Enjoy!=Interfaz abierta. Escribe tus textos, configura las diferentes opciones de síntesis o descarga todas las voces que quieras. ¡Disfruta!
Model failed to download!=¡No se pudo descargar el modelo!
No downloaded voice packages!=¡No se han descargado paquetes de voz!
You have not loaded any model from the list!=¡No has cargado ningún modelo de la lista!
Select voice package=Selecciona paquete de voz
Load it!=¡Cárgalo!
Select speaker=Selecciona hablante
Rate scale=Escala de velocidad
Phoneme noise scale=Escala de resonancia de fonemas
Phoneme stressing scale=Escala de acentuación de fonemas
Enter your text here=Introduce tu texto aquí
Text to synthesize=Texto a sintetizar
Synthesize=Sintetizar
Auto-play=Auto-reproducir
Click here to synthesize the text.=Haz clic aquí para sintetizar el texto.
Exit=Salir
Closes this GUI.=Cierra esta interfaz.
audio history=Historial de audio
The Use GPU checkbox is checked, but you don't have a GPU runtime.=La casilla de usar GPU está habilitada, pero no tienes un entorno de ejecución con GPU.
The Use GPU checkbox is unchecked, however you are using a GPU runtime environment. We recommend you check the checkbox to use GPU to take advantage of it.=La casilla de usar GPU está desmarcada, sin embargo, estás usando un entorno de ejecución GPU. Te recomendamos activar la casilla de usar GPU para aprovecharla.
Invalid link or ID!=¡Enlace o ID no válido!
ascend_910-piper/piper/notebooks/lng/guía de traducción.txt (new file, 21 lines)
@@ -0,0 +1,21 @@
Instrucciones para traductores
Este documento es una pequeña guía con instrucciones que ayudarán a crear idiomas y a entender su sintaxis.
*Crear un nuevo idioma:
Para crear un nuevo idioma, primero debes hacer una copia del archivo 0.txt, ya que ese archivo es una plantilla vacía de traducción y en esa plantilla nos basaremos para crear las entradas y los mensajes.
Una vez hecha la copia del archivo 0.txt, nos posicionamos en la misma y renombramos el archivo a las dos primeras letras de tu idioma. Por ejemplo, si queremos hacer una traducción a francés entonces tendríamos que poner "fr" de "français". Hay que tener en cuenta que, si queremos que nuestra traducción funcione en el programa, deberemos cambiar la extensión de .txt a .lang. Antes funcionaba con solo cambiar el nombre pero no la extensión, pero finalmente se ha decidido así por cambios técnicos y para la autodetección de idiomas nuevos.
Una vez tengamos nuestra propia plantilla, debemos abrirla con algún editor de textos, y podemos comenzar a hacer nuestras entradas de traducción.
Como podemos observar, al principio hay una línea encerrada entre corchetes. Esas líneas se denominan secciones, y la primera es language info, la sección de información sobre el idioma que estamos creando.
Explicaré las siguientes líneas, que debemos llenar al final de cada una, después del signo igual:
Name: El nombre original de tu idioma. Por ejemplo, si queremos hacer una traducción en inglés quedará como "English"; si en cambio es una traducción en español entonces se escribirá "Español"; si es un idioma en portugués quedará como "português", y así se aplica en todos los idiomas.
Code: En este parámetro pondremos de nuevo las dos primeras letras del idioma a traducir, como lo hicimos en el primer paso de la creación del archivo.
version: Versión del idioma. El idioma puede tener una versión propia, pero es más recomendable que sea la del programa.
author: El nombre del autor que crea el idioma.
Copyright: Es opcional, si queremos poner derechos de autor.
Y aquí terminamos con la primera sección.
*Segunda sección.
strings:
En esta sección se encuentran todos los mensajes disponibles para traducir.
Notarás que en cada línea hay un signo igual (=) al final. Este signo es importante, pues sirve para identificar el mensaje original (antes del =) y el valor a traducir (después del =).
Para traducir un mensaje, debes hacerlo al final de la línea que estás traduciendo, después del signo "=", siempre y cuando se respeten la puntuación y las reglas del mensaje original.
Una vez terminado, puedes mandarlo a revisión a angelitomateocedillo@gmail.com o, en cualquiera de mis aplicaciones, seleccionando la opción "errores y sugerencias" en el menú Ayuda y enviando el archivo mediante el formulario que se abre en tu navegador.
Fin.
ascend_910-piper/piper/notebooks/lng/translation guide.txt (new file, 21 lines)
@@ -0,0 +1,21 @@
Instructions for translators
This document is a short instruction guide that will help you create language files and understand their syntax.
* Create a new language:
To create a new language, you must first make a copy of the 0.txt file, since that file is an empty translation template, and we will use that template to create the entries and messages.
Once the copy of the 0.txt file is made, rename it to the first two letters of your language. For example, for a French translation you would use "fr", from "français". Bear in mind that, for the translation to work in the program, you must also change the extension from .txt to .lang. It used to be enough to change the name but not the extension, but this was finally changed for technical reasons and to allow autodetection of new languages.
Once we have our own template, open it with a text editor and start making translation entries.
As you can see, at the beginning there is a line enclosed in square brackets. Such lines are called sections, and the first one is language info, the section with information about the language we are creating.
I will explain the following lines, which must be filled in at the end of each line, after the equal sign:
Name: The native name of your language. For example, an English translation would use "English"; a Spanish translation would be written "Español"; a Portuguese one would be "português", and so on for every language.
Code: In this parameter we again put the first two letters of the language being translated, as we did in the first step when creating the file.
version: The version of the language file. It may have its own version, but it is recommended to match the program's version.
author: The name of the author who creates the language.
Copyright: Optional, in case we want to add a copyright notice.
And here we end the first section.
*Second section.
strings:
In this section you will find all the messages available for translation.
You will notice that each line has an equal sign (=) at the end. This sign is important, as it separates the original message (before the =) from the value to be translated (after the =).
To translate a message, write the translation at the end of the line you are translating, after the "=" sign, respecting the punctuation and rules of the original message.
Once finished, you can send it for review to angelitomateocedillo@gmail.com or, in any of my applications, by selecting the "errors and suggestions" option in the Help menu and submitting the file through the form that opens in your browser.
End.
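The section/key=value convention described in the guide can be sketched in a few lines of Python. This is illustrative only: the notebooks rely on their own `Translator` class, whose real implementation is not shown here, and `parse_lang` is a hypothetical helper name.

```python
def parse_lang(text):
    """Parse [section] headers and key=value lines into nested dicts.

    Splits on the FIRST '=' of each line, since the guide says the
    original message comes before '=' and the translation after it.
    """
    sections, current = {}, None
    for raw in text.splitlines():
        line = raw.strip()
        if not line:
            continue
        if line.startswith("[") and line.endswith("]"):
            # New section, e.g. [language info] or [Strings]
            current = sections.setdefault(line[1:-1], {})
        elif "=" in line and current is not None:
            key, _, value = line.partition("=")
            current[key] = value
    return sections


sample = """[language info]
Name=Español
code=Es

[Strings]
Exit=Salir
Load it!=¡Cárgalo!
"""

lang_data = parse_lang(sample)
print(lang_data["Strings"]["Exit"])  # Salir
```

A lookup such as `lang_data["Strings"]["Load it!"]` then returns the translated value for the original-message key, which mirrors how the notebooks call `lan.translate(lang, "Load it!")`.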
ascend_910-piper/piper/notebooks/piper_inference_(ONNX).ipynb (new file, 558 lines)
@@ -0,0 +1,558 @@
{
"nbformat": 4,
"nbformat_minor": 0,
"metadata": {
"colab": {
"provenance": [],
"gpuType": "T4",
"authorship_tag": "ABX9TyMAPvo6Syxu5wDRkSmySUxq",
"include_colab_link": true
},
"kernelspec": {
"name": "python3",
"display_name": "Python 3"
},
"language_info": {
"name": "python"
},
"accelerator": "GPU"
},
"cells": [
{
"cell_type": "markdown",
"metadata": {
"id": "view-in-github",
"colab_type": "text"
},
"source": [
"<a href=\"https://colab.research.google.com/github/rmcpantoja/piper/blob/master/notebooks/piper_inference_(ONNX).ipynb\" target=\"_parent\"><img src=\"https://colab.research.google.com/assets/colab-badge.svg\" alt=\"Open In Colab\"/></a>"
]
},
{
"cell_type": "markdown",
"source": [
"# <font color=\"pink\"> **[Piper](https://github.com/rhasspy/piper) inferencing notebook.**\n",
"## \n",
"\n",
"---\n",
"\n",
"- Notebook made by [rmcpantoja](http://github.com/rmcpantoja)\n",
"- Collaborator: [Xx_Nessu_xX](https://fakeyou.com/profile/Xx_Nessu_xX)"
],
"metadata": {
"id": "eK3nmYDB6C1a"
}
},
{
"cell_type": "markdown",
"source": [
"# First steps"
],
"metadata": {
"id": "9wIvcSmOby84"
}
},
{
"cell_type": "code",
"source": [
"#@title Install software and settings\n",
"#@markdown The speech synthesizer and other important dependencies will be installed in this cell. But first, some settings:\n",
"\n",
"#@markdown #### Enable Enhanced Accessibility?\n",
"#@markdown This Enhanced Accessibility functionality is designed for the visually impaired, in which most of the interface can be used by voice guides.\n",
"enhanced_accessibility = True #@param {type:\"boolean\"}\n",
"#@markdown ---\n",
"\n",
"#@markdown #### Please select your language:\n",
"lang_select = \"English\" #@param [\"English\", \"Spanish\"]\n",
"if lang_select == \"English\":\n",
"    lang = \"en\"\n",
"elif lang_select == \"Spanish\":\n",
"    lang = \"es\"\n",
"else:\n",
"    raise Exception(\"Language not supported.\")\n",
"#@markdown ---\n",
"#@markdown #### Do you want to use the GPU for inference?\n",
"\n",
"#@markdown The GPU can be enabled in the Edit/Notebook settings menu, and this must be done before connecting to a runtime. The GPU gives faster inference, but you can use the CPU instead, for example if your Colab GPU runtime quota has run out.\n",
"use_gpu = True #@param {type:\"boolean\"}\n",
"\n",
"if enhanced_accessibility:\n",
"    from google.colab import output\n",
"    guideurl = f\"https://github.com/rmcpantoja/piper/blob/master/notebooks/wav/{lang}\"\n",
"    def playaudio(filename, extension = \"wav\"):\n",
"        return output.eval_js(f'new Audio(\"{guideurl}/{filename}.{extension}?raw=true\").play()')\n",
"\n",
"%cd /content\n",
"print(\"Installing...\")\n",
"if enhanced_accessibility:\n",
"    playaudio(\"installing\")\n",
"!git clone -q https://github.com/rmcpantoja/piper\n",
"%cd /content/piper/src/python\n",
"#!pip install -q -r requirements.txt\n",
"!pip install -q cython>=0.29.0 piper-phonemize==1.1.0 librosa>=0.9.2 numpy>=1.19.0 onnxruntime>=1.11.0 pytorch-lightning==1.7.0 torch==1.11.0\n",
"!pip install -q onnxruntime-gpu\n",
"!bash build_monotonic_align.sh\n",
"import os\n",
"if not os.path.exists(\"/content/piper/src/python/lng\"):\n",
"    !cp -r \"/content/piper/notebooks/lng\" /content/piper/src/python/lng\n",
"import sys\n",
"sys.path.append('/content/piper/notebooks')\n",
"from translator import *\n",
"lan = Translator()\n",
"print(\"Checking GPU...\")\n",
"gpu_info = !nvidia-smi\n",
"if use_gpu and any('not found' in info for info in gpu_info[0].split(':')):\n",
"    if enhanced_accessibility:\n",
"        playaudio(\"nogpu\")\n",
"    raise Exception(lan.translate(lang, \"The Use GPU checkbox is checked, but you don't have a GPU runtime.\"))\n",
"elif not use_gpu and not any('not found' in info for info in gpu_info[0].split(':')):\n",
"    if enhanced_accessibility:\n",
"        playaudio(\"gpuavailable\")\n",
"    raise Exception(lan.translate(lang, \"The Use GPU checkbox is unchecked, however you are using a GPU runtime environment. We recommend you check the checkbox to use GPU to take advantage of it.\"))\n",
"\n",
"if enhanced_accessibility:\n",
"    playaudio(\"installed\")\n",
"print(\"Success!\")"
],
"metadata": {
"cellView": "form",
"id": "v8b_PEtXb8co"
},
"execution_count": null,
"outputs": []
},
{
"cell_type": "code",
"source": [
"#@title Download your exported model\n",
"%cd /content/piper/src/python\n",
"import os\n",
"#@markdown #### ID or link of the voice package (tar.gz format):\n",
"package_url_or_id = \"\" #@param {type:\"string\"}\n",
"#@markdown ---\n",
"if package_url_or_id == \"\" or package_url_or_id == \"http\" or package_url_or_id == \"1\":\n",
"    if enhanced_accessibility:\n",
"        playaudio(\"noid\")\n",
"    raise Exception(lan.translate(lang, \"Invalid link or ID!\"))\n",
"print(\"Downloading voice package...\")\n",
"if enhanced_accessibility:\n",
"    playaudio(\"downloading\")\n",
"if package_url_or_id.startswith(\"1\"):\n",
"    !gdown -q \"{package_url_or_id}\" -O \"voice_package.tar.gz\"\n",
"elif package_url_or_id.startswith(\"https://drive.google.com/file/d/\"):\n",
"    !gdown -q \"{package_url_or_id}\" -O \"voice_package.tar.gz\" --fuzzy\n",
"else:\n",
"    !wget -q \"{package_url_or_id}\" -O \"voice_package.tar.gz\"\n",
"if os.path.exists(\"/content/piper/src/python/voice_package.tar.gz\"):\n",
"    !tar -xf voice_package.tar.gz\n",
"    print(\"Voice package downloaded!\")\n",
"    if enhanced_accessibility:\n",
"        playaudio(\"downloaded\")\n",
"else:\n",
"    if enhanced_accessibility:\n",
"        playaudio(\"dwnerror\")\n",
"    raise Exception(lan.translate(lang, \"Model failed to download!\"))"
],
"metadata": {
"cellView": "form",
"id": "ykIYmVXccg6s"
},
"execution_count": null,
"outputs": []
},
{
"cell_type": "markdown",
"source": [
"# Inferencing"
],
"metadata": {
"id": "MRvkYJF6g5FT"
}
},
{
|
||||
"cell_type": "code",
|
||||
"source": [
|
||||
"#@title run inference\n",
|
||||
"#@markdown #### before you enjoy... Some notes!\n",
|
||||
"#@markdown * You can run the cell to download voice packs and download voices you want at any time, even if you run this cell!\n",
|
||||
"#@markdown * When you download a new voice, run this cell again and you will now be able to toggle between all the ones you download. Incredible, right?\n",
|
||||
"\n",
|
||||
"#@markdown Enjoy!!\n",
|
||||
"\n",
|
||||
"%cd /content/piper/src/python\n",
|
||||
"# original: infer_onnx.py\n",
|
||||
"import json\n",
|
||||
"import logging\n",
|
||||
"import math\n",
|
||||
"import sys\n",
|
||||
"from pathlib import Path\n",
|
||||
"from enum import Enum\n",
|
||||
"from typing import Iterable, List, Optional, Union\n",
|
||||
"import numpy as np\n",
|
||||
"import onnxruntime\n",
|
||||
"from piper_train.vits.utils import audio_float_to_int16\n",
|
||||
"import glob\n",
|
||||
"import ipywidgets as widgets\n",
|
||||
"from IPython.display import display, Audio, Markdown, clear_output\n",
|
||||
"from piper_phonemize import phonemize_codepoints, phonemize_espeak, tashkeel_run\n",
|
||||
"\n",
|
||||
"_LOGGER = logging.getLogger(\"piper_train.infer_onnx\")\n",
|
||||
"\n",
|
||||
"def detect_onnx_models(path):\n",
|
||||
" onnx_models = glob.glob(path + '/*.onnx')\n",
|
||||
" if len(onnx_models) > 1:\n",
|
||||
" return onnx_models\n",
|
||||
" elif len(onnx_models) == 1:\n",
|
||||
" return onnx_models[0]\n",
|
||||
" else:\n",
|
||||
" return None\n",
|
||||
"\n",
|
||||
"\n",
|
||||
"def main():\n",
|
||||
" \"\"\"Main entry point\"\"\"\n",
|
||||
" models_path = \"/content/piper/src/python\"\n",
|
||||
" logging.basicConfig(level=logging.DEBUG)\n",
|
||||
" providers = [\n",
|
||||
" \"CPUExecutionProvider\"\n",
|
||||
" if use_gpu is False\n",
|
||||
" else (\"CUDAExecutionProvider\", {\"cudnn_conv_algo_search\": \"DEFAULT\"})\n",
|
||||
" ]\n",
|
||||
" sess_options = onnxruntime.SessionOptions()\n",
|
||||
" model = None\n",
|
||||
" onnx_models = detect_onnx_models(models_path)\n",
|
||||
" speaker_selection = widgets.Dropdown(\n",
|
||||
" options=[],\n",
|
||||
" description=f'{lan.translate(lang, \"Select speaker\")}:',\n",
|
||||
" layout={'visibility': 'hidden'}\n",
|
||||
" )\n",
|
||||
" if onnx_models is None:\n",
|
||||
" if enhanced_accessibility:\n",
|
||||
" playaudio(\"novoices\")\n",
|
||||
" raise Exception(lan.translate(lang, \"No downloaded voice packages!\"))\n",
|
||||
" elif isinstance(onnx_models, str):\n",
|
||||
" onnx_model = onnx_models\n",
|
||||
" model, config = load_onnx(onnx_model, sess_options, providers)\n",
|
||||
" if config[\"num_speakers\"] > 1:\n",
|
||||
" speaker_selection.options = config[\"speaker_id_map\"].values()\n",
|
||||
" speaker_selection.layout.visibility = 'visible'\n",
|
||||
" preview_sid = 0\n",
|
||||
" if enhanced_accessibility:\n",
|
||||
" playaudio(\"multispeaker\")\n",
|
||||
" else:\n",
|
||||
" speaker_selection.layout.visibility = 'hidden'\n",
|
||||
" preview_sid = None\n",
|
||||
"\n",
|
||||
" if enhanced_accessibility:\n",
|
||||
" inferencing(\n",
|
||||
" model,\n",
|
||||
" config,\n",
|
||||
" preview_sid,\n",
|
||||
" lan.translate(\n",
|
||||
" config[\"espeak\"][\"voice\"][:2],\n",
|
||||
" \"Interface openned. Write your texts, configure the different synthesis options or download all the voices you want. Enjoy!\"\n",
|
||||
" )\n",
|
||||
" )\n",
|
||||
" else:\n",
|
||||
" voice_model_names = []\n",
|
||||
" for current in onnx_models:\n",
|
||||
" voice_struct = current.split(\"/\")[5]\n",
|
||||
" voice_model_names.append(voice_struct)\n",
|
||||
" if enhanced_accessibility:\n",
|
||||
" playaudio(\"selectmodel\")\n",
|
||||
" selection = widgets.Dropdown(\n",
|
||||
" options=voice_model_names,\n",
|
||||
" description=f'{lan.translate(lang, \"Select voice package\")}:',\n",
|
||||
" )\n",
|
||||
" load_btn = widgets.Button(\n",
|
||||
" description=lan.translate(lang, \"Load it!\")\n",
|
||||
" )\n",
|
||||
" config = None\n",
|
||||
" def load_model(button):\n",
|
||||
" nonlocal config\n",
|
||||
" global onnx_model\n",
|
||||
" nonlocal model\n",
|
||||
" nonlocal models_path\n",
|
||||
" selected_voice = selection.value\n",
|
||||
" onnx_model = f\"{models_path}/{selected_voice}\"\n",
|
||||
" model, config = load_onnx(onnx_model, sess_options, providers)\n",
|
||||
" if enhanced_accessibility:\n",
|
||||
" playaudio(\"loaded\")\n",
|
||||
" if config[\"num_speakers\"] > 1:\n",
|
||||
" speaker_selection.options = config[\"speaker_id_map\"].values()\n",
|
||||
" speaker_selection.layout.visibility = 'visible'\n",
|
||||
" if enhanced_accessibility:\n",
|
||||
" playaudio(\"multispeaker\")\n",
|
||||
" else:\n",
|
||||
" speaker_selection.layout.visibility = 'hidden'\n",
|
||||
"\n",
|
||||
" load_btn.on_click(load_model)\n",
|
||||
" display(selection, load_btn)\n",
|
||||
" display(speaker_selection)\n",
|
||||
" speed_slider = widgets.FloatSlider(\n",
|
||||
" value=1,\n",
|
||||
" min=0.25,\n",
|
||||
" max=4,\n",
|
||||
" step=0.1,\n",
|
||||
" description=lan.translate(lang, \"Rate scale\"),\n",
|
||||
" orientation='horizontal',\n",
|
||||
" )\n",
|
||||
" noise_scale_slider = widgets.FloatSlider(\n",
|
||||
" value=0.667,\n",
|
||||
" min=0.25,\n",
|
||||
" max=4,\n",
|
||||
" step=0.1,\n",
|
||||
" description=lan.translate(lang, \"Phoneme noise scale\"),\n",
|
||||
" orientation='horizontal',\n",
|
||||
" )\n",
|
||||
" noise_scale_w_slider = widgets.FloatSlider(\n",
|
||||
" value=1,\n",
|
||||
" min=0.25,\n",
|
||||
" max=4,\n",
|
||||
" step=0.1,\n",
|
||||
" description=lan.translate(lang, \"Phoneme stressing scale\"),\n",
|
||||
" orientation='horizontal',\n",
|
||||
" )\n",
|
||||
" play = widgets.Checkbox(\n",
|
||||
" value=True,\n",
|
||||
" description=lan.translate(lang, \"Auto-play\"),\n",
|
||||
" disabled=False\n",
|
||||
" )\n",
|
||||
" text_input = widgets.Text(\n",
|
||||
" value='',\n",
|
||||
" placeholder=f'{lan.translate(lang, \"Enter your text here\")}:',\n",
|
||||
" description=lan.translate(lang, \"Text to synthesize\"),\n",
|
||||
" layout=widgets.Layout(width='80%')\n",
|
||||
" )\n",
|
||||
" synthesize_button = widgets.Button(\n",
|
||||
" description=lan.translate(lang, \"Synthesize\"),\n",
|
||||
" button_style='success', # 'success', 'info', 'warning', 'danger' or ''\n",
|
||||
" tooltip=lan.translate(lang, \"Click here to synthesize the text.\"),\n",
|
||||
" icon='check'\n",
|
||||
" )\n",
|
||||
" close_button = widgets.Button(\n",
|
||||
" description=lan.translate(lang, \"Exit\"),\n",
|
||||
" tooltip=lan.translate(lang, \"Closes this GUI.\"),\n",
|
||||
" icon='check'\n",
|
||||
" )\n",
|
||||
"\n",
|
||||
" def on_synthesize_button_clicked(b):\n",
|
||||
" if model is None:\n",
|
||||
" if enhanced_accessibility:\n",
|
||||
" playaudio(\"nomodel\")\n",
|
||||
" raise Exception(lan.translate(lang, \"You have not loaded any model from the list!\"))\n",
|
||||
" text = text_input.value\n",
|
||||
" if config[\"num_speakers\"] > 1:\n",
|
||||
" sid = speaker_selection.value\n",
|
||||
" else:\n",
|
||||
" sid = None\n",
|
||||
" rate = speed_slider.value\n",
|
||||
" noise_scale = noise_scale_slider.value\n",
|
||||
" noise_scale_w = noise_scale_w_slider.value\n",
|
||||
" auto_play = play.value\n",
|
||||
" inferencing(model, config, sid, text, rate, noise_scale, noise_scale_w, auto_play)\n",
|
||||
"\n",
|
||||
" def on_close_button_clicked(b):\n",
|
||||
" clear_output()\n",
|
||||
" if enhanced_accessibility:\n",
|
||||
" playaudio(\"exit\")\n",
|
||||
"\n",
|
||||
" synthesize_button.on_click(on_synthesize_button_clicked)\n",
|
||||
" close_button.on_click(on_close_button_clicked)\n",
|
||||
" display(text_input)\n",
|
||||
" display(speed_slider)\n",
|
||||
" display(noise_scale_slider)\n",
|
||||
" display(noise_scale_w_slider)\n",
|
||||
" display(play)\n",
|
||||
" display(synthesize_button)\n",
|
||||
" display(close_button)\n",
|
||||
"\n",
|
||||
"def load_onnx(model, sess_options, providers = [\"CPUExecutionProvider\"]):\n",
|
||||
" _LOGGER.debug(\"Loading model from %s\", model)\n",
|
||||
" config = load_config(model)\n",
|
||||
" model = onnxruntime.InferenceSession(\n",
|
||||
" str(model),\n",
|
||||
" sess_options=sess_options,\n",
|
||||
" providers= providers\n",
|
||||
" )\n",
|
||||
" _LOGGER.info(\"Loaded model from %s\", model)\n",
|
||||
" return model, config\n",
|
||||
"\n",
|
||||
"def load_config(model):\n",
|
||||
" with open(f\"{model}.json\", \"r\") as file:\n",
|
||||
" config = json.load(file)\n",
|
||||
" return config\n",
|
||||
"PAD = \"_\" # padding (0)\n",
|
||||
"BOS = \"^\" # beginning of sentence\n",
|
||||
"EOS = \"$\" # end of sentence\n",
|
||||
"\n",
|
||||
"class PhonemeType(str, Enum):\n",
|
||||
" ESPEAK = \"espeak\"\n",
|
||||
" TEXT = \"text\"\n",
|
||||
"\n",
|
||||
"def phonemize(config, text: str) -> List[List[str]]:\n",
|
||||
" \"\"\"Text to phonemes grouped by sentence.\"\"\"\n",
|
||||
" if config[\"phoneme_type\"] == PhonemeType.ESPEAK:\n",
|
||||
" if config[\"espeak\"][\"voice\"] == \"ar\":\n",
|
||||
" # Arabic diacritization\n",
|
||||
" # https://github.com/mush42/libtashkeel/\n",
|
||||
" text = tashkeel_run(text)\n",
|
||||
" return phonemize_espeak(text, config[\"espeak\"][\"voice\"])\n",
|
||||
" if config[\"phoneme_type\"] == PhonemeType.TEXT:\n",
|
||||
" return phonemize_codepoints(text)\n",
|
||||
" raise ValueError(f'Unexpected phoneme type: {config[\"phoneme_type\"]}')\n",
|
||||
"\n",
|
||||
"def phonemes_to_ids(config, phonemes: List[str]) -> List[int]:\n",
|
||||
" \"\"\"Phonemes to ids.\"\"\"\n",
|
||||
" id_map = config[\"phoneme_id_map\"]\n",
|
||||
" ids: List[int] = list(id_map[BOS])\n",
|
||||
" for phoneme in phonemes:\n",
|
||||
" if phoneme not in id_map:\n",
|
||||
" print(\"Missing phoneme from id map: %s\", phoneme)\n",
|
||||
" continue\n",
|
||||
" ids.extend(id_map[phoneme])\n",
|
||||
" ids.extend(id_map[PAD])\n",
|
||||
" ids.extend(id_map[EOS])\n",
|
||||
" return ids\n",
|
||||
"\n",
|
||||
"def inferencing(model, config, sid, line, length_scale = 1, noise_scale = 0.667, noise_scale_w = 0.8, auto_play=True):\n",
|
||||
" audios = []\n",
|
||||
" if config[\"phoneme_type\"] == \"PhonemeType.ESPEAK\":\n",
|
||||
" config[\"phoneme_type\"] = \"espeak\"\n",
|
||||
" text = phonemize(config, line)\n",
|
||||
" for phonemes in text:\n",
|
||||
" phoneme_ids = phonemes_to_ids(config, phonemes)\n",
|
||||
" num_speakers = config[\"num_speakers\"]\n",
|
||||
" if num_speakers == 1:\n",
|
||||
" speaker_id = None # for now\n",
|
||||
" else:\n",
|
||||
" speaker_id = sid\n",
|
||||
" text = np.expand_dims(np.array(phoneme_ids, dtype=np.int64), 0)\n",
|
||||
" text_lengths = np.array([text.shape[1]], dtype=np.int64)\n",
|
||||
" scales = np.array(\n",
|
||||
" [noise_scale, length_scale, noise_scale_w],\n",
|
||||
" dtype=np.float32,\n",
|
||||
" )\n",
|
||||
" sid = None\n",
|
||||
" if speaker_id is not None:\n",
|
||||
" sid = np.array([speaker_id], dtype=np.int64)\n",
|
||||
" audio = model.run(\n",
|
||||
" None,\n",
|
||||
" {\n",
|
||||
" \"input\": text,\n",
|
||||
" \"input_lengths\": text_lengths,\n",
|
||||
" \"scales\": scales,\n",
|
||||
" \"sid\": sid,\n",
|
||||
" },\n",
|
||||
" )[0].squeeze((0, 1))\n",
|
||||
" audio = audio_float_to_int16(audio.squeeze())\n",
|
||||
" audios.append(audio)\n",
|
||||
" merged_audio = np.concatenate(audios)\n",
|
||||
" sample_rate = config[\"audio\"][\"sample_rate\"]\n",
|
||||
" display(Markdown(f\"{line}\"))\n",
|
||||
" display(Audio(merged_audio, rate=sample_rate, autoplay=auto_play))\n",
|
||||
"\n",
|
||||
"def denoise(\n",
|
||||
" audio: np.ndarray, bias_spec: np.ndarray, denoiser_strength: float\n",
|
||||
") -> np.ndarray:\n",
|
||||
" audio_spec, audio_angles = transform(audio)\n",
|
||||
"\n",
|
||||
" a = bias_spec.shape[-1]\n",
|
||||
" b = audio_spec.shape[-1]\n",
|
||||
" repeats = max(1, math.ceil(b / a))\n",
|
||||
" bias_spec_repeat = np.repeat(bias_spec, repeats, axis=-1)[..., :b]\n",
|
||||
"\n",
|
||||
" audio_spec_denoised = audio_spec - (bias_spec_repeat * denoiser_strength)\n",
|
||||
" audio_spec_denoised = np.clip(audio_spec_denoised, a_min=0.0, a_max=None)\n",
|
||||
" audio_denoised = inverse(audio_spec_denoised, audio_angles)\n",
|
||||
"\n",
|
||||
" return audio_denoised\n",
|
||||
"\n",
|
||||
"\n",
|
||||
"def stft(x, fft_size, hopsamp):\n",
|
||||
" \"\"\"Compute and return the STFT of the supplied time domain signal x.\n",
|
||||
" Args:\n",
|
||||
" x (1-dim Numpy array): A time domain signal.\n",
|
||||
" fft_size (int): FFT size. Should be a power of 2, otherwise DFT will be used.\n",
|
||||
" hopsamp (int):\n",
|
||||
" Returns:\n",
|
||||
" The STFT. The rows are the time slices and columns are the frequency bins.\n",
|
||||
" \"\"\"\n",
|
||||
" window = np.hanning(fft_size)\n",
|
||||
" fft_size = int(fft_size)\n",
|
||||
" hopsamp = int(hopsamp)\n",
|
||||
" return np.array(\n",
|
||||
" [\n",
|
||||
" np.fft.rfft(window * x[i : i + fft_size])\n",
|
||||
" for i in range(0, len(x) - fft_size, hopsamp)\n",
|
||||
" ]\n",
|
||||
" )\n",
|
||||
"\n",
|
||||
"\n",
|
||||
"def istft(X, fft_size, hopsamp):\n",
|
||||
" \"\"\"Invert a STFT into a time domain signal.\n",
|
||||
" Args:\n",
|
||||
" X (2-dim Numpy array): Input spectrogram. The rows are the time slices and columns are the frequency bins.\n",
|
||||
" fft_size (int):\n",
|
||||
" hopsamp (int): The hop size, in samples.\n",
|
||||
" Returns:\n",
|
||||
" The inverse STFT.\n",
|
||||
" \"\"\"\n",
|
||||
" fft_size = int(fft_size)\n",
|
||||
" hopsamp = int(hopsamp)\n",
|
||||
" window = np.hanning(fft_size)\n",
|
||||
" time_slices = X.shape[0]\n",
|
||||
" len_samples = int(time_slices * hopsamp + fft_size)\n",
|
||||
" x = np.zeros(len_samples)\n",
|
||||
" for n, i in enumerate(range(0, len(x) - fft_size, hopsamp)):\n",
|
||||
" x[i : i + fft_size] += window * np.real(np.fft.irfft(X[n]))\n",
|
||||
" return x\n",
|
||||
"\n",
|
||||
"\n",
|
||||
"def inverse(magnitude, phase):\n",
|
||||
" recombine_magnitude_phase = np.concatenate(\n",
|
||||
" [magnitude * np.cos(phase), magnitude * np.sin(phase)], axis=1\n",
|
||||
" )\n",
|
||||
"\n",
|
||||
" x_org = recombine_magnitude_phase\n",
|
||||
" n_b, n_f, n_t = x_org.shape # pylint: disable=unpacking-non-sequence\n",
|
||||
" x = np.empty([n_b, n_f // 2, n_t], dtype=np.complex64)\n",
|
||||
" x.real = x_org[:, : n_f // 2]\n",
|
||||
" x.imag = x_org[:, n_f // 2 :]\n",
|
||||
" inverse_transform = []\n",
|
||||
" for y in x:\n",
|
||||
" y_ = istft(y.T, fft_size=1024, hopsamp=256)\n",
|
||||
" inverse_transform.append(y_[None, :])\n",
|
||||
"\n",
|
||||
" inverse_transform = np.concatenate(inverse_transform, 0)\n",
|
||||
"\n",
|
||||
" return inverse_transform\n",
|
||||
"\n",
|
||||
"\n",
|
||||
"def transform(input_data):\n",
|
||||
" x = input_data\n",
|
||||
" real_part = []\n",
|
||||
" imag_part = []\n",
|
||||
" for y in x:\n",
|
||||
" y_ = stft(y, fft_size=1024, hopsamp=256).T\n",
|
||||
" real_part.append(y_.real[None, :, :]) # pylint: disable=unsubscriptable-object\n",
|
||||
" imag_part.append(y_.imag[None, :, :]) # pylint: disable=unsubscriptable-object\n",
|
||||
" real_part = np.concatenate(real_part, 0)\n",
|
||||
" imag_part = np.concatenate(imag_part, 0)\n",
|
||||
"\n",
|
||||
" magnitude = np.sqrt(real_part**2 + imag_part**2)\n",
|
||||
" phase = np.arctan2(imag_part.data, real_part.data)\n",
|
||||
"\n",
|
||||
" return magnitude, phase\n",
|
||||
"\n",
|
||||
"main()"
|
||||
],
|
||||
"metadata": {
|
||||
"id": "hcKk8M2ug8kM",
|
||||
"cellView": "form"
|
||||
},
|
||||
"execution_count": null,
|
||||
"outputs": []
|
||||
}
|
||||
]
|
||||
}
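The `stft`/`istft` pair defined at the end of this notebook can be sanity-checked with a standalone round trip. This is a sketch using copies of the notebook's functions; the final division by the accumulated squared-Hann overlap-add gain is an illustrative normalization added here, not part of the notebook itself:

```python
import numpy as np

# Copies of the notebook's STFT/ISTFT (Hann window applied on both passes).
def stft(x, fft_size, hopsamp):
    window = np.hanning(fft_size)
    return np.array([np.fft.rfft(window * x[i:i + fft_size])
                     for i in range(0, len(x) - fft_size, hopsamp)])

def istft(X, fft_size, hopsamp):
    window = np.hanning(fft_size)
    x = np.zeros(int(X.shape[0] * hopsamp + fft_size))
    for n, i in enumerate(range(0, len(x) - fft_size, hopsamp)):
        x[i:i + fft_size] += window * np.real(np.fft.irfft(X[n]))
    return x

rng = np.random.default_rng(0)
sig = rng.standard_normal(8192)
spec = stft(sig, 1024, 256)   # rows = time slices, columns = frequency bins
rec = istft(spec, 1024, 256)

# Divide out the accumulated Hann^2 overlap-add gain so the interior
# of the signal is recovered (near) exactly.
win2 = np.hanning(1024) ** 2
wsum = np.zeros(len(rec))
for i in range(0, len(rec) - 1024, 256):
    wsum[i:i + 1024] += win2
ok = wsum > 1e-8
rec[ok] /= wsum[ok]
err = np.max(np.abs(rec[1024:6144] - sig[1024:6144]))
print(spec.shape, err)  # spec.shape is (28, 513) for this signal length
```

Because the window is applied in both directions, a raw `istft(stft(x))` is scaled by roughly 1.5 in the interior; the per-sample normalization above removes that gain.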
|
||||
565
ascend_910-piper/piper/notebooks/piper_inference_(ckpt).ipynb
Normal file
@@ -0,0 +1,565 @@
|
||||
{
|
||||
"nbformat": 4,
|
||||
"nbformat_minor": 0,
|
||||
"metadata": {
|
||||
"colab": {
|
||||
"provenance": [],
|
||||
"gpuType": "T4",
|
||||
"authorship_tag": "ABX9TyNju0yzRK8wgAS+WgyeTEAl",
|
||||
"include_colab_link": true
|
||||
},
|
||||
"kernelspec": {
|
||||
"name": "python3",
|
||||
"display_name": "Python 3"
|
||||
},
|
||||
"language_info": {
|
||||
"name": "python"
|
||||
}
|
||||
},
|
||||
"cells": [
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"metadata": {
|
||||
"id": "view-in-github",
|
||||
"colab_type": "text"
|
||||
},
|
||||
"source": [
|
||||
"<a href=\"https://colab.research.google.com/github/rmcpantoja/piper/blob/master/notebooks/piper_inference_(ckpt).ipynb\" target=\"_parent\"><img src=\"https://colab.research.google.com/assets/colab-badge.svg\" alt=\"Open In Colab\"/></a>"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"source": [
|
||||
"# <font color=\"pink\"> **[Piper](https://github.com/rhasspy/piper) inferencing notebook.**\n",
|
||||
"## \n",
|
||||
"\n",
|
||||
"---\n",
|
||||
"\n",
|
||||
"- Notebook made by [rmcpantoja](http://github.com/rmcpantoja)\n",
|
||||
"- Collaborator: [Xx_Nessu_xX](https://fakeyou.com/profile/Xx_Nessu_xX)"
|
||||
],
|
||||
"metadata": {
|
||||
"id": "eK3nmYDB6C1a"
|
||||
}
|
||||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"source": [
|
||||
"# First steps"
|
||||
],
|
||||
"metadata": {
|
||||
"id": "9wIvcSmOby84"
|
||||
}
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"source": [
|
||||
"#@title Install software and settings\n",
|
||||
"#@markdown The speech synthesizer and other important dependencies will be installed in this cell. But first, some settings:\n",
|
||||
"\n",
|
||||
"#@markdown #### Enable Enhanced Accessibility?\n",
|
||||
"#@markdown This Enhanced Accessibility functionality is designed for the visually impaired: most of the interface can be used with voice guidance.\n",
|
||||
"enhanced_accessibility = True #@param {type:\"boolean\"}\n",
|
||||
"#@markdown ---\n",
|
||||
"\n",
|
||||
"#@markdown #### Please select your language:\n",
|
||||
"lang_select = \"English\" #@param [\"English\", \"Spanish\"]\n",
|
||||
"if lang_select == \"English\":\n",
|
||||
" lang = \"en\"\n",
|
||||
"elif lang_select == \"Spanish\":\n",
|
||||
" lang = \"es\"\n",
|
||||
"else:\n",
|
||||
" raise Exception(\"Language not supported.\")\n",
|
||||
"#@markdown ---\n",
|
||||
"#@markdown #### Do you want to use the GPU for inference?\n",
|
||||
"\n",
|
||||
"#@markdown The GPU must be enabled in the Edit/Notebook settings menu before connecting to a runtime. A GPU gives faster inference, but you can use the CPU instead, for example if your Colab GPU quota has run out.\n",
|
||||
"use_gpu = False #@param {type:\"boolean\"}\n",
|
||||
"\n",
|
||||
"if enhanced_accessibility:\n",
|
||||
" from google.colab import output\n",
|
||||
" guideurl = f\"https://github.com/rmcpantoja/piper/blob/master/notebooks/wav/{lang}\"\n",
|
||||
" def playaudio(filename, extension = \"wav\"):\n",
|
||||
"    return output.eval_js(f'new Audio(\"{guideurl}/{filename}.{extension}?raw=true\").play()')\n",
|
||||
"\n",
|
||||
"%cd /content\n",
|
||||
"print(\"Installing...\")\n",
|
||||
"if enhanced_accessibility:\n",
|
||||
" playaudio(\"installing\")\n",
|
||||
"!git clone -q https://github.com/rmcpantoja/piper\n",
|
||||
"%cd /content/piper/src/python\n",
|
||||
"#!pip install -q -r requirements.txt\n",
|
||||
"!pip install -q cython>=0.29.0 piper-phonemize==1.1.0 librosa>=0.9.2 numpy>=1.19.0 onnxruntime>=1.11.0 pytorch-lightning==1.7.0 torch==1.11.0\n",
|
||||
"!pip install -q torchtext==0.12.0 torchvision==0.12.0\n",
|
||||
"#!pip install -q torchtext==0.14.1 torchvision==0.14.1\n",
|
||||
"# fixing recent compatibility issues:\n",
|
||||
"!pip install -q torchaudio==0.11.0 torchmetrics==0.11.4\n",
|
||||
"!bash build_monotonic_align.sh\n",
|
||||
"import os\n",
|
||||
"if not os.path.exists(\"/content/piper/src/python/lng\"):\n",
|
||||
" !cp -r \"/content/piper/notebooks/lng\" /content/piper/src/python/lng\n",
|
||||
"import sys\n",
|
||||
"sys.path.append('/content/piper/notebooks')\n",
|
||||
"from translator import *\n",
|
||||
"lan = Translator()\n",
|
||||
"print(\"Checking GPU...\")\n",
|
||||
"gpu_info = !nvidia-smi\n",
|
||||
"if use_gpu and any('not found' in info for info in gpu_info[0].split(':')):\n",
|
||||
" if enhanced_accessibility:\n",
|
||||
" playaudio(\"nogpu\")\n",
|
||||
" raise Exception(lan.translate(lang, \"The Use GPU checkbox is checked, but you don't have a GPU runtime.\"))\n",
|
||||
"elif not use_gpu and not any('not found' in info for info in gpu_info[0].split(':')):\n",
|
||||
" if enhanced_accessibility:\n",
|
||||
" playaudio(\"gpuavailable\")\n",
|
||||
" raise Exception(lan.translate(lang, \"The Use GPU checkbox is unchecked, however you are using a GPU runtime environment. We recommend you check the checkbox to use GPU to take advantage of it.\"))\n",
|
||||
"\n",
|
||||
"if enhanced_accessibility:\n",
|
||||
" playaudio(\"installed\")\n",
|
||||
"print(\"Success!\")"
|
||||
],
|
||||
"metadata": {
|
||||
"cellView": "form",
|
||||
"id": "v8b_PEtXb8co"
|
||||
},
|
||||
"execution_count": null,
|
||||
"outputs": []
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"source": [
|
||||
"#@title Download model and config\n",
|
||||
"%cd /content/piper/src/python\n",
|
||||
"import os\n",
|
||||
"#@markdown #### Model ID or link (ckpt format):\n",
|
||||
"model_url_or_id = \"\" #@param {type:\"string\"}\n",
|
||||
"if model_url_or_id in (\"\", \"http\", \"1\"):\n",
|
||||
" if enhanced_accessibility:\n",
|
||||
" playaudio(\"noid\")\n",
|
||||
" raise Exception(lan.translate(lang, \"Invalid link or ID!\"))\n",
|
||||
"print(\"Downloading model...\")\n",
|
||||
"if enhanced_accessibility:\n",
|
||||
" playaudio(\"downloading\")\n",
|
||||
"if model_url_or_id.startswith(\"1\"):\n",
|
||||
" !gdown -q \"{model_url_or_id}\"\n",
|
||||
"elif model_url_or_id.startswith(\"https://drive.google.com/file/d/\"):\n",
|
||||
" !gdown -q \"{model_url_or_id}\" --fuzzy\n",
|
||||
"else:\n",
|
||||
" !wget -q \"{model_url_or_id}\"\n",
|
||||
"#@markdown ---\n",
|
||||
"#@markdown #### ID or URL of the config.json file:\n",
|
||||
"config_url_or_id = \"\" #@param {type:\"string\"}\n",
|
||||
"if config_url_or_id.startswith(\"1\"):\n",
|
||||
" !gdown -q \"{config_url_or_id}\"\n",
|
||||
"elif config_url_or_id.startswith(\"https://drive.google.com/file/d/\"):\n",
|
||||
" !gdown -q \"{config_url_or_id}\" --fuzzy\n",
|
||||
"else:\n",
|
||||
" !wget -q \"{config_url_or_id}\"\n",
|
||||
"#@markdown ---\n",
|
||||
"if enhanced_accessibility:\n",
|
||||
" playaudio(\"downloaded\")\n",
|
||||
"print(\"Voice package downloaded!\")"
|
||||
],
|
||||
"metadata": {
|
||||
"cellView": "form",
|
||||
"id": "ykIYmVXccg6s"
|
||||
},
|
||||
"execution_count": null,
|
||||
"outputs": []
|
||||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"source": [
|
||||
"# Inference"
|
||||
],
|
||||
"metadata": {
|
||||
"id": "MRvkYJF6g5FT"
|
||||
}
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"source": [
|
||||
"#@title Run inference\n",
|
||||
"#@markdown #### Before you enjoy... some notes!\n",
|
||||
"#@markdown * You can run the voice-download cell at any time to fetch more voices, even after running this cell!\n",
|
||||
"#@markdown * When you download a new voice, run this cell again and you will be able to switch between all the voices you have downloaded. Incredible, right?\n",
|
||||
"\n",
|
||||
"#@markdown Enjoy!!\n",
|
||||
"\n",
|
||||
"%cd /content/piper/src/python\n",
|
||||
"# original: infer.py\n",
|
||||
"import json\n",
|
||||
"import logging\n",
|
||||
"import sys\n",
|
||||
"from pathlib import Path\n",
|
||||
"from enum import Enum\n",
|
||||
"from typing import Iterable, List, Optional, Union\n",
|
||||
"import torch\n",
|
||||
"from piper_train.vits.lightning import VitsModel\n",
|
||||
"from piper_train.vits.utils import audio_float_to_int16\n",
|
||||
"from piper_train.vits.wavfile import write as write_wav\n",
|
||||
"import numpy as np\n",
"import math\n",
|
||||
"import glob\n",
|
||||
"import ipywidgets as widgets\n",
|
||||
"from IPython.display import display, Audio, Markdown, clear_output\n",
|
||||
"from piper_phonemize import phonemize_codepoints, phonemize_espeak, tashkeel_run\n",
|
||||
"\n",
|
||||
"_LOGGER = logging.getLogger(\"piper_train.infer_onnx\")\n",
|
||||
"\n",
|
||||
"def detect_ckpt_models(path):\n",
|
||||
" ckpt_models = glob.glob(path + '/*.ckpt')\n",
|
||||
" if len(ckpt_models) > 1:\n",
|
||||
" return ckpt_models\n",
|
||||
" elif len(ckpt_models) == 1:\n",
|
||||
" return ckpt_models[0]\n",
|
||||
" else:\n",
|
||||
" return None\n",
|
||||
"\n",
|
||||
"\n",
|
||||
"def main():\n",
|
||||
" \"\"\"Main entry point\"\"\"\n",
|
||||
" models_path = \"/content/piper/src/python\"\n",
|
||||
" logging.basicConfig(level=logging.DEBUG)\n",
|
||||
" model = None\n",
|
||||
" ckpt_models = detect_ckpt_models(models_path)\n",
|
||||
" speaker_selection = widgets.Dropdown(\n",
|
||||
" options=[],\n",
|
||||
" description=f'{lan.translate(lang, \"Select speaker\")}:',\n",
|
||||
" layout={'visibility': 'hidden'}\n",
|
||||
" )\n",
|
||||
" if ckpt_models is None:\n",
|
||||
" if enhanced_accessibility:\n",
|
||||
" playaudio(\"novoices\")\n",
|
||||
" raise Exception(lan.translate(lang, \"No downloaded voice packages!\"))\n",
|
||||
" elif isinstance(ckpt_models, str):\n",
|
||||
" ckpt_model = ckpt_models\n",
|
||||
" model, config = load_ckpt(ckpt_model)\n",
|
||||
" if config[\"num_speakers\"] > 1:\n",
|
||||
" speaker_selection.options = config[\"speaker_id_map\"].values()\n",
|
||||
" speaker_selection.layout.visibility = 'visible'\n",
|
||||
" preview_sid = 0\n",
|
||||
" if enhanced_accessibility:\n",
|
||||
" playaudio(\"multispeaker\")\n",
|
||||
" else:\n",
|
||||
" speaker_selection.layout.visibility = 'hidden'\n",
|
||||
" preview_sid = None\n",
|
||||
"\n",
|
||||
" if enhanced_accessibility:\n",
|
||||
" inferencing(\n",
|
||||
" model,\n",
|
||||
" config,\n",
|
||||
" preview_sid,\n",
|
||||
" lan.translate(\n",
|
||||
" config[\"espeak\"][\"voice\"][:2],\n",
|
||||
" \"Interface openned. Write your texts, configure the different synthesis options or download all the voices you want. Enjoy!\"\n",
|
||||
" )\n",
|
||||
" )\n",
|
||||
" else:\n",
|
||||
" voice_model_names = []\n",
|
||||
" for current in ckpt_models:\n",
|
||||
"            voice_struct = Path(current).name\n",
|
||||
" voice_model_names.append(voice_struct)\n",
|
||||
" if enhanced_accessibility:\n",
|
||||
" playaudio(\"selectmodel\")\n",
|
||||
" selection = widgets.Dropdown(\n",
|
||||
" options=voice_model_names,\n",
|
||||
" description=f'{lan.translate(lang, \"Select voice package\")}:',\n",
|
||||
" )\n",
|
||||
" load_btn = widgets.Button(\n",
|
||||
" description=lan.translate(lang, \"Load it!\")\n",
|
||||
" )\n",
|
||||
" config = None\n",
|
||||
" def load_model(button):\n",
|
||||
" nonlocal config\n",
|
||||
" global ckpt_model\n",
|
||||
" nonlocal model\n",
|
||||
" nonlocal models_path\n",
|
||||
" selected_voice = selection.value\n",
|
||||
" ckpt_model = f\"{models_path}/{selected_voice}\"\n",
|
||||
"            model, config = load_ckpt(ckpt_model)\n",
|
||||
" if enhanced_accessibility:\n",
|
||||
" playaudio(\"loaded\")\n",
|
||||
" if config[\"num_speakers\"] > 1:\n",
|
||||
" speaker_selection.options = config[\"speaker_id_map\"].values()\n",
|
||||
" speaker_selection.layout.visibility = 'visible'\n",
|
||||
" if enhanced_accessibility:\n",
|
||||
" playaudio(\"multispeaker\")\n",
|
||||
" else:\n",
|
||||
" speaker_selection.layout.visibility = 'hidden'\n",
|
||||
"\n",
|
||||
" load_btn.on_click(load_model)\n",
|
||||
" display(selection, load_btn)\n",
|
||||
" display(speaker_selection)\n",
|
||||
" speed_slider = widgets.FloatSlider(\n",
|
||||
" value=1,\n",
|
||||
" min=0.25,\n",
|
||||
" max=4,\n",
|
||||
" step=0.1,\n",
|
||||
" description=lan.translate(lang, \"Rate scale\"),\n",
|
||||
" orientation='horizontal',\n",
|
||||
" )\n",
|
||||
" noise_scale_slider = widgets.FloatSlider(\n",
|
||||
" value=0.667,\n",
|
||||
" min=0.25,\n",
|
||||
" max=4,\n",
|
||||
" step=0.1,\n",
|
||||
" description=lan.translate(lang, \"Phoneme noise scale\"),\n",
|
||||
" orientation='horizontal',\n",
|
||||
" )\n",
|
||||
" noise_scale_w_slider = widgets.FloatSlider(\n",
|
||||
" value=1,\n",
|
||||
" min=0.25,\n",
|
||||
" max=4,\n",
|
||||
" step=0.1,\n",
|
||||
" description=lan.translate(lang, \"Phoneme stressing scale\"),\n",
|
||||
" orientation='horizontal',\n",
|
||||
" )\n",
|
||||
" play = widgets.Checkbox(\n",
|
||||
" value=True,\n",
|
||||
" description=lan.translate(lang, \"Auto-play\"),\n",
|
||||
" disabled=False\n",
|
||||
" )\n",
|
||||
" text_input = widgets.Text(\n",
|
||||
" value='',\n",
|
||||
" placeholder=f'{lan.translate(lang, \"Enter your text here\")}:',\n",
|
||||
" description=lan.translate(lang, \"Text to synthesize\"),\n",
|
||||
" layout=widgets.Layout(width='80%')\n",
|
||||
" )\n",
|
||||
" synthesize_button = widgets.Button(\n",
|
||||
" description=lan.translate(lang, \"Synthesize\"),\n",
|
||||
" button_style='success', # 'success', 'info', 'warning', 'danger' or ''\n",
|
||||
" tooltip=lan.translate(lang, \"Click here to synthesize the text.\"),\n",
|
||||
" icon='check'\n",
|
||||
" )\n",
|
||||
" close_button = widgets.Button(\n",
|
||||
" description=lan.translate(lang, \"Exit\"),\n",
|
||||
" tooltip=lan.translate(lang, \"Closes this GUI.\"),\n",
|
||||
" icon='check'\n",
|
||||
" )\n",
|
||||
"\n",
|
||||
" def on_synthesize_button_clicked(b):\n",
|
||||
" if model is None:\n",
|
||||
" if enhanced_accessibility:\n",
|
||||
" playaudio(\"nomodel\")\n",
|
||||
" raise Exception(lan.translate(lang, \"You have not loaded any model from the list!\"))\n",
|
||||
" text = text_input.value\n",
|
||||
" if config[\"num_speakers\"] > 1:\n",
|
||||
" sid = speaker_selection.value\n",
|
||||
" else:\n",
|
||||
" sid = None\n",
|
||||
" rate = speed_slider.value\n",
|
||||
" noise_scale = noise_scale_slider.value\n",
|
||||
" noise_scale_w = noise_scale_w_slider.value\n",
|
||||
" auto_play = play.value\n",
|
||||
" inferencing(model, config, sid, text, rate, noise_scale, noise_scale_w, auto_play)\n",
|
||||
"\n",
|
||||
" def on_close_button_clicked(b):\n",
|
||||
" clear_output()\n",
|
||||
" if enhanced_accessibility:\n",
|
||||
" playaudio(\"exit\")\n",
|
||||
"\n",
|
||||
" synthesize_button.on_click(on_synthesize_button_clicked)\n",
|
||||
" close_button.on_click(on_close_button_clicked)\n",
|
||||
" display(text_input)\n",
|
||||
" display(speed_slider)\n",
|
||||
" display(noise_scale_slider)\n",
|
||||
" display(noise_scale_w_slider)\n",
|
||||
" display(play)\n",
|
||||
" display(synthesize_button)\n",
|
||||
" display(close_button)\n",
|
||||
"\n",
|
||||
"def load_ckpt(model):\n",
|
||||
" _LOGGER.debug(\"Loading model from %s\", model)\n",
|
||||
" config = load_config(model)\n",
|
||||
" model = VitsModel.load_from_checkpoint(str(model), dataset=None)\n",
|
||||
" # Inference only\n",
|
||||
" model.eval()\n",
|
||||
" with torch.no_grad():\n",
|
||||
" model.model_g.dec.remove_weight_norm()\n",
|
||||
"\n",
|
||||
" _LOGGER.info(\"Loaded model from %s\", model)\n",
|
||||
" return model, config\n",
|
||||
"\n",
|
||||
"def load_config(model):\n",
|
||||
" with open(\"config.json\", \"r\") as file:\n",
|
||||
" config = json.load(file)\n",
|
||||
" return config\n",
|
||||
"\n",
|
||||
"PAD = \"_\" # padding (0)\n",
|
||||
"BOS = \"^\" # beginning of sentence\n",
|
||||
"EOS = \"$\" # end of sentence\n",
|
||||
"\n",
|
||||
"class PhonemeType(str, Enum):\n",
|
||||
" ESPEAK = \"espeak\"\n",
|
||||
" TEXT = \"text\"\n",
|
||||
"\n",
|
||||
"def phonemize(config, text: str) -> List[List[str]]:\n",
|
||||
" \"\"\"Text to phonemes grouped by sentence.\"\"\"\n",
|
||||
" if config[\"phoneme_type\"] == PhonemeType.ESPEAK:\n",
|
||||
" if config[\"espeak\"][\"voice\"] == \"ar\":\n",
|
||||
" # Arabic diacritization\n",
|
||||
" # https://github.com/mush42/libtashkeel/\n",
|
||||
" text = tashkeel_run(text)\n",
|
||||
" return phonemize_espeak(text, config[\"espeak\"][\"voice\"])\n",
|
||||
" if config[\"phoneme_type\"] == PhonemeType.TEXT:\n",
|
||||
" return phonemize_codepoints(text)\n",
|
||||
"    raise ValueError(f\"Unexpected phoneme type: {config['phoneme_type']}\")\n",
|
||||
"\n",
|
||||
"def phonemes_to_ids(config, phonemes: List[str]) -> List[int]:\n",
|
||||
" \"\"\"Phonemes to ids.\"\"\"\n",
|
||||
" id_map = config[\"phoneme_id_map\"]\n",
|
||||
" ids: List[int] = list(id_map[BOS])\n",
|
||||
" for phoneme in phonemes:\n",
|
||||
" if phoneme not in id_map:\n",
|
||||
"            _LOGGER.warning(\"Missing phoneme from id map: %s\", phoneme)\n",
|
||||
" continue\n",
|
||||
" ids.extend(id_map[phoneme])\n",
|
||||
" ids.extend(id_map[PAD])\n",
|
||||
" ids.extend(id_map[EOS])\n",
|
||||
" return ids\n",
|
||||
"\n",
|
||||
"def inferencing(model, config, sid, line, length_scale = 1, noise_scale = 0.667, noise_scale_w = 0.8, auto_play=True):\n",
|
||||
" audios = []\n",
|
||||
" text = phonemize(config, line)\n",
|
||||
" for phonemes in text:\n",
|
||||
" phoneme_ids = phonemes_to_ids(config, phonemes)\n",
|
||||
" num_speakers = config[\"num_speakers\"]\n",
|
||||
" if num_speakers == 1:\n",
|
||||
" speaker_id = None # for now\n",
|
||||
" else:\n",
|
||||
" speaker_id = sid\n",
|
||||
" text = torch.LongTensor(phoneme_ids).unsqueeze(0)\n",
|
||||
" text_lengths = torch.LongTensor([len(phoneme_ids)])\n",
|
||||
" scales = [\n",
|
||||
" noise_scale,\n",
|
||||
" length_scale,\n",
|
||||
" noise_scale_w\n",
|
||||
" ]\n",
|
||||
" sid = torch.LongTensor([speaker_id]) if speaker_id is not None else None\n",
|
||||
" audio = model(\n",
|
||||
" text,\n",
|
||||
" text_lengths,\n",
|
||||
" scales,\n",
|
||||
" sid=sid\n",
|
||||
" ).detach().numpy()\n",
|
||||
" audio = audio_float_to_int16(audio.squeeze())\n",
|
||||
" audios.append(audio)\n",
|
||||
" merged_audio = np.concatenate(audios)\n",
|
||||
" sample_rate = config[\"audio\"][\"sample_rate\"]\n",
|
||||
" display(Markdown(f\"{line}\"))\n",
|
||||
" display(Audio(merged_audio, rate=sample_rate, autoplay=auto_play))\n",
|
||||
"\n",
|
||||
"def denoise(\n",
|
||||
" audio: np.ndarray, bias_spec: np.ndarray, denoiser_strength: float\n",
|
||||
") -> np.ndarray:\n",
|
||||
" audio_spec, audio_angles = transform(audio)\n",
|
||||
"\n",
|
||||
" a = bias_spec.shape[-1]\n",
|
||||
" b = audio_spec.shape[-1]\n",
|
||||
" repeats = max(1, math.ceil(b / a))\n",
|
||||
" bias_spec_repeat = np.repeat(bias_spec, repeats, axis=-1)[..., :b]\n",
|
||||
"\n",
|
||||
" audio_spec_denoised = audio_spec - (bias_spec_repeat * denoiser_strength)\n",
|
||||
" audio_spec_denoised = np.clip(audio_spec_denoised, a_min=0.0, a_max=None)\n",
|
||||
" audio_denoised = inverse(audio_spec_denoised, audio_angles)\n",
|
||||
"\n",
|
||||
" return audio_denoised\n",
|
||||
"\n",
|
||||
"\n",
|
||||
"def stft(x, fft_size, hopsamp):\n",
|
||||
" \"\"\"Compute and return the STFT of the supplied time domain signal x.\n",
|
||||
" Args:\n",
|
||||
" x (1-dim Numpy array): A time domain signal.\n",
|
||||
" fft_size (int): FFT size. Should be a power of 2, otherwise DFT will be used.\n",
|
||||
"        hopsamp (int): The hop size, in samples.\n",
|
||||
" Returns:\n",
|
||||
" The STFT. The rows are the time slices and columns are the frequency bins.\n",
|
||||
" \"\"\"\n",
|
||||
" window = np.hanning(fft_size)\n",
|
||||
" fft_size = int(fft_size)\n",
|
||||
" hopsamp = int(hopsamp)\n",
|
||||
" return np.array(\n",
|
||||
" [\n",
|
||||
" np.fft.rfft(window * x[i : i + fft_size])\n",
|
||||
" for i in range(0, len(x) - fft_size, hopsamp)\n",
|
||||
" ]\n",
|
||||
" )\n",
|
||||
"\n",
|
||||
"\n",
|
||||
"def istft(X, fft_size, hopsamp):\n",
|
||||
" \"\"\"Invert a STFT into a time domain signal.\n",
|
||||
" Args:\n",
|
||||
" X (2-dim Numpy array): Input spectrogram. The rows are the time slices and columns are the frequency bins.\n",
|
||||
"        fft_size (int): The FFT size used to create the spectrogram.\n",
|
||||
" hopsamp (int): The hop size, in samples.\n",
|
||||
" Returns:\n",
|
||||
" The inverse STFT.\n",
|
||||
" \"\"\"\n",
|
||||
" fft_size = int(fft_size)\n",
|
||||
" hopsamp = int(hopsamp)\n",
|
||||
" window = np.hanning(fft_size)\n",
|
||||
" time_slices = X.shape[0]\n",
|
||||
" len_samples = int(time_slices * hopsamp + fft_size)\n",
|
||||
" x = np.zeros(len_samples)\n",
|
||||
" for n, i in enumerate(range(0, len(x) - fft_size, hopsamp)):\n",
|
||||
" x[i : i + fft_size] += window * np.real(np.fft.irfft(X[n]))\n",
|
||||
" return x\n",
|
||||
"\n",
|
||||
"\n",
|
||||
"def inverse(magnitude, phase):\n",
|
||||
" recombine_magnitude_phase = np.concatenate(\n",
|
||||
" [magnitude * np.cos(phase), magnitude * np.sin(phase)], axis=1\n",
|
||||
" )\n",
|
||||
"\n",
|
||||
" x_org = recombine_magnitude_phase\n",
|
||||
" n_b, n_f, n_t = x_org.shape # pylint: disable=unpacking-non-sequence\n",
|
||||
" x = np.empty([n_b, n_f // 2, n_t], dtype=np.complex64)\n",
|
||||
" x.real = x_org[:, : n_f // 2]\n",
|
||||
" x.imag = x_org[:, n_f // 2 :]\n",
|
||||
" inverse_transform = []\n",
|
||||
" for y in x:\n",
|
||||
" y_ = istft(y.T, fft_size=1024, hopsamp=256)\n",
|
||||
" inverse_transform.append(y_[None, :])\n",
|
||||
"\n",
|
||||
" inverse_transform = np.concatenate(inverse_transform, 0)\n",
|
||||
"\n",
|
||||
" return inverse_transform\n",
|
||||
"\n",
|
||||
"\n",
|
||||
"def transform(input_data):\n",
|
||||
" x = input_data\n",
|
||||
" real_part = []\n",
|
||||
" imag_part = []\n",
|
||||
" for y in x:\n",
|
||||
" y_ = stft(y, fft_size=1024, hopsamp=256).T\n",
|
||||
" real_part.append(y_.real[None, :, :]) # pylint: disable=unsubscriptable-object\n",
|
||||
" imag_part.append(y_.imag[None, :, :]) # pylint: disable=unsubscriptable-object\n",
|
||||
" real_part = np.concatenate(real_part, 0)\n",
|
||||
" imag_part = np.concatenate(imag_part, 0)\n",
|
||||
"\n",
|
||||
" magnitude = np.sqrt(real_part**2 + imag_part**2)\n",
|
||||
" phase = np.arctan2(imag_part.data, real_part.data)\n",
|
||||
"\n",
|
||||
" return magnitude, phase\n",
|
||||
"\n",
|
||||
"main()"
|
||||
],
|
||||
"metadata": {
|
||||
"id": "hcKk8M2ug8kM",
|
||||
"cellView": "form"
|
||||
},
|
||||
"execution_count": null,
|
||||
"outputs": []
|
||||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"source": [
|
||||
"# I suggest you export this model to use it in piper!\n",
|
||||
"\n",
|
||||
"Have you tried it yet? What do you think? If you're happy with the results, it's time to disconnect your session in this notebook and [export this model](https://colab.research.google.com/github/rmcpantoja/piper/blob/master/notebooks/piper_model_exporter.ipynb)!"
|
||||
],
|
||||
"metadata": {
|
||||
"id": "NPow8JT7R0WM"
|
||||
}
|
||||
}
|
||||
]
|
||||
}
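The `phonemes_to_ids` routine in the inference cell interleaves a pad id after every phoneme and brackets the sequence with BOS/EOS ids. A toy sketch of that interleaving (the `id_map` values here are made up for illustration; a real map comes from the voice's config.json and maps each symbol to a *list* of ids):

```python
from typing import List

PAD, BOS, EOS = "_", "^", "$"

# Hypothetical phoneme-id map in the shape of a Piper config["phoneme_id_map"].
id_map = {PAD: [0], BOS: [1], EOS: [2], "h": [10], "i": [11]}

def phonemes_to_ids(id_map, phonemes: List[str]) -> List[int]:
    ids: List[int] = list(id_map[BOS])
    for ph in phonemes:
        if ph not in id_map:
            continue  # skip symbols missing from the map, as in the notebook
        ids.extend(id_map[ph])
        ids.extend(id_map[PAD])  # interleave padding after every phoneme
    ids.extend(id_map[EOS])
    return ids

print(phonemes_to_ids(id_map, ["h", "i"]))  # [1, 10, 0, 11, 0, 2]
```

Unknown phonemes are dropped with a warning rather than raising, so a sequence of entirely unknown symbols still yields the BOS/EOS pair.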
|
||||
209
ascend_910-piper/piper/notebooks/piper_model_exporter.ipynb
Normal file
@@ -0,0 +1,209 @@
|
||||
{
|
||||
"nbformat": 4,
|
||||
"nbformat_minor": 0,
|
||||
"metadata": {
|
||||
"colab": {
|
||||
"provenance": [],
|
||||
"gpuType": "T4",
|
||||
"authorship_tag": "ABX9TyPKhrhJQuxhFJG2C1A+aMsQ",
|
||||
"include_colab_link": true
|
||||
},
|
||||
"kernelspec": {
|
||||
"name": "python3",
|
||||
"display_name": "Python 3"
|
||||
},
|
||||
"language_info": {
|
||||
"name": "python"
|
||||
}
|
||||
},
|
||||
"cells": [
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"metadata": {
|
||||
"id": "view-in-github",
|
||||
"colab_type": "text"
|
||||
},
|
||||
"source": [
|
||||
"<a href=\"https://colab.research.google.com/github/rmcpantoja/piper/blob/master/notebooks/piper_model_exporter.ipynb\" target=\"_parent\"><img src=\"https://colab.research.google.com/assets/colab-badge.svg\" alt=\"Open In Colab\"/></a>"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"source": [
|
||||
"# [Piper](https://github.com/rhasspy/piper) model exporter\n",
|
||||
"## \n",
|
||||
"\n",
|
||||
"Notebook created by [rmcpantoja](http://github.com/rmcpantoja)"
|
||||
],
|
||||
"metadata": {
|
||||
"id": "EOL-kjplZYEU"
|
||||
}
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": null,
|
||||
"metadata": {
|
||||
"cellView": "form",
|
||||
"id": "FfMKr8v2RVOm"
|
||||
},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"#@title Install software\n",
|
||||
"\n",
|
||||
"print(\"Installing...\")\n",
|
||||
"!git clone -q https://github.com/rhasspy/piper\n",
|
||||
"%cd /content/piper/src/python\n",
|
||||
"!pip install -q cython>=0.29.0 espeak-phonemizer>=1.1.0 librosa>=0.9.2 numpy>=1.19.0 pytorch-lightning~=1.7.0 torch~=1.11.0\n",
|
||||
"!pip install -q onnx onnxruntime-gpu\n",
|
||||
"!bash build_monotonic_align.sh\n",
|
||||
"!apt-get install espeak-ng\n",
|
||||
"!pip install -q torchtext==0.12.0\n",
|
||||
"# fixing recent compatibility issues:\n",
|
||||
"!pip install -q torchaudio==0.11.0 torchmetrics==0.11.4\n",
|
||||
"print(\"Done!\")"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"source": [
|
||||
"#@title Voice package generation section\n",
|
||||
"%cd /content/piper/src/python\n",
|
||||
"import os\n",
|
||||
"import ipywidgets as widgets\n",
|
||||
"from IPython.display import display\n",
|
||||
"import json\n",
|
||||
"from google.colab import output\n",
|
||||
"guideurl = \"https://github.com/rmcpantoja/piper/blob/master/notebooks/wav/en\"\n",
|
||||
"#@markdown #### Download:\n",
|
||||
"#@markdown **Drive ID or direct download link of the model in another cloud:**\n",
|
||||
"model_id = \"\" #@param {type:\"string\"}\n",
|
||||
"#@markdown **Drive ID or direct download link of the config.json file:**\n",
|
||||
"config_id = \"\" #@param {type:\"string\"}\n",
|
||||
"#@markdown ---\n",
|
||||
"\n",
|
||||
"#@markdown #### Creation process:\n",
|
||||
"#@markdown **Choose the language code (iso639-1 format):**\n",
|
||||
"\n",
|
||||
"#@markdown You can see a list of language codes and names [here](https://www.loc.gov/standards/iso639-2/php/English_list.php)\n",
|
||||
"\n",
|
||||
"language = \"en-us\" #@param [\"ca\", \"da\", \"de\", \"en\", \"en-us\", \"es\", \"fi\", \"fr\", \"grc\", \"is\", \"it\", \"k\", \"nb\", \"ne\", \"nl\", \"pl\", \"pt-br\", \"ru\", \"sv\", \"uk\", \"vi-vn-x-central\", \"yue\"]\n",
|
||||
"voice_name = \"Myvoice\" #@param {type:\"string\"}\n",
|
||||
"voice_name = voice_name.lower()\n",
|
||||
"quality = \"medium\" #@param [\"high\", \"low\", \"medium\", \"x-low\"]\n",
|
||||
"def start_process():\n",
|
||||
" if not os.path.exists(\"/content/project/model.ckpt\"):\n",
|
||||
"        raise Exception(\"Could not download the model! Make sure the file is shared with everyone.\")\n",
|
||||
" output.eval_js(f'new Audio(\"{guideurl}/starting.wav?raw=true\").play()')\n",
|
||||
" !python -m piper_train.export_onnx \"/content/project/model.ckpt\" \"{export_voice_path}/{export_voice_name}.onnx\"\n",
|
||||
" print(\"compressing...\")\n",
|
||||
" !tar -czvf \"{packages_path}/voice-{export_voice_name}.tar.gz\" -C \"{export_voice_path}\" .\n",
|
||||
" output.eval_js(f'new Audio(\"{guideurl}/success.wav?raw=true\").play()')\n",
|
||||
" print(\"Done!\")\n",
|
||||
"\n",
|
||||
"export_voice_name = f\"{language}-{voice_name}-{quality}\"\n",
|
||||
"export_voice_path = \"/content/project/voice-\"+export_voice_name\n",
|
||||
"packages_path = \"/content/project/packages\"\n",
|
||||
"if not os.path.exists(export_voice_path):\n",
|
||||
" os.makedirs(export_voice_path)\n",
|
||||
"if not os.path.exists(packages_path):\n",
|
||||
" os.makedirs(packages_path)\n",
|
||||
"print(\"Downloading the model and its config...\")\n",
|
||||
"if model_id.startswith(\"1\"):\n",
|
||||
" !gdown -q \"{model_id}\" -O /content/project/model.ckpt\n",
|
||||
"elif model_id.startswith(\"https://drive.google.com/file/d/\"):\n",
|
||||
" !gdown -q \"{model_id}\" -O \"/content/project/model.ckpt\" --fuzzy\n",
|
||||
"else:\n",
|
||||
" !wget \"{model_id}\" -O \"/content/project/model.ckpt\"\n",
|
||||
"if config_id.startswith(\"1\"):\n",
|
||||
" !gdown -q \"{config_id}\" -O \"{export_voice_path}/{export_voice_name}.onnx.json\"\n",
|
||||
"elif config_id.startswith(\"https://drive.google.com/file/d/\"):\n",
|
||||
" !gdown -q \"{config_id}\" -O \"{export_voice_path}/{export_voice_name}.onnx.json\" --fuzzy\n",
|
||||
"else:\n",
|
||||
" !wget \"{config_id}\" -O \"{export_voice_path}/{export_voice_name}.onnx.json\"\n",
|
||||
"#@markdown **Do you want to write a model card?**\n",
|
||||
"write_model_card = False #@param {type:\"boolean\"}\n",
|
||||
"if write_model_card:\n",
|
||||
" with open(f\"{export_voice_path}/{export_voice_name}.onnx.json\", \"r\") as file:\n",
|
||||
" config = json.load(file)\n",
|
||||
" sample_rate = config[\"audio\"][\"sample_rate\"]\n",
|
||||
" num_speakers = config[\"num_speakers\"]\n",
|
||||
" output.eval_js(f'new Audio(\"{guideurl}/waiting.wav?raw=true\").play()')\n",
|
||||
" text_area = widgets.Textarea(\n",
|
||||
"        description = \"Fill in the following template and press Start to generate the voice package\",\n",
|
||||
"        value=f'# Model card for {voice_name} ({quality})\\n\\n* Language: {language} (normalized)\\n* Speakers: {num_speakers}\\n* Quality: {quality}\\n* Samplerate: {sample_rate}Hz\\n\\n## Dataset\\n\\n* URL: \\n* License: \\n\\n## Training\\n\\nTrained from scratch.\\nOr finetuned from: ',\n",
|
||||
" layout=widgets.Layout(width='500px', height='200px')\n",
|
||||
" )\n",
|
||||
" button = widgets.Button(description='Start')\n",
|
||||
"\n",
|
||||
" def create_model_card(button):\n",
|
||||
" model_card_text = text_area.value.strip()\n",
|
||||
" with open(f'{export_voice_path}/MODEL_CARD', 'w') as file:\n",
|
||||
" file.write(model_card_text)\n",
|
||||
" text_area.close()\n",
|
||||
" button.close()\n",
|
||||
" output.clear()\n",
|
||||
" start_process()\n",
|
||||
"\n",
|
||||
" button.on_click(create_model_card)\n",
|
||||
"\n",
|
||||
" display(text_area, button)\n",
|
||||
"else:\n",
|
||||
" start_process()"
|
||||
],
|
||||
"metadata": {
|
||||
"cellView": "form",
|
||||
"id": "PqcoBb26V5xA"
|
||||
},
|
||||
"execution_count": null,
|
||||
"outputs": []
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"source": [
|
||||
"#@title Download/export your generated voice package\n",
|
||||
"\n",
|
||||
"#@markdown #### How do you want to export your model?\n",
|
||||
"export_mode = \"upload it to my Google Drive\" #@param [\"Download the voice package on my device (may take some time)\", \"upload it to my Google Drive\"]\n",
|
||||
"print(\"Exporting package...\")\n",
|
||||
"if export_mode == \"Download the voice package on my device (may take some time)\":\n",
|
||||
" from google.colab import files\n",
|
||||
" files.download(f\"{packages_path}/voice-{export_voice_name}.tar.gz\")\n",
|
||||
" msg = \"Please wait a moment while the package is being downloaded.\"\n",
|
||||
"else:\n",
|
||||
" voicepacks_folder = \"/content/drive/MyDrive/piper voice packages\"\n",
|
||||
" from google.colab import drive\n",
|
||||
" drive.mount('/content/drive')\n",
|
||||
" if not os.path.exists(voicepacks_folder):\n",
|
||||
" os.makedirs(voicepacks_folder)\n",
|
||||
" !cp \"{packages_path}/voice-{export_voice_name}.tar.gz\" \"{voicepacks_folder}\"\n",
|
||||
" msg = f\"You can find the generated voice package at: {voicepacks_folder}.\"\n",
|
||||
"print(f\"Done! {msg}\")"
|
||||
],
|
||||
"metadata": {
|
||||
"cellView": "form",
|
||||
"id": "Hu3V9CJeWc4Y"
|
||||
},
|
||||
"execution_count": null,
|
||||
"outputs": []
|
||||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"source": [
|
||||
"# I want to test this model! Do I need anything else?\n",
|
||||
"\n",
|
||||
"No, this is almost the end! Now you can share your generated package with your friends, upload it to cloud storage, and/or test it on:\n",
|
||||
"* [The inference notebook](https://colab.research.google.com/github/rmcpantoja/piper/blob/master/notebooks/piper_inference_(ONNX).ipynb)\n",
|
||||
" * Run the cells in order for it to work correctly, as with all the notebooks. The inference notebook will also guide you through the process with the enhanced accessibility feature if you wish. It's easy to use. Try it!\n",
|
||||
"* Or through the NVDA screen reader!\n",
|
||||
" * Download and install the latest version of the [add-on](https://github.com/mush42/piper-nvda/releases).\n",
|
||||
" * Once the plugin is installed, go to NVDA menu/preferences/settings... and look for the `Piper Voice Manager` category.\n",
|
||||
" * Tab until you find the `Install from local file` button, press enter and select the generated package in your downloads.\n",
|
||||
" * Once the package is selected and installed, apply the changes and restart NVDA to update the voice list.\n",
|
||||
"* Enjoy your creation!"
|
||||
],
|
||||
"metadata": {
|
||||
"id": "IRiNBHkeoDbC"
|
||||
}
|
||||
}
|
||||
]
|
||||
}
|
||||
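The export cell above produces a `voice-<name>.tar.gz` package. As a quick illustration, such a package can be inspected with Python's `tarfile` before sharing or installing it (for example through the NVDA add-on's `Install from local file` button). This is a hedged sketch, not part of the notebook: real packages carry the `.onnx` model and its `.json` config, while this stand-in only holds a dummy `MODEL_CARD`.

```python
# Sketch (illustrative only): inspect a generated Piper voice package
# before installing it. `list_package` is a hypothetical helper.
import os
import tarfile
import tempfile

def list_package(package_path):
    """Return the sorted member names inside a voice package tarball."""
    with tarfile.open(package_path, "r:gz") as tar:
        return sorted(m.name for m in tar.getmembers())

# Build a tiny stand-in package to demonstrate; a real one would also
# contain the exported .onnx model and its .json config.
tmp = tempfile.mkdtemp()
card = os.path.join(tmp, "MODEL_CARD")
with open(card, "w") as f:
    f.write("Model: Test\n")
pkg = os.path.join(tmp, "voice-test.tar.gz")
with tarfile.open(pkg, "w:gz") as tar:
    tar.add(card, arcname="MODEL_CARD")

print(list_package(pkg))
```

Listing the members first is a cheap way to confirm the archive is not corrupted before copying it to Drive or handing it to the NVDA add-on.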
@@ -0,0 +1,458 @@
|
||||
{
|
||||
"cells": [
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"metadata": {
|
||||
"id": "view-in-github",
|
||||
"colab_type": "text"
|
||||
},
|
||||
"source": [
|
||||
"<a href=\"https://colab.research.google.com/github/rmcpantoja/piper/blob/master/notebooks/piper_multilenguaje_cuaderno_de_entrenamiento.ipynb\" target=\"_parent\"><img src=\"https://colab.research.google.com/assets/colab-badge.svg\" alt=\"Open In Colab\"/></a>"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"metadata": {
|
||||
"id": "eK3nmYDB6C1a"
|
||||
},
|
||||
"source": [
|
||||
"# <font color=\"pink\"> **Cuaderno de entrenamiento de [Piper.](https://github.com/rhasspy/piper)**\n",
|
||||
"## \n",
|
||||
"\n",
|
||||
"---\n",
|
||||
"\n",
|
||||
"- Cuaderno creado por [rmcpantoja](http://github.com/rmcpantoja)\n",
|
||||
"- Colaborador y traductor: [Xx_Nessu_xX](https://fakeyou.com/profile/Xx_Nessu_xX)\n",
|
||||
"- [Cuaderno original inglés.](https://colab.research.google.com/github/rmcpantoja/piper/blob/master/notebooks/piper_multilingual_training_notebook.ipynb#scrollTo=dOyx9Y6JYvRF)\n",
|
||||
"\n",
|
||||
"---\n",
|
||||
"\n",
|
||||
"# Notas:\n",
|
||||
"\n",
|
||||
"- <font color=\"orange\">**Las cosas en naranja significan que son importantes.**"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"metadata": {
|
||||
"id": "AICh6p5OJybj"
|
||||
},
|
||||
"source": [
|
||||
"# <font color=\"pink\">🔧 ***Primeros pasos.*** 🔧"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": null,
|
||||
"metadata": {
|
||||
"cellView": "form",
|
||||
"id": "qyxSMuzjfQrz"
|
||||
},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"#@markdown ## <font color=\"pink\"> **Google Colab Anti-Disconnect.** 🔌\n",
|
||||
"#@markdown ---\n",
|
||||
"#@markdown #### Evita la desconexión automática. Aún así, se desconectará después de <font color=\"orange\">**6 a 12 horas**</font>.\n",
|
||||
"\n",
|
||||
"import IPython\n",
|
||||
"js_code = '''\n",
|
||||
"function ClickConnect(){\n",
|
||||
"console.log(\"Working\");\n",
|
||||
"document.querySelector(\"colab-toolbar-button#connect\").click()\n",
|
||||
"}\n",
|
||||
"setInterval(ClickConnect,60000)\n",
|
||||
"'''\n",
|
||||
"display(IPython.display.Javascript(js_code))"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": null,
|
||||
"metadata": {
|
||||
"cellView": "form",
|
||||
"id": "ygxzp-xHTC7T"
|
||||
},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"#@markdown ## <font color=\"pink\"> **Comprueba la GPU.** 👁️\n",
|
||||
"#@markdown ---\n",
|
||||
"#@markdown #### Una GPU de mayor capacidad puede aumentar la velocidad de entrenamiento. Por defecto, tendrás una <font color=\"orange\">**Tesla T4**</font>.\n",
|
||||
"!nvidia-smi"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": null,
|
||||
"metadata": {
|
||||
"cellView": "form",
|
||||
"id": "sUNjId07JfAK"
|
||||
},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"#@markdown # <font color=\"pink\"> **Monta tu Google Drive.** 📂\n",
|
||||
"from google.colab import drive\n",
|
||||
"drive.mount('/content/drive', force_remount=True)"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": null,
|
||||
"metadata": {
|
||||
"cellView": "form",
|
||||
"id": "_XwmTVlcUgCh"
|
||||
},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"#@markdown # <font color=\"pink\"> **Instalar software.** 📦\n",
|
||||
"\n",
|
||||
"#@markdown ####En esta celda se instalará el sintetizador y sus dependencias necesarias para ejecutar el entrenamiento. (Esto puede llevar un rato.)\n",
|
||||
"\n",
|
||||
"#@markdown #### <font color=\"orange\">**¿Quieres usar el parche?**\n",
|
||||
"#@markdown El parche ofrece la posibilidad de exportar archivos de audio a la carpeta de salida y guardar un único modelo durante el entrenamiento.\n",
|
||||
"usepatch = True #@param {type:\"boolean\"}\n",
|
||||
"#@markdown ---\n",
|
||||
"# clone:\n",
|
||||
"!git clone -q https://github.com/rmcpantoja/piper\n",
|
||||
"%cd /content/piper/src/python\n",
|
||||
"!wget -q \"https://raw.githubusercontent.com/coqui-ai/TTS/dev/TTS/bin/resample.py\"\n",
|
||||
"!pip install -q -r requirements.txt\n",
|
||||
"!pip install -q torchtext==0.12.0\n",
|
||||
"!pip install -q torchvision==0.12.0\n",
|
||||
"# fixing recent compatibility issues:\n",
|
||||
"!pip install -q torchaudio==0.11.0 torchmetrics==0.11.4\n",
|
||||
"!bash build_monotonic_align.sh\n",
|
||||
"!apt-get install -q espeak-ng\n",
|
||||
"# download patches:\n",
|
||||
"if usepatch:\n",
" print(\"Downloading patch...\")\n",
" !gdown -q \"1EWEb7amo1rgFGpBFfRD4BKX3pkjVK1I-\" -O \"/content/piper/src/python/patch.zip\"\n",
" !unzip -o -q \"patch.zip\"\n",
|
||||
"%cd /content"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"metadata": {
|
||||
"id": "A3bMzEE0V5Ma"
|
||||
},
|
||||
"source": [
|
||||
"# <font color=\"pink\"> 🤖 ***Entrenamiento.*** 🤖"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": null,
|
||||
"metadata": {
|
||||
"cellView": "form",
|
||||
"id": "SvEGjf0aV8eg"
|
||||
},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"#@markdown # <font color=\"pink\"> **1. Extraer dataset.** 📥\n",
|
||||
"#@markdown ####Importante: los audios deben estar en formato <font color=\"orange\">**wav, (16000 o 22050hz, 16-bits, mono), y, por comodidad, numerados. Ejemplo:**\n",
|
||||
"\n",
|
||||
"#@markdown * <font color=\"orange\">**1.wav**</font>\n",
|
||||
"#@markdown * <font color=\"orange\">**2.wav**</font>\n",
|
||||
"#@markdown * <font color=\"orange\">**3.wav**</font>\n",
|
||||
"#@markdown * <font color=\"orange\">**.....**</font>\n",
|
||||
"\n",
|
||||
"#@markdown ---\n",
|
||||
"\n",
|
||||
"%cd /content\n",
|
||||
"!mkdir /content/dataset\n",
|
||||
"%cd /content/dataset\n",
|
||||
"!mkdir /content/dataset/wavs\n",
|
||||
"#@markdown ### Ruta del dataset para descomprimir:\n",
|
||||
"zip_path = \"/content/drive/MyDrive/wavs.zip\" #@param {type:\"string\"}\n",
|
||||
"!unzip \"{zip_path}\" -d /content/dataset/wavs\n",
|
||||
"#@markdown ---"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": null,
|
||||
"metadata": {
|
||||
"cellView": "form",
|
||||
"id": "E0W0OCvXXvue"
|
||||
},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"#@markdown # <font color=\"pink\"> **2. Cargar el archivo de transcripción.** 📝\n",
|
||||
"#@markdown ---\n",
|
||||
"#@markdown ####<font color=\"orange\">**Importante: la transcripción significa escribir lo que dice el personaje en cada uno de los audios, y debe tener la siguiente estructura:**\n",
|
||||
"\n",
|
||||
"#@markdown ##### <font color=\"orange\">Para un conjunto de datos de un solo hablante:\n",
|
||||
"#@markdown * wavs/1.wav|Esto dice el personaje en el audio 1.\n",
|
||||
"#@markdown * wavs/2.wav|Este, el texto que dice el personaje en el audio 2.\n",
|
||||
"#@markdown * ...\n",
|
||||
"\n",
|
||||
"#@markdown ##### <font color=\"orange\">Para un conjunto de datos de varios hablantes:\n",
|
||||
"\n",
|
||||
"#@markdown * wavs/speaker1audio1.wav|speaker1|Esto es lo que dice el primer hablante.\n",
|
||||
"#@markdown * wavs/speaker1audio2.wav|speaker1|Este es otro audio del primer hablante.\n",
|
||||
"#@markdown * wavs/speaker2audio1.wav|speaker2|Esto es lo que dice el segundo hablante en el primer audio.\n",
|
||||
"#@markdown * wavs/speaker2audio2.wav|speaker2|Este es otro audio del segundo hablante.\n",
|
||||
"#@markdown * ...\n",
|
||||
"\n",
|
||||
"#@markdown #### Y así sucesivamente. Además, la transcripción debe estar en formato <font color=\"orange\">**.csv (UTF-8 sin BOM)**\n",
|
||||
"#@markdown ---\n",
|
||||
"%cd /content/dataset\n",
|
||||
"from google.colab import files\n",
|
||||
"!rm -f /content/dataset/metadata.csv\n",
|
||||
"listfn, length = files.upload().popitem()\n",
|
||||
"if listfn != \"metadata.csv\":\n",
|
||||
" !mv \"$listfn\" metadata.csv\n",
|
||||
"%cd .."
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": null,
|
||||
"metadata": {
|
||||
"cellView": "form",
|
||||
"id": "dOyx9Y6JYvRF"
|
||||
},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"#@markdown # <font color=\"pink\"> **3. Preprocesar el dataset.** 🔄\n",
|
||||
"\n",
|
||||
"import os\n",
|
||||
"#@markdown ### En primer lugar, seleccione el idioma de su conjunto de datos. <br> (Para español están disponibles los siguientes: Español y Español (latinoamericano).)\n",
|
||||
"language = \"Español\" #@param [\"Català\", \"Dansk\", \"Deutsch\", \"Ελληνικά\", \"English (British)\", \"English (U.S.)\", \"Español\", \"Español (latinoamericano)\", \"Suomi\", \"Français\", \"ქართული\", \"Hindi\", \"Icelandic\", \"Italiano\", \"қазақша\", \"नेपाली\", \"Nederlands\", \"Norsk\", \"Polski\", \"Português (Brasil)\", \"Русский\", \"Svenska\", \"украї́нська\", \"Tiếng Việt\", \"简体中文\"]\n",
"#@markdown ---\n",
"# language definition:\n",
"languages = {\n",
" \"Català\": \"ca\",\n",
" \"Dansk\": \"da\",\n",
" \"Deutsch\": \"de\",\n",
" \"Ελληνικά\": \"grc\",\n",
" \"English (British)\": \"en\",\n",
" \"English (U.S.)\": \"en-us\",\n",
" \"Español\": \"es\",\n",
" \"Español (latinoamericano)\": \"es-419\",\n",
" \"Suomi\": \"fi\",\n",
" \"Français\": \"fr\",\n",
" \"Hindi\": \"hi\",\n",
|
||||
" \"Icelandic\": \"is\",\n",
|
||||
" \"Italiano\": \"it\",\n",
|
||||
" \"ქართული\": \"ka\",\n",
|
||||
" \"қазақша\": \"kk\",\n",
|
||||
" \"नेपाली\": \"ne\",\n",
|
||||
" \"Nederlands\": \"nl\",\n",
|
||||
" \"Norsk\": \"nb\",\n",
|
||||
" \"Polski\": \"pl\",\n",
|
||||
" \"Português (Brasil)\": \"pt-br\",\n",
|
||||
" \"Русский\": \"ru\",\n",
|
||||
" \"Svenska\": \"sv\",\n",
|
||||
" \"украї́нська\": \"uk\",\n",
|
||||
" \"Tiếng Việt\": \"vi-vn-x-central\",\n",
|
||||
" \"简体中文\": \"zh\"\n",
|
||||
"}\n",
|
||||
"\n",
|
||||
"def _get_language(code):\n",
|
||||
" return languages[code]\n",
|
||||
"\n",
|
||||
"final_language = _get_language(language)\n",
|
||||
"#@markdown ### Elige un nombre para tu modelo:\n",
|
||||
"model_name = \"Test\" #@param {type:\"string\"}\n",
|
||||
"#@markdown ---\n",
|
||||
"# output:\n",
|
||||
"#@markdown ###Elige la carpeta de trabajo: (se recomienda guardar en Drive)\n",
|
||||
"\n",
|
||||
"#@markdown La carpeta de trabajo se utilizará en el preprocesamiento, pero también en el entrenamiento del modelo.\n",
|
||||
"output_path = \"/content/drive/MyDrive/colab/piper\" #@param {type:\"string\"}\n",
|
||||
"output_dir = output_path+\"/\"+model_name\n",
|
||||
"if not os.path.exists(output_dir):\n",
|
||||
" os.makedirs(output_dir)\n",
|
||||
"#@markdown ---\n",
|
||||
"#@markdown ### Elige el formato del dataset:\n",
|
||||
"dataset_format = \"ljspeech\" #@param [\"ljspeech\", \"mycroft\"]\n",
|
||||
"#@markdown ---\n",
|
||||
"#@markdown ### ¿Se trata de un conjunto de datos de un solo hablante? Si no es así, desmarca la casilla:\n",
|
||||
"single_speaker = True #@param {type:\"boolean\"}\n",
|
||||
"if single_speaker:\n",
|
||||
" force_sp = \" --single-speaker\"\n",
|
||||
"else:\n",
|
||||
" force_sp = \"\"\n",
|
||||
"#@markdown ---\n",
|
||||
"#@markdown ###Seleccione la frecuencia de muestreo del dataset:\n",
|
||||
"sample_rate = \"22050\" #@param [\"16000\", \"22050\"]\n",
|
||||
"#@markdown ---\n",
|
||||
"%cd /content/piper/src/python\n",
|
||||
"#@markdown ###¿Quieres entrenar utilizando esta frecuencia de muestreo, pero tus audios no la tienen?\n",
|
||||
"#@markdown ¡El remuestreador te ayuda a hacerlo rápidamente!\n",
|
||||
"resample = False #@param {type:\"boolean\"}\n",
|
||||
"if resample:\n",
|
||||
" !python resample.py --input_dir \"/content/dataset/wavs\" --output_dir \"/content/dataset/wavs_resampled\" --output_sr {sample_rate} --file_ext \"wav\"\n",
|
||||
" !mv /content/dataset/wavs_resampled/* /content/dataset/wavs\n",
|
||||
"#@markdown ---\n",
|
||||
"\n",
|
||||
"!python -m piper_train.preprocess \\\n",
|
||||
" --language {final_language} \\\n",
|
||||
" --input-dir /content/dataset \\\n",
|
||||
" --output-dir \"{output_dir}\" \\\n",
|
||||
" --dataset-format {dataset_format} \\\n",
|
||||
" --sample-rate {sample_rate} \\\n",
|
||||
" {force_sp}"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": null,
|
||||
"metadata": {
|
||||
"cellView": "form",
|
||||
"id": "ickQlOCRjkBL"
|
||||
},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"#@markdown # <font color=\"pink\"> **4. Ajustes.** 🧰\n",
|
||||
"import json\n",
|
||||
"import ipywidgets as widgets\n",
|
||||
"from IPython.display import display\n",
|
||||
"from google.colab import output\n",
|
||||
"import os\n",
|
||||
"#@markdown ### Seleccione la acción para entrenar este conjunto de datos:\n",
|
||||
"\n",
|
||||
"#@markdown * La opción de continuar un entrenamiento se explica por sí misma. Si has entrenado previamente un modelo con colab gratuito, se te ha acabado el tiempo y estás considerando entrenarlo un poco más, esto es ideal para ti. Sólo tienes que establecer los mismos ajustes que estableciste cuando entrenaste este modelo por primera vez.\n",
|
||||
"#@markdown * La opción para convertir un modelo de un solo hablante en un modelo multihablante se explica por sí misma, y para ello es importante que hayas procesado un conjunto de datos que contenga texto y audio de todos los posibles hablantes que quieras entrenar en tu modelo.\n",
|
||||
"#@markdown * La opción finetune se utiliza para entrenar un conjunto de datos utilizando un modelo preentrenado, es decir, entrenar sobre esos datos. Esta opción es ideal si desea entrenar un conjunto de datos muy pequeño (se recomiendan más de cinco minutos).\n",
|
||||
"#@markdown * La opción entrenar desde cero construye características como el diccionario y la forma del habla desde cero, y esto puede tardar más en converger. Para ello, se recomiendan horas de audio (8 como mínimo) que tengan una gran colección de fonemas.\n",
|
||||
"action = \"finetune\" #@param [\"Continue training\", \"convert single-speaker to multi-speaker model\", \"finetune\", \"train from scratch\"]\n",
|
||||
"#@markdown ---\n",
|
||||
"if action == \"Continue training\":\n",
|
||||
" if os.path.exists(f\"{output_dir}/lightning_logs/version_0/checkpoints/last.ckpt\"):\n",
|
||||
" ft_command = f'--resume_from_checkpoint \"{output_dir}/lightning_logs/version_0/checkpoints/last.ckpt\" '\n",
|
||||
" print(f\"Continuing {model_name}'s training at: {output_dir}/lightning_logs/version_0/checkpoints/last.ckpt\")\n",
|
||||
" else:\n",
|
||||
" raise Exception(\"Training cannot be continued because there is no checkpoint to continue from.\")\n",
|
||||
"elif action == \"finetune\":\n",
|
||||
" if os.path.exists(f\"{output_dir}/lightning_logs/version_0/checkpoints/last.ckpt\"):\n",
|
||||
" raise Exception(\"Oh no! You have already trained this model before. You cannot choose this option, since your previous progress would be overwritten and your earlier training time lost. Please select the option to continue training instead.\")\n",
|
||||
" else:\n",
|
||||
" ft_command = '--resume_from_checkpoint \"/content/pretrained.ckpt\" '\n",
|
||||
"elif action == \"convert single-speaker to multi-speaker model\":\n",
|
||||
" if not single_speaker:\n",
|
||||
" ft_command = '--resume_from_single_speaker_checkpoint \"/content/pretrained.ckpt\" '\n",
|
||||
" else:\n",
|
||||
" raise Exception(\"This dataset is not a multi-speaker dataset!\")\n",
|
||||
"else:\n",
|
||||
" ft_command = \"\"\n",
|
||||
"if action== \"convert single-speaker to multi-speaker model\" or action == \"finetune\":\n",
|
||||
" try:\n",
|
||||
" with open('/content/piper/notebooks/pretrained_models.json') as f:\n",
|
||||
" pretrained_models = json.load(f)\n",
|
||||
" if final_language in pretrained_models:\n",
|
||||
" models = pretrained_models[final_language]\n",
|
||||
" model_options = [(model_name, model_name) for model_name, model_url in models.items()]\n",
|
||||
" model_dropdown = widgets.Dropdown(description = \"Choose pretrained model\", options=model_options)\n",
|
||||
" download_button = widgets.Button(description=\"Download\")\n",
|
||||
" def download_model(btn):\n",
|
||||
" model_name = model_dropdown.value\n",
|
||||
" model_url = pretrained_models[final_language][model_name]\n",
|
||||
" print(\"Downloading pretrained model...\")\n",
|
||||
" if model_url.startswith(\"1\"):\n",
|
||||
" !gdown -q \"{model_url}\" -O \"/content/pretrained.ckpt\"\n",
|
||||
" elif model_url.startswith(\"https://drive.google.com/file/d/\"):\n",
|
||||
" !gdown -q \"{model_url}\" -O \"/content/pretrained.ckpt\" --fuzzy\n",
|
||||
" else:\n",
|
||||
" !wget -q \"{model_url}\" -O \"/content/pretrained.ckpt\"\n",
|
||||
" model_dropdown.close()\n",
|
||||
" download_button.close()\n",
|
||||
" output.clear()\n",
|
||||
" if os.path.exists(\"/content/pretrained.ckpt\"):\n",
|
||||
" print(\"Model downloaded!\")\n",
|
||||
" else:\n",
|
||||
" raise Exception(\"Couldn't download the pretrained model!\")\n",
|
||||
" download_button.on_click(download_model)\n",
|
||||
" display(model_dropdown, download_button)\n",
|
||||
" else:\n",
|
||||
" raise Exception(f\"There are no pretrained models available for the language {final_language}\")\n",
|
||||
" except FileNotFoundError:\n",
|
||||
" raise Exception(\"The pretrained_models.json file was not found.\")\n",
|
||||
"else:\n",
|
||||
" print(\"Warning: this model will be trained from scratch. You need at least 8 hours of data for everything to work decently. Good luck!\")\n",
|
||||
"#@markdown ### Elige el tamaño del lote basándote en este conjunto de datos:\n",
|
||||
"batch_size = 12 #@param {type:\"integer\"}\n",
|
||||
"#@markdown ---\n",
|
||||
"validation_split = 0.01\n",
|
||||
"#@markdown ### Elige la calidad para este modelo:\n",
|
||||
"\n",
|
||||
"#@markdown * x-low - 16Khz audio, 5-7M params\n",
|
||||
"#@markdown * medium - 22.05Khz audio, 15-20M params\n",
|
||||
"#@markdown * high - 22.05Khz audio, 28-32M params\n",
|
||||
"quality = \"medium\" #@param [\"high\", \"x-low\", \"medium\"]\n",
|
||||
"#@markdown ---\n",
|
||||
"#@markdown ### ¿Cada cuántas épocas quieres autoguardar los puntos de control de entrenamiento?\n",
|
||||
"#@markdown Cuanto mayor sea tu conjunto de datos, menor debe ser este intervalo de guardado, ya que cada época tarda más en completarse.\n",
|
||||
"checkpoint_epochs = 5 #@param {type:\"integer\"}\n",
|
||||
"#@markdown ---\n",
|
||||
"#@markdown ### Intervalo de pasos para generar muestras de audio del modelo:\n",
|
||||
"log_every_n_steps = 1000 #@param {type:\"integer\"}\n",
|
||||
"#@markdown ---\n",
|
||||
"#@markdown ### Número de épocas para el entrenamiento.\n",
|
||||
"max_epochs = 10000 #@param {type:\"integer\"}\n",
|
||||
"#@markdown ---"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": null,
|
||||
"metadata": {
|
||||
"cellView": "form",
|
||||
"colab": {
|
||||
"background_save": true
|
||||
},
|
||||
"id": "X4zbSjXg2J3N"
|
||||
},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"#@markdown # <font color=\"pink\"> **5. Entrenar.** 🏋️♂️\n",
|
||||
"#@markdown Ejecuta esta celda para entrenar tu modelo. Si es posible, se guardarán algunas muestras de audio durante el entrenamiento en la carpeta de salida.\n",
|
||||
"\n",
|
||||
"get_ipython().system(f'''\n",
|
||||
"python -m piper_train \\\n",
|
||||
"--dataset-dir \"{output_dir}\" \\\n",
|
||||
"--accelerator 'gpu' \\\n",
|
||||
"--devices 1 \\\n",
|
||||
"--batch-size {batch_size} \\\n",
|
||||
"--validation-split {validation_split} \\\n",
|
||||
"--num-test-examples 2 \\\n",
|
||||
"--quality {quality} \\\n",
|
||||
"--checkpoint-epochs {checkpoint_epochs} \\\n",
|
||||
"--log_every_n_steps {log_every_n_steps} \\\n",
|
||||
"--max_epochs {max_epochs} \\\n",
|
||||
"{ft_command}\\\n",
|
||||
"--precision 32\n",
|
||||
"''')"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"metadata": {
|
||||
"id": "6ISG085SYn85"
|
||||
},
|
||||
"source": [
|
||||
"# ¿Has terminado el entrenamiento y quieres probar el modelo?\n",
|
||||
"\n",
|
||||
"* ¡Si quieres ejecutar este modelo en cualquier software que integre Piper o en la misma app de Piper, exporta tu modelo usando el [cuaderno exportador de modelos](https://colab.research.google.com/github/rmcpantoja/piper/blob/master/notebooks/piper_model_exporter.ipynb)!\n",
|
||||
"* Si quieres probar este modelo ahora mismo, antes de exportarlo al formato soportado por Piper, ¡prueba tu last.ckpt generado con [este cuaderno](https://colab.research.google.com/github/rmcpantoja/piper/blob/master/notebooks/piper_inference_(ckpt).ipynb)!"
|
||||
]
|
||||
}
|
||||
],
|
||||
"metadata": {
|
||||
"accelerator": "GPU",
|
||||
"colab": {
|
||||
"provenance": [],
|
||||
"include_colab_link": true
|
||||
},
|
||||
"gpuClass": "standard",
|
||||
"kernelspec": {
|
||||
"display_name": "Python 3",
|
||||
"name": "python3"
|
||||
},
|
||||
"language_info": {
|
||||
"name": "python"
|
||||
}
|
||||
},
|
||||
"nbformat": 4,
|
||||
"nbformat_minor": 0
|
||||
}
|
||||
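The transcript structure described in the training notebook (one `wavs/....wav|text` line per audio for single-speaker data, `wav|speaker|text` for multi-speaker data) can be sanity-checked before uploading `metadata.csv`. A minimal sketch, assuming the ljspeech-style pipe-delimited layout; `check_metadata` is a hypothetical helper, not part of the notebook:

```python
# Sketch (illustrative only): validate the pipe-delimited transcript
# layout before uploading it as metadata.csv.
def check_metadata(lines, single_speaker=True):
    """Return a list of (line_number, problem) for malformed entries."""
    expected = 2 if single_speaker else 3  # wav|text  vs  wav|speaker|text
    problems = []
    for i, line in enumerate(lines, start=1):
        line = line.rstrip("\n")
        if not line:
            continue  # skip blank lines
        parts = line.split("|")
        if len(parts) != expected:
            problems.append((i, f"expected {expected} fields, got {len(parts)}"))
        elif not parts[0].endswith(".wav"):
            problems.append((i, "first field should be a .wav path"))
    return problems

sample = ["wavs/1.wav|Esto dice el personaje en el audio 1.",
          "wavs/2.wav|speaker1|extra field"]
print(check_metadata(sample, single_speaker=True))
```

Running the check locally catches the most common preprocessing failure (a stray `|` or a missing speaker column) before any Colab time is spent.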
@@ -0,0 +1,465 @@
|
||||
{
|
||||
"cells": [
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"metadata": {
|
||||
"id": "view-in-github",
|
||||
"colab_type": "text"
|
||||
},
|
||||
"source": [
|
||||
"<a href=\"https://colab.research.google.com/github/rmcpantoja/piper/blob/master/notebooks/piper_multilingual_training_notebook.ipynb\" target=\"_parent\"><img src=\"https://colab.research.google.com/assets/colab-badge.svg\" alt=\"Open In Colab\"/></a>"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"metadata": {
|
||||
"id": "eK3nmYDB6C1a"
|
||||
},
|
||||
"source": [
|
||||
"# <font color=\"pink\"> **[Piper](https://github.com/rhasspy/piper) training notebook.**\n",
|
||||
"## \n",
|
||||
"\n",
|
||||
"---\n",
|
||||
"\n",
|
||||
"- Notebook made by [rmcpantoja](http://github.com/rmcpantoja)\n",
|
||||
"- Collaborator: [Xx_Nessu_xX](https://fakeyou.com/profile/Xx_Nessu_xX)\n",
|
||||
"\n",
|
||||
"---\n",
|
||||
"\n",
|
||||
"# Notes:\n",
|
||||
"\n",
|
||||
"- <font color=\"orange\">**Things in orange mean that they are important.**"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"metadata": {
|
||||
"id": "AICh6p5OJybj"
|
||||
},
|
||||
"source": [
|
||||
"# <font color=\"pink\">🔧 ***First steps.*** 🔧"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": null,
|
||||
"metadata": {
|
||||
"cellView": "form",
|
||||
"id": "qyxSMuzjfQrz"
|
||||
},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"#@markdown ## <font color=\"pink\"> **Google Colab Anti-Disconnect.** 🔌\n",
|
||||
"#@markdown ---\n",
|
||||
"#@markdown #### Avoid automatic disconnection. Still, it will disconnect after <font color=\"orange\">**6 to 12 hours**</font>.\n",
|
||||
"\n",
|
||||
"import IPython\n",
|
||||
"js_code = '''\n",
|
||||
"function ClickConnect(){\n",
|
||||
"console.log(\"Working\");\n",
|
||||
"document.querySelector(\"colab-toolbar-button#connect\").click()\n",
|
||||
"}\n",
|
||||
"setInterval(ClickConnect,60000)\n",
|
||||
"'''\n",
|
||||
"display(IPython.display.Javascript(js_code))"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": null,
|
||||
"metadata": {
|
||||
"cellView": "form",
|
||||
"id": "ygxzp-xHTC7T"
|
||||
},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"#@markdown ## <font color=\"pink\"> **Check GPU type.** 👁️\n",
|
||||
"#@markdown ---\n",
|
||||
"#@markdown #### A higher capable GPU can lead to faster training speeds. By default, you will have a <font color=\"orange\">**Tesla T4**</font>.\n",
|
||||
"!nvidia-smi"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": null,
|
||||
"metadata": {
|
||||
"cellView": "form",
|
||||
"id": "sUNjId07JfAK"
|
||||
},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"#@markdown # <font color=\"pink\"> **Mount Google Drive.** 📂\n",
|
||||
"from google.colab import drive\n",
|
||||
"drive.mount('/content/drive', force_remount=True)"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": null,
|
||||
"metadata": {
|
||||
"cellView": "form",
|
||||
"id": "_XwmTVlcUgCh"
|
||||
},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"#@markdown # <font color=\"pink\"> **Install software.** 📦\n",
|
||||
"\n",
|
||||
"#@markdown ####In this cell, the synthesizer and the dependencies needed to run training will be installed. (This may take a while.)\n",
|
||||
"\n",
|
||||
"#@markdown #### <font color=\"orange\">**Do you want to use the patch?**\n",
|
||||
"#@markdown The patch provides the ability to export audio files to the output folder and save a single model while training.\n",
|
||||
"usepatch = True #@param {type:\"boolean\"}\n",
|
||||
"#@markdown ---\n",
|
||||
"# clone:\n",
|
||||
"!git clone -q https://github.com/rmcpantoja/piper\n",
|
||||
"%cd /content/piper/src/python\n",
|
||||
"!wget -q \"https://raw.githubusercontent.com/coqui-ai/TTS/dev/TTS/bin/resample.py\"\n",
|
||||
"#!pip install -q -r requirements.txt\n",
|
||||
"!pip install -q \"cython>=0.29.0\" piper-phonemize==1.1.0 \"librosa>=0.9.2\" \"numpy>=1.19.0\" \"onnxruntime>=1.11.0\" pytorch-lightning==1.7.0 torch==1.11.0\n",
|
||||
"!pip install -q torchtext==0.12.0 torchvision==0.12.0\n",
|
||||
"# fixing recent compatibility issues:\n",
|
||||
"!pip install -q torchaudio==0.11.0 torchmetrics==0.11.4\n",
|
||||
"!bash build_monotonic_align.sh\n",
|
||||
"!apt-get install -q espeak-ng\n",
|
||||
"# download patches:\n",
|
||||
"if usepatch:\n",
|
||||
" print(\"Downloading patch...\")\n",
|
||||
" !gdown -q \"1EWEb7amo1rgFGpBFfRD4BKX3pkjVK1I-\" -O \"/content/piper/src/python/patch.zip\"\n",
|
||||
" !unzip -o -q \"patch.zip\"\n",
|
||||
"%cd /content"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"metadata": {
|
||||
"id": "A3bMzEE0V5Ma"
|
||||
},
|
||||
"source": [
|
||||
"# <font color=\"pink\"> 🤖 ***Training.*** 🤖"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": null,
|
||||
"metadata": {
|
||||
"cellView": "form",
|
||||
"id": "SvEGjf0aV8eg"
|
||||
},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"#@markdown # <font color=\"pink\"> **1. Extract dataset.** 📥\n",
|
||||
"#@markdown ####Important: the audios must be in <font color=\"orange\">**wav format, (16000 or 22050hz, 16-bits, mono), and, for convenience, numbered. Example:**\n",
|
||||
"\n",
|
||||
"#@markdown * <font color=\"orange\">**1.wav**</font>\n",
|
||||
"#@markdown * <font color=\"orange\">**2.wav**</font>\n",
|
||||
"#@markdown * <font color=\"orange\">**3.wav**</font>\n",
|
||||
"#@markdown * <font color=\"orange\">**.....**</font>\n",
|
||||
"\n",
|
||||
"#@markdown ---\n",
|
||||
"\n",
|
||||
"%cd /content\n",
|
||||
"!mkdir /content/dataset\n",
|
||||
"%cd /content/dataset\n",
|
||||
"!mkdir /content/dataset/wavs\n",
|
||||
"#@markdown ### Audio dataset path to unzip:\n",
|
||||
"zip_path = \"/content/drive/MyDrive/Wavs.zip\" #@param {type:\"string\"}\n",
|
||||
"!unzip \"{zip_path}\" -d /content/dataset/wavs\n",
|
||||
"#@markdown ---"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": null,
|
||||
"metadata": {
|
||||
"cellView": "form",
|
||||
"id": "E0W0OCvXXvue"
|
||||
},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"#@markdown # <font color=\"pink\"> **2. Upload the transcript file.** 📝\n",
|
||||
"#@markdown ---\n",
|
||||
"#@markdown ####<font color=\"orange\">**Important: the transcription means writing what the character says in each of the audios, and it must have the following structure:**\n",
|
||||
"\n",
|
||||
"#@markdown ##### <font color=\"orange\">For a single-speaker dataset:\n",
|
||||
"#@markdown * wavs/1.wav|This is what my character says in audio 1.\n",
|
||||
"#@markdown * wavs/2.wav|This, the text that the character says in audio 2.\n",
|
||||
"#@markdown * ...\n",
|
||||
"\n",
|
||||
"#@markdown ##### <font color=\"orange\">For a multi-speaker dataset:\n",
|
||||
"\n",
|
||||
"#@markdown * wavs/speaker1audio1.wav|speaker1|This is what the first speaker says.\n",
|
||||
"#@markdown * wavs/speaker1audio2.wav|speaker1|This is another audio of the first speaker.\n",
|
||||
"#@markdown * wavs/speaker2audio1.wav|speaker2|This is what the second speaker says in the first audio.\n",
|
||||
"#@markdown * wavs/speaker2audio2.wav|speaker2|This is another audio of the second speaker.\n",
|
||||
"#@markdown * ...\n",
|
||||
"\n",
|
||||
"#@markdown And so on. In addition, the transcript must be in a <font color=\"orange\">**.csv format. (UTF-8 without BOM)**\n",
|
||||
"\n",
|
||||
"#@markdown ---\n",
|
||||
"%cd /content/dataset\n",
|
||||
"from google.colab import files\n",
|
||||
"!rm -f /content/dataset/metadata.csv\n",
|
||||
"listfn, length = files.upload().popitem()\n",
|
||||
"if listfn != \"metadata.csv\":\n",
|
||||
" !mv \"$listfn\" metadata.csv\n",
|
||||
"%cd .."
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": null,
|
||||
"metadata": {
|
||||
"cellView": "form",
|
||||
"id": "dOyx9Y6JYvRF"
|
||||
},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"#@markdown # <font color=\"pink\"> **3. Preprocess dataset.** 🔄\n",
|
||||
"\n",
|
||||
"import os\n",
|
||||
"#@markdown ### First of all, select the language of your dataset.\n",
|
||||
"language = \"English (U.S.)\" #@param [\"Català\", \"Dansk\", \"Deutsch\", \"Ελληνικά\", \"English (British)\", \"English (U.S.)\", \"Español\", \"Español (latinoamericano)\", \"Suomi\", \"Français\", \"Magyar\", \"Icelandic\", \"Italiano\", \"ქართული\", \"қазақша\", \"Lëtzebuergesch\", \"नेपाली\", \"Nederlands\", \"Norsk\", \"Polski\", \"Português (Brasil)\", \"Română\", \"Русский\", \"Српски\", \"Svenska\", \"Kiswahili\", \"Türkçe\", \"украї́нська\", \"Tiếng Việt\", \"简体中文\"]\n",
|
||||
"#@markdown ---\n",
|
||||
"# language definition:\n",
|
||||
"languages = {\n",
|
||||
" \"Català\": \"ca\",\n",
|
||||
" \"Dansk\": \"da\",\n",
|
||||
" \"Deutsch\": \"de\",\n",
|
||||
" \"Ελληνικά\": \"grc\",\n",
|
||||
" \"English (British)\": \"en\",\n",
|
||||
" \"English (U.S.)\": \"en-us\",\n",
|
||||
" \"Español\": \"es\",\n",
|
||||
" \"Español (latinoamericano)\": \"es-419\",\n",
|
||||
" \"Suomi\": \"fi\",\n",
|
||||
" \"Français\": \"fr\",\n",
|
||||
" \"Magyar\": \"hu\",\n",
|
||||
" \"Icelandic\": \"is\",\n",
|
||||
" \"Italiano\": \"it\",\n",
|
||||
" \"ქართული\": \"ka\",\n",
|
||||
" \"қазақша\": \"kk\",\n",
|
||||
" \"Lëtzebuergesch\": \"lb\",\n",
|
||||
" \"नेपाली\": \"ne\",\n",
|
||||
" \"Nederlands\": \"nl\",\n",
|
||||
" \"Norsk\": \"nb\",\n",
|
||||
" \"Polski\": \"pl\",\n",
|
||||
" \"Português (Brasil)\": \"pt-br\",\n",
|
||||
" \"Română\": \"ro\",\n",
|
||||
" \"Русский\": \"ru\",\n",
|
||||
" \"Српски\": \"sr\",\n",
|
||||
" \"Svenska\": \"sv\",\n",
|
||||
" \"Kiswahili\": \"sw\",\n",
|
||||
" \"Türkçe\": \"tr\",\n",
|
||||
" \"украї́нська\": \"uk\",\n",
|
||||
" \"Tiếng Việt\": \"vi\",\n",
|
||||
" \"简体中文\": \"zh\"\n",
|
||||
"}\n",
|
||||
"\n",
|
||||
"def _get_language(code):\n",
|
||||
" return languages[code]\n",
|
||||
"\n",
|
||||
"final_language = _get_language(language)\n",
|
||||
"#@markdown ### Choose a name for your model:\n",
|
||||
"model_name = \"Test\" #@param {type:\"string\"}\n",
|
||||
"#@markdown ---\n",
|
||||
"# output:\n",
|
||||
"#@markdown ### Choose the working folder: (recommended to save to Drive)\n",
|
||||
"\n",
|
||||
"#@markdown The working folder will be used in preprocessing, but also in training the model.\n",
|
||||
"output_path = \"/content/drive/MyDrive/colab/piper\" #@param {type:\"string\"}\n",
|
||||
"output_dir = output_path+\"/\"+model_name\n",
|
||||
"if not os.path.exists(output_dir):\n",
|
||||
" os.makedirs(output_dir)\n",
|
||||
"#@markdown ---\n",
|
||||
"#@markdown ### Choose dataset format:\n",
|
||||
"dataset_format = \"ljspeech\" #@param [\"ljspeech\", \"mycroft\"]\n",
|
||||
"#@markdown ---\n",
|
||||
"#@markdown ### Is this a single speaker dataset? Otherwise, uncheck:\n",
|
||||
"single_speaker = True #@param {type:\"boolean\"}\n",
|
||||
"if single_speaker:\n",
|
||||
" force_sp = \" --single-speaker\"\n",
|
||||
"else:\n",
|
||||
" force_sp = \"\"\n",
|
||||
"#@markdown ---\n",
|
||||
"#@markdown ### Select the sample rate of the dataset:\n",
|
||||
"sample_rate = \"22050\" #@param [\"16000\", \"22050\"]\n",
|
||||
"#@markdown ---\n",
|
||||
"%cd /content/piper/src/python\n",
|
||||
"#@markdown ### Do you want to train using this sample rate, but your audios don't have it?\n",
|
||||
"#@markdown The resampler helps you do it quickly!\n",
|
||||
"resample = False #@param {type:\"boolean\"}\n",
|
||||
"if resample:\n",
|
||||
" !python resample.py --input_dir \"/content/dataset/wavs\" --output_dir \"/content/dataset/wavs_resampled\" --output_sr {sample_rate} --file_ext \"wav\"\n",
|
||||
" !mv /content/dataset/wavs_resampled/* /content/dataset/wavs\n",
|
||||
"#@markdown ---\n",
|
||||
"\n",
|
||||
"!python -m piper_train.preprocess \\\n",
|
||||
" --language {final_language} \\\n",
|
||||
" --input-dir /content/dataset \\\n",
|
||||
" --output-dir \"{output_dir}\" \\\n",
|
||||
" --dataset-format {dataset_format} \\\n",
|
||||
" --sample-rate {sample_rate} \\\n",
|
||||
" {force_sp}"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": null,
|
||||
"metadata": {
|
||||
"cellView": "form",
|
||||
"id": "ickQlOCRjkBL"
|
||||
},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"#@markdown # <font color=\"pink\"> **4. Settings.** 🧰\n",
|
||||
"import json\n",
|
||||
"import ipywidgets as widgets\n",
|
||||
"from IPython.display import display\n",
|
||||
"from google.colab import output\n",
|
||||
"import os\n",
|
||||
"#@markdown ### Select the action to train this dataset:\n",
|
||||
"\n",
|
||||
"#@markdown * The option to continue a training is self-explanatory. If you've previously trained a model with free colab, your time is up and you're considering training it some more, this is ideal for you. You just have to set the same settings that you set when you first trained this model.\n",
|
||||
"#@markdown * The option to convert a single-speaker model to a multi-speaker model is self-explanatory, and for this it is important that you have processed a dataset that contains text and audio from all possible speakers that you want to train in your model.\n",
|
||||
"#@markdown * The finetune option is used to train a dataset using a pretrained model, that is, train on that data. This option is ideal if you want to train a very small dataset (more than five minutes recommended).\n",
|
||||
"#@markdown * The train from scratch option builds features such as dictionary and speech form from scratch, and this may take longer to converge. For this, hours of audio (8 at least) are recommended, which have a large collection of phonemes.\n",
|
||||
"\n",
|
||||
"action = \"finetune\" #@param [\"Continue training\", \"convert single-speaker to multi-speaker model\", \"finetune\", \"train from scratch\"]\n",
|
||||
"#@markdown ---\n",
|
||||
"if action == \"Continue training\":\n",
|
||||
" if os.path.exists(f\"{output_dir}/lightning_logs/version_0/checkpoints/last.ckpt\"):\n",
|
||||
" ft_command = f'--resume_from_checkpoint \"{output_dir}/lightning_logs/version_0/checkpoints/last.ckpt\" '\n",
|
||||
" print(f\"Continuing {model_name}'s training at: {output_dir}/lightning_logs/version_0/checkpoints/last.ckpt\")\n",
|
||||
" else:\n",
|
||||
" raise Exception(\"Training cannot be continued as there is no checkpoint to continue at.\")\n",
|
||||
"elif action == \"finetune\":\n",
|
||||
" if os.path.exists(f\"{output_dir}/lightning_logs/version_0/checkpoints/last.ckpt\"):\n",
|
||||
" raise Exception(\"Oh no! You have already trained this model before, you cannot choose this option since your progress will be lost, and then your previous time will not count. Please select the option to continue a training.\")\n",
|
||||
" else:\n",
|
||||
" ft_command = '--resume_from_checkpoint \"/content/pretrained.ckpt\" '\n",
|
||||
"elif action == \"convert single-speaker to multi-speaker model\":\n",
|
||||
" if not single_speaker:\n",
|
||||
" ft_command = '--resume_from_single_speaker_checkpoint \"/content/pretrained.ckpt\" '\n",
|
||||
" else:\n",
|
||||
" raise Exception(\"This dataset is not a multi-speaker dataset!\")\n",
|
||||
"else:\n",
|
||||
" ft_command = \"\"\n",
|
||||
"if action== \"convert single-speaker to multi-speaker model\" or action == \"finetune\":\n",
|
||||
" try:\n",
|
||||
" with open('/content/piper/notebooks/pretrained_models.json') as f:\n",
|
||||
" pretrained_models = json.load(f)\n",
|
||||
" if final_language in pretrained_models:\n",
|
||||
" models = pretrained_models[final_language]\n",
|
||||
" model_options = [(model_name, model_name) for model_name, model_url in models.items()]\n",
|
||||
" model_dropdown = widgets.Dropdown(description = \"Choose pretrained model\", options=model_options)\n",
|
||||
" download_button = widgets.Button(description=\"Download\")\n",
|
||||
" def download_model(btn):\n",
|
||||
" model_name = model_dropdown.value\n",
|
||||
" model_url = pretrained_models[final_language][model_name]\n",
|
||||
" print(\"Downloading pretrained model...\")\n",
|
||||
" if model_url.startswith(\"1\"):\n",
|
||||
" !gdown -q \"{model_url}\" -O \"/content/pretrained.ckpt\"\n",
|
||||
" elif model_url.startswith(\"https://drive.google.com/file/d/\"):\n",
|
||||
" !gdown -q \"{model_url}\" -O \"/content/pretrained.ckpt\" --fuzzy\n",
|
||||
" else:\n",
|
||||
" !wget -q \"{model_url}\" -O \"/content/pretrained.ckpt\"\n",
|
||||
" model_dropdown.close()\n",
|
||||
" download_button.close()\n",
|
||||
" output.clear()\n",
|
||||
" if os.path.exists(\"/content/pretrained.ckpt\"):\n",
|
||||
" print(\"Model downloaded!\")\n",
|
||||
" else:\n",
|
||||
" raise Exception(\"Couldn't download the pretrained model!\")\n",
|
||||
" download_button.on_click(download_model)\n",
|
||||
" display(model_dropdown, download_button)\n",
|
||||
" else:\n",
|
||||
" raise Exception(f\"There are no pretrained models available for the language {final_language}\")\n",
|
||||
" except FileNotFoundError:\n",
|
||||
" raise Exception(\"The pretrained_models.json file was not found.\")\n",
|
||||
"else:\n",
|
||||
" print(\"Warning: this model will be trained from scratch. You need at least 8 hours of data for everything to work decent. Good luck!\")\n",
|
||||
"#@markdown ### Choose batch size based on this dataset:\n",
|
||||
"batch_size = 12 #@param {type:\"integer\"}\n",
|
||||
"#@markdown ---\n",
|
||||
"validation_split = 0.01\n",
|
||||
"#@markdown ### Choose the quality for this model:\n",
|
||||
"\n",
|
||||
"#@markdown * x-low - 16Khz audio, 5-7M params\n",
|
||||
"#@markdown * medium - 22.05Khz audio, 15-20 params\n",
|
||||
"#@markdown * high - 22.05Khz audio, 28-32M params\n",
|
||||
"quality = \"medium\" #@param [\"high\", \"x-low\", \"medium\"]\n",
|
||||
"#@markdown ---\n",
|
||||
"#@markdown ### For how many epochs to save training checkpoints?\n",
|
||||
"#@markdown The larger your dataset, you should set this saving interval to a smaller value, as epochs can progress longer time.\n",
|
||||
"checkpoint_epochs = 5 #@param {type:\"integer\"}\n",
|
||||
"#@markdown ---\n",
|
||||
"#@markdown ### Step interval to generate model samples:\n",
|
||||
"log_every_n_steps = 1000 #@param {type:\"integer\"}\n",
|
||||
"#@markdown ---\n",
|
||||
"#@markdown ### Training epochs:\n",
|
||||
"max_epochs = 10000 #@param {type:\"integer\"}\n",
|
||||
"#@markdown ---"
|
||||
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"cellView": "form",
"colab": {
"background_save": true
},
"id": "X4zbSjXg2J3N"
},
"outputs": [],
"source": [
"#@markdown # <font color=\"pink\"> **5. Train.** 🏋️♂️\n",
"#@markdown Run this cell to train your final model! If possible, some audio samples will be saved during training in the output folder.\n",
"\n",
"get_ipython().system(f'''\n",
"python -m piper_train \\\n",
"--dataset-dir \"{output_dir}\" \\\n",
"--accelerator 'gpu' \\\n",
"--devices 1 \\\n",
"--batch-size {batch_size} \\\n",
"--validation-split {validation_split} \\\n",
"--num-test-examples 2 \\\n",
"--quality {quality} \\\n",
"--checkpoint-epochs {checkpoint_epochs} \\\n",
"--log_every_n_steps {log_every_n_steps} \\\n",
"--max_epochs {max_epochs} \\\n",
"{ft_command}\\\n",
"--precision 32\n",
"''')"
]
},
{
"cell_type": "markdown",
"metadata": {
"id": "6ISG085SYn85"
},
"source": [
"# Have you finished training and want to test the model?\n",
"\n",
"* If you want to run this model in any software that Piper integrates or the same Piper app, export your model using the [model exporter notebook](https://colab.research.google.com/github/rmcpantoja/piper/blob/master/notebooks/piper_model_exporter.ipynb)!\n",
"* Wait! I want to test this right now before exporting it to the supported format for Piper. Test your generated last.ckpt with [this notebook](https://colab.research.google.com/github/rmcpantoja/piper/blob/master/notebooks/piper_inference_(ckpt).ipynb)!"
]
}
],
"metadata": {
"accelerator": "GPU",
"colab": {
"provenance": [],
"include_colab_link": true
},
"gpuClass": "standard",
"kernelspec": {
"display_name": "Python 3",
"name": "python3"
},
"language_info": {
"name": "python"
}
},
"nbformat": 4,
"nbformat_minor": 0
}
106
ascend_910-piper/piper/notebooks/pretrained_models.json
Normal file
@@ -0,0 +1,106 @@
{
  "ar": {
    "qasr-low": "1H9y8nlJ3K6_elXsB6YaJKsnbEBYCSF-_",
    "qasr-high": "10xcE_l1DMQorjnQoRcUF7KP2uRgSr11q"
  },
  "ca": {
    "upc_ona-medium (fine-tuned)": "https://huggingface.co/datasets/rhasspy/piper-checkpoints/resolve/main/ca/ca_ES/upc_ona/medium/epoch%3D3184-step%3D1641140.ckpt"
  },
  "da": {
    "talesyntese-medium": "https://huggingface.co/datasets/rhasspy/piper-checkpoints/resolve/main/da/da_DK/talesyntese/medium/epoch%3D3264-step%3D1634940.ckpt"
  },
  "de": {
    "thorsten-medium (fine-tuned)": "https://huggingface.co/datasets/rhasspy/piper-checkpoints/resolve/main/de/de_DE/thorsten/medium/epoch%3D3135-step%3D2702056.ckpt",
    "thorsten_emotional (fine-tuned)": "https://huggingface.co/datasets/rhasspy/piper-checkpoints/resolve/main/de/de_DE/thorsten_emotional/medium/epoch%3D6069-step%3D230660.ckpt"
  },
  "en-gb": {
    "alan-medium (fine-tuned)": "https://huggingface.co/datasets/rhasspy/piper-checkpoints/resolve/main/en/en_GB/alan/medium/epoch%3D6339-step%3D1647790.ckpt",
    "alba-medium (fine-tuned)": "https://huggingface.co/datasets/rhasspy/piper-checkpoints/resolve/main/en/en_GB/alba/medium/epoch%3D4179-step%3D2101090.ckpt",
    "aru-medium (fine-tuned)": "https://huggingface.co/datasets/rhasspy/piper-checkpoints/resolve/main/en/en_GB/aru/medium/epoch%3D3479-step%3D939600.ckpt",
    "jenny_dioco-medium (fine-tuned)": "https://huggingface.co/datasets/rhasspy/piper-checkpoints/resolve/main/en/en_GB/jenny_dioco/medium/epoch%3D2748-step%3D1729300.ckpt",
    "northern_english_male-medium (fine-tuned)": "https://huggingface.co/datasets/rhasspy/piper-checkpoints/resolve/main/en/en_GB/northern_english_male/medium/epoch%3D9029-step%3D2261720.ckpt",
    "semaine-medium (fine-tuned)": "https://huggingface.co/datasets/rhasspy/piper-checkpoints/resolve/main/en/en_GB/semaine/medium/epoch%3D1849-step%3D214600.ckpt",
    "vctk-medium (fine-tuned)": "https://huggingface.co/datasets/rhasspy/piper-checkpoints/resolve/main/en/en_GB/vctk/medium/epoch%3D545-step%3D1511328.ckpt"
  },
  "en-us": {
    "amy_medium (fine-tuned)": "https://huggingface.co/datasets/rhasspy/piper-checkpoints/resolve/main/en/en_US/amy/medium/epoch%3D6679-step%3D1554200.ckpt",
    "arctic_medium (fine-tuned)": "https://huggingface.co/datasets/rhasspy/piper-checkpoints/resolve/main/en/en_US/arctic/medium/epoch%3D663-step%3D646736.ckpt",
    "joe_medium (fine-tuned)": "https://huggingface.co/datasets/rhasspy/piper-checkpoints/resolve/main/en/en_US/joe/medium/epoch%3D7889-step%3D1221224.ckpt",
    "kusal_medium (fine-tuned)": "https://huggingface.co/datasets/rhasspy/piper-checkpoints/resolve/main/en/en_US/kusal/medium/epoch%3D2652-step%3D1953828.ckpt",
    "l2arctic_medium (fine-tuned)": "https://huggingface.co/datasets/rhasspy/piper-checkpoints/resolve/main/en/en_US/l2arctic/medium/epoch%3D536-step%3D902160.ckpt",
    "lessac-high": "https://huggingface.co/datasets/rhasspy/piper-checkpoints/resolve/main/en/en_US/lessac/high/epoch%3D2218-step%3D838782.ckpt",
    "lessac-low": "https://huggingface.co/datasets/rhasspy/piper-checkpoints/resolve/main/en/en_US/lessac/low/epoch%3D2307-step%3D558536.ckpt",
    "lessac-medium": "https://huggingface.co/datasets/rhasspy/piper-checkpoints/resolve/main/en/en_US/lessac/medium/epoch%3D2164-step%3D1355540.ckpt",
    "Ryan-medium (fine-tuned)": "https://huggingface.co/datasets/rhasspy/piper-checkpoints/resolve/main/en/en_US/ryan/medium/epoch%3D4641-step%3D3104302.ckpt"
  },
  "es": {
    "davefx-medium (fine-tuned)": "https://huggingface.co/datasets/rhasspy/piper-checkpoints/resolve/main/es/es_ES/davefx/medium/epoch%3D5629-step%3D1605020.ckpt",
    "sharvard-medium (fine-tuned)": "https://huggingface.co/datasets/rhasspy/piper-checkpoints/resolve/main/es/es_ES/sharvard/medium/epoch%3D4899-step%3D215600.ckpt"
  },
  "es-419": {
    "aldo-medium (fine-tuned)": "https://huggingface.co/datasets/rhasspy/piper-checkpoints/resolve/main/es/es_MX/ald/medium/epoch%3D9999-step%3D1753600.ckpt"
  },
  "fi": {
    "harri-medium (fine-tuned)": "https://huggingface.co/datasets/rhasspy/piper-checkpoints/resolve/main/fi/fi_FI/harri/medium/epoch%3D3369-step%3D1714630.ckpt"
  },
  "fr": {
    "siwis-medium (fine-tuned)": "https://huggingface.co/datasets/rhasspy/piper-checkpoints/resolve/main/fr/fr_FR/siwis/medium/epoch%3D3304-step%3D2050940.ckpt",
    "upmc-medium (fine-tuned)": "https://huggingface.co/datasets/rhasspy/piper-checkpoints/resolve/main/fr/fr_FR/upmc/medium/epoch%3D2999-step%3D702000.ckpt"
  },
  "hu": {
    "berta-medium (fine-tuned)": "https://huggingface.co/datasets/rhasspy/piper-checkpoints/resolve/main/hu/hu_HU/berta/epoch%3D5249-step%3D1429580.ckpt"
  },
  "ka": {
    "natia-medium (fine-tuned)": "https://huggingface.co/datasets/rhasspy/piper-checkpoints/resolve/main/ka/ka_GE/natia/medium/epoch%3D5239-step%3D1607690.ckpt"
  },
  "kk": {
    "iseke-low": "1kIcnqTr6DI4JRibooe7ZvCIkGHez8kQT",
    "raya-low": "11UuZBPqjgn09S4Vkv7yi7_rIp7yB0UCt"
  },
  "lb": {
    "marylux-medium (fine-tuned)": "https://huggingface.co/datasets/rhasspy/piper-checkpoints/resolve/main/lb/lb_LU/marylux/medium/epoch%3D4419-step%3D1558490.ckpt"
  },
  "ne": {
    "Google-medium (fine-tuned)": "https://huggingface.co/datasets/rhasspy/piper-checkpoints/resolve/main/ne/ne_NP/google/medium/epoch%3D2829-step%3D367900.ckpt"
  },
  "nl": {
    "nathalie-medium (fine-tuned)": "https://huggingface.co/datasets/rhasspy/piper-checkpoints/resolve/main/nl/nl_NL/nathalie/medium/epoch%3D6119-step%3D1806410.ckpt"
  },
  "no": {
    "talesyntese-medium (fine-tuned)": "https://huggingface.co/datasets/rhasspy/piper-checkpoints/resolve/main/no/no_NO/talesyntese/medium/epoch%3D3459-step%3D2052250.ckpt"
  },
  "pl": {
    "darkman-medium (fine-tuned)": "https://huggingface.co/datasets/rhasspy/piper-checkpoints/resolve/main/pl/pl_PL/darkman/medium/epoch%3D4909-step%3D1454360.ckpt",
    "gosia-medium (fine-tuned)": "https://huggingface.co/datasets/rhasspy/piper-checkpoints/resolve/main/pl/pl_PL/gosia/medium/epoch%3D5001-step%3D1457672.ckpt"
  },
  "pt": {
    "faber-medium (fine-tuned)": "https://huggingface.co/datasets/rhasspy/piper-checkpoints/resolve/main/pt/pt_BR/faber/medium/epoch%3D6159-step%3D1230728.ckpt"
  },
  "ro": {
    "mihai-medium (fine-tuned)": "https://huggingface.co/datasets/rhasspy/piper-checkpoints/resolve/main/ro/ro_RO/mihai/medium/epoch%3D7809-step%3D1558760.ckpt"
  },
  "ru": {
    "denis-medium (fine-tuned)": "https://huggingface.co/datasets/rhasspy/piper-checkpoints/resolve/main/ru/ru_RU/denis/medium/epoch%3D4474-step%3D1521860.ckpt",
    "dmitri-medium (fine-tuned)": "https://huggingface.co/datasets/rhasspy/piper-checkpoints/resolve/main/ru/ru_RU/dmitri/medium/epoch%3D5589-step%3D1478840.ckpt",
    "irina-medium (fine-tuned)": "https://huggingface.co/datasets/rhasspy/piper-checkpoints/resolve/main/ru/ru_RU/irina/medium/epoch%3D4139-step%3D929464.ckpt",
    "ruslan-medium (fine-tuned)": "https://huggingface.co/datasets/rhasspy/piper-checkpoints/resolve/main/ru/ru_RU/ruslan/medium/epoch%3D2436-step%3D1724372.ckpt"
  },
  "sr": {
    "serbski_institut-medium (fine-tuned)": "https://huggingface.co/datasets/rhasspy/piper-checkpoints/resolve/main/sr/sr_RS/serbski_institut/medium/epoch%3D1899-step%3D178600.ckpt"
  },
  "sw": {
    "lanfrica-medium (fine-tuned)": "https://huggingface.co/datasets/rhasspy/piper-checkpoints/resolve/main/sw/sw_CD/lanfrica/medium/epoch%3D2619-step%3D1635820.ckpt"
  },
  "tr": {
    "dfki-medium (fine-tuned)": "https://huggingface.co/datasets/rhasspy/piper-checkpoints/resolve/main/tr/tr_TR/dfki/medium/epoch%3D5679-step%3D1489110.ckpt"
  },
  "uk": {
    "ukrainian_tts-medium": "https://huggingface.co/datasets/rhasspy/piper-checkpoints/resolve/main/uk/uk_UK/ukrainian_tts/medium/epoch%3D2090-step%3D1166778.ckpt"
  },
  "vi": {
    "vais1000-medium (fine-tuned)": "https://huggingface.co/datasets/rhasspy/piper-checkpoints/resolve/main/vi/vi_VN/vais1000/medium/epoch%3D4769-step%3D919580.ckpt"
  },
  "zh": {
    "huayan-medium (fine-tuned)": "https://huggingface.co/datasets/rhasspy/piper-checkpoints/resolve/main/zh/zh_CN/huayan/medium/epoch%3D3269-step%3D2460540.ckpt"
  }
}
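For reference, the Settings cell above chooses a download tool based on the shape of each entry in this file: a bare Google Drive file ID (starting with "1") goes through gdown, a full Drive link goes through gdown with `--fuzzy`, and any other URL is fetched with wget. A minimal sketch of that dispatch (the `download_command` helper is illustrative, not part of the repo):

```python
# Illustrative helper mirroring the download logic in the Settings cell:
# bare Drive file IDs start with "1", Drive links use gdown --fuzzy,
# and everything else (e.g. Hugging Face URLs) is fetched with wget.
def download_command(model_url: str) -> str:
    target = "/content/pretrained.ckpt"
    if model_url.startswith("1"):
        return f'gdown -q "{model_url}" -O "{target}"'
    elif model_url.startswith("https://drive.google.com/file/d/"):
        return f'gdown -q "{model_url}" -O "{target}" --fuzzy'
    return f'wget -q "{model_url}" -O "{target}"'

print(download_command("1H9y8nlJ3K6_elXsB6YaJKsnbEBYCSF-_"))   # a gdown command
print(download_command("https://huggingface.co/datasets/rhasspy/piper-checkpoints/resolve/main/en/en_US/lessac/medium/epoch%3D2164-step%3D1355540.ckpt"))  # a wget command
```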
27
ascend_910-piper/piper/notebooks/translator.py
Normal file
@@ -0,0 +1,27 @@
import configparser
import os


class Translator:
    def __init__(self):
        self.configs = {}

    def load_language(self, language_name):
        if language_name not in self.configs:
            config = configparser.ConfigParser()
            config.read(os.path.join(os.getcwd(), "lng", f"{language_name}.lang"))
            self.configs[language_name] = config

    def translate(self, language_name, string):
        if language_name == "en":
            return string
        elif language_name not in self.configs:
            self.load_language(language_name)
        config = self.configs[language_name]
        try:
            return config.get("Strings", string)
        except (configparser.NoOptionError, configparser.NoSectionError):
            # Fall back to the English string when no translation exists.
            if string:
                return string
            else:
                raise Exception("language engine error: This translation is corrupt!")
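The lookup in `translate` reduces to a `configparser` read of the `[Strings]` section, with the English string as the key. A minimal sketch, with two `es.lang`-style entries inlined rather than read from the `lng` folder:

```python
import configparser

# A .lang file is an INI file whose [Strings] section maps the English
# text to its translation. (Entries inlined here for illustration.)
lang_file = """\
[Strings]
Synthesize=Sintetizar
Exit=Salir
"""

config = configparser.ConfigParser()
config.read_string(lang_file)

print(config.get("Strings", "Synthesize"))  # -> Sintetizar
# configparser lowercases option names, so lookups are case-insensitive:
print(config.get("Strings", "synthesize"))  # -> Sintetizar
```

Because `ConfigParser` lowercases option names by default, the English strings used as keys in the notebook match regardless of case.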
BIN
ascend_910-piper/piper/notebooks/wav/en/downloaded.wav
Normal file
Binary file not shown.
BIN
ascend_910-piper/piper/notebooks/wav/en/downloading.wav
Normal file
Binary file not shown.
BIN
ascend_910-piper/piper/notebooks/wav/en/dwnerror.wav
Normal file
Binary file not shown.
BIN
ascend_910-piper/piper/notebooks/wav/en/exit.wav
Normal file
Binary file not shown.
BIN
ascend_910-piper/piper/notebooks/wav/en/gpuavailable.wav
Normal file
Binary file not shown.
BIN
ascend_910-piper/piper/notebooks/wav/en/installed.wav
Normal file
Binary file not shown.
BIN
ascend_910-piper/piper/notebooks/wav/en/installing.wav
Normal file
Binary file not shown.
BIN
ascend_910-piper/piper/notebooks/wav/en/loaded.wav
Normal file
Binary file not shown.
BIN
ascend_910-piper/piper/notebooks/wav/en/multispeaker.wav
Normal file
Binary file not shown.
BIN
ascend_910-piper/piper/notebooks/wav/en/nogpu.wav
Normal file
Binary file not shown.
BIN
ascend_910-piper/piper/notebooks/wav/en/noid.wav
Normal file
Binary file not shown.
BIN
ascend_910-piper/piper/notebooks/wav/en/nomodel.wav
Normal file
Binary file not shown.
BIN
ascend_910-piper/piper/notebooks/wav/en/novoices.wav
Normal file
Binary file not shown.
BIN
ascend_910-piper/piper/notebooks/wav/en/selectmodel.wav
Normal file
Binary file not shown.
BIN
ascend_910-piper/piper/notebooks/wav/en/starting.wav
Normal file
Binary file not shown.
BIN
ascend_910-piper/piper/notebooks/wav/en/success.wav
Normal file
Binary file not shown.
BIN
ascend_910-piper/piper/notebooks/wav/en/waiting.wav
Normal file
Binary file not shown.
BIN
ascend_910-piper/piper/notebooks/wav/es/downloaded.wav
Normal file
Binary file not shown.
BIN
ascend_910-piper/piper/notebooks/wav/es/downloading.wav
Normal file
Binary file not shown.
BIN
ascend_910-piper/piper/notebooks/wav/es/dwnerror.wav
Normal file
Binary file not shown.
BIN
ascend_910-piper/piper/notebooks/wav/es/exit.wav
Normal file
Binary file not shown.
BIN
ascend_910-piper/piper/notebooks/wav/es/gpuavailable.wav
Normal file
Binary file not shown.
BIN
ascend_910-piper/piper/notebooks/wav/es/installed.wav
Normal file
Binary file not shown.
BIN
ascend_910-piper/piper/notebooks/wav/es/installing.wav
Normal file
Binary file not shown.
BIN
ascend_910-piper/piper/notebooks/wav/es/loaded.wav
Normal file
Binary file not shown.
BIN
ascend_910-piper/piper/notebooks/wav/es/multispeaker.wav
Normal file
Binary file not shown.
BIN
ascend_910-piper/piper/notebooks/wav/es/nogpu.wav
Normal file
Binary file not shown.
BIN
ascend_910-piper/piper/notebooks/wav/es/noid.wav
Normal file
Binary file not shown.
BIN
ascend_910-piper/piper/notebooks/wav/es/nomodel.wav
Normal file
Binary file not shown.
BIN
ascend_910-piper/piper/notebooks/wav/es/novoices.wav
Normal file
Binary file not shown.
BIN
ascend_910-piper/piper/notebooks/wav/es/selectmodel.wav
Normal file
Binary file not shown.