Compare commits

56e261787b ... main (14 commits)

| Author | SHA1 | Date |
|---|---|---|
| | ab292e63f6 | |
| | 70f63772f7 | |
| | 99ffbdb4d5 | |
| | 90b80e3bcb | |
| | 32ad8fb98f | |
| | 3c69575c72 | |
| | 1b78ebefdd | |
| | 56f0b5b81d | |
| | a575a38552 | |
| | 8bc7005d63 | |
| | 9d18371bb7 | |
| | 44954b7481 | |
| | 55a67e817e | |
| | 4916ad0fe0 | |
BIN  026_0010.jpg  Executable file (26 KiB)
Binary file not shown.
12  Dockerfile_bi100  Normal file
@@ -0,0 +1,12 @@
FROM git.modelhub.org.cn:9443/enginex-iluvatar/bi100-3.2.1-x86-ubuntu20.04-py3.10-poc-llm-infer:v1.2.2

WORKDIR /workspace/

COPY ./model_test_caltech_http.py /workspace/
COPY ./microsoft_beit_base_patch16_224_pt22k_ft22k /model

RUN ln -s $(which python3) /usr/bin/python

CMD ["python3", "model_test_caltech_http.py"]
65  README.md  Normal file
@@ -1,2 +1,67 @@
# image-classification-transformers

## Iluvatar CoreX Tiangai-100 image classification

The transformers framework supports many image-classification models. This project adapts the transformers framework to the Iluvatar Tiangai-100 accelerator card and plugs it into the domestic-compute (xinchuang) benchmark framework: vision classification models are run and benchmarked on the Tiangai-100 card. Note that models used with this test framework must be compatible with the transformers library.

## Quick Start

1. Download an image-classification model from ModelScope, e.g. microsoft/beit-base-patch16-224:

```bash
modelscope download --model microsoft/beit-base-patch16-224 README.md --local_dir /mnt/contest_ceph/zhoushasha/models/microsoft/beit_base_patch16_224_pt22k_ft22k
```

2. Build the image from the Dockerfile.

Download the base image bi100-3.2.1-x86-ubuntu20.04-py3.10-poc-llm-infer:v1.2.2 from the repository's "Packages" section.

Use Dockerfile_bi100 to build an image, e.g. bi100-3.2.1-x86-ubuntu20.04-py3.10-poc-llm-infer:test; a sketch of the build command is shown below.

Note that Dockerfile_bi100 already places the model microsoft_beit_base_patch16_224_pt22k_ft22k under /model.
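The build command itself is not spelled out above; a minimal sketch, assuming Dockerfile_bi100 sits in the repository root and the base image has already been pulled:

```bash
# Example build command; the tag is the example name used above, adjust as needed.
docker build -f Dockerfile_bi100 \
  -t bi100-3.2.1-x86-ubuntu20.04-py3.10-poc-llm-infer:test .
```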
3. Start the container:

```bash
docker run -it --rm \
  -p 10086:80 \
  --name test_zss \
  -v /mnt/contest_ceph/zhoushasha/models/image_models/microsoft_beit_base_patch16_224_pt22k_ft22k:/model:rw \
  --privileged bi100-3.2.1-x86-ubuntu20.04-py3.10-poc-llm-infer:test
```

Here /mnt/contest_ceph/zhoushasha/models/image_models/microsoft_beit_base_patch16_224_pt22k_ft22k is the actual path where your model files live.

4. Test the service:

```bash
curl -X POST http://localhost:10086/v1/private/s782b4996 \
  -F "image=@/home/zhoushasha/models/026_0010.jpg"
```
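On success the service answers with JSON in the shape produced by model_test_caltech_http.py below (a status plus one prediction block per device); the values here are illustrative placeholders, not real output:

```json
{
  "status": "success",
  "cuda_prediction": {
    "class_id": 123,
    "class_name": "example-class",
    "confidence": 0.93,
    "device_used": "cuda",
    "processing_time": 0.041
  },
  "cpu_prediction": {
    "class_id": 123,
    "class_name": "example-class",
    "confidence": 0.93,
    "device_used": "cpu",
    "processing_time": 0.41
  }
}
```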
## How the image-classification test service works

The service is built on the Hugging Face transformers utility classes AutoImageProcessor and AutoModelForImageClassification.

AutoImageProcessor automatically loads the image processor that matches a pretrained model. It is tied to the model: loading it via from_pretrained(model_path) picks up the preprocessing configuration used when the model was trained (input size, normalization parameters, and so on) and handles the image preprocessing (resizing, normalization, etc.).

AutoModelForImageClassification is an "auto model" class that loads the network architecture matching the pretrained model's type (ViT, ResNet, etc.). AutoModelForImageClassification.from_pretrained(model_path) loads the pretrained image-classification model from model_path and must be given the tensors produced by AutoImageProcessor. It performs the core classification computation: preprocessed tensors in, classification results (e.g. class probabilities) out.
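A minimal sketch of this flow, mirroring what model_test_caltech_http.py below does (the /model path matches the Dockerfile above; the image filename is an example):

```python
import torch
from PIL import Image
from transformers import AutoImageProcessor, AutoModelForImageClassification

model_path = "/model"  # example path, as mounted by the Dockerfile above
processor = AutoImageProcessor.from_pretrained(model_path)
model = AutoModelForImageClassification.from_pretrained(model_path)

image = Image.open("026_0010.jpg").convert("RGB")
inputs = processor(images=image, return_tensors="pt")  # resize + normalize
with torch.no_grad():
    logits = model(**inputs).logits                    # class logits
print(model.config.id2label[logits.argmax(-1).item()])
```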
## How to use the image-classification test framework

The code implements an HTTP image-classification service that accepts an image and returns the highest-probability class as the final result. Based on the zibo.harbor.iluvatar.com.cn:30000/saas/bi100-3.2.1-x86-ubuntu20.04-py3.10-poc-llm-infer:v1.2.2 base image, the HTTP service is repackaged into a Docker image, and the SUT container in the k8s cluster sends requests to it.

Image-classification model types already tested with this framework:

1. Convolutional neural networks (CNN): ResNet
2. Transformers: ViT (Vision Transformer), Swin Transformer, DeiT (Data-efficient Image Transformers), BEiT (BERT Pre-training of Image Transformers)
3. Lightweight models: the MobileNet family
4. Other designs: ConvNeXt
## Tiangai-100 image-classification model adaptation status

| Model URL | Type | Status | Tiangai-100 accuracy | Tiangai-100 throughput (images/s) | CPU accuracy | CPU throughput (4 cores, images/s) | Submit Id |
| ---- | ---- | ---- | ---- | ---- | ---- | ---- | ---- |
| https://www.modelscope.cn/models/apple/mobilevit-x-small | MobileViT | Adapted | 22.6667% | 31.6415 | 22.6667% | 2.6574 | 249973 |
| https://www.modelscope.cn/models/facebook/convnextv2-tiny-22k-384 | ConvNeXt V2 (improved ConvNeXt) | Adapted | 29.3333% | 25.1330 | 29.3333% | 0.7301 | 249985 |
| https://www.modelscope.cn/models/google/vit-base-patch16-224 | ViT (Vision Transformer) | Adapted | 29.3333% | 40.0226 | 29.3333% | 1.1306 | 249992 |
| https://www.modelscope.cn/models/microsoft/beit-base-patch16-224-pt22k-ft22k | BEiT (BERT Pre-training of Image Transformers) | Adapted | 34.0000% | 23.7485 | 34.0000% | 0.9773 | 249537 |
| https://www.modelscope.cn/models/microsoft/swinv2-tiny-patch4-window16-256 | Swin Transformer V2 (based on Swin Transformer) | Adapted | 29.3333% | 13.8379 | 29.3333% | 1.0331 | 249557 |
| https://www.modelscope.cn/models/facebook/deit-small-patch16-224 | DeiT (Data-efficient Image Transformer, from Facebook AI) | Adapted | 29.3333% | 40.5675 | 29.3333% | 3.2749 | 250034 |
| https://www.modelscope.cn/models/microsoft/dit-base-finetuned-rvlcdip | DiT (Document Image Transformer) | Adapted | 0.0000% | 35.5122 | 0.0000% | 1.0823 | 250035 |
| https://www.modelscope.cn/models/microsoft/cvt-13 | CvT (Convolutional Vision Transformer) | Adapted | 29.3333% | 27.1214 | 29.3333% | 1.7240 | 250039 |
| https://www.modelscope.cn/models/google/efficientnet-b7 | EfficientNet (CNN-based) | Adapted | 28.6667% | 10.0449 | 28.6667% | 0.1541 | 250042 |
| https://www.modelscope.cn/models/microsoft/resnet-18 | ResNet (Residual Network) | Adapted | 22.6667% | 43.5976 | 22.6667% | 7.3915 | 250047 |
6  config.yaml  Normal file
@@ -0,0 +1,6 @@
leaderboard_options:
  nfs:
    - name: sid_model
      srcRelativePath: zhoushasha/models/image_models/apple_mobilevit-small
      mountPoint: /model
      source: ceph_customer
18  microsoft_beit_base_patch16_224_pt22k_ft22k/.gitattributes  vendored  Normal file
@@ -0,0 +1,18 @@
*.bin.* filter=lfs diff=lfs merge=lfs -text
*.lfs.* filter=lfs diff=lfs merge=lfs -text
*.bin filter=lfs diff=lfs merge=lfs -text
*.h5 filter=lfs diff=lfs merge=lfs -text
*.tflite filter=lfs diff=lfs merge=lfs -text
*.tar.gz filter=lfs diff=lfs merge=lfs -text
*.ot filter=lfs diff=lfs merge=lfs -text
*.onnx filter=lfs diff=lfs merge=lfs -text
*.arrow filter=lfs diff=lfs merge=lfs -text
*.ftz filter=lfs diff=lfs merge=lfs -text
*.joblib filter=lfs diff=lfs merge=lfs -text
*.model filter=lfs diff=lfs merge=lfs -text
*.msgpack filter=lfs diff=lfs merge=lfs -text
*.pb filter=lfs diff=lfs merge=lfs -text
*.pt filter=lfs diff=lfs merge=lfs -text
*.pth filter=lfs diff=lfs merge=lfs -text
*tfevents* filter=lfs diff=lfs merge=lfs -text
*.msgpack filter=lfs diff=lfs merge=lfs -text
104  microsoft_beit_base_patch16_224_pt22k_ft22k/README.md  Normal file
@@ -0,0 +1,104 @@
---
license: apache-2.0
tags:
- image-classification
- vision
datasets:
- imagenet
- imagenet-21k
---

# BEiT (base-sized model, fine-tuned on ImageNet-22k)

BEiT model pre-trained in a self-supervised fashion on ImageNet-22k - also called ImageNet-21k (14 million images, 21,841 classes) at resolution 224x224, and fine-tuned on the same dataset at resolution 224x224. It was introduced in the paper [BEIT: BERT Pre-Training of Image Transformers](https://arxiv.org/abs/2106.08254) by Hangbo Bao, Li Dong and Furu Wei, and first released in [this repository](https://github.com/microsoft/unilm/tree/master/beit).

Disclaimer: The team releasing BEiT did not write a model card for this model, so this model card has been written by the Hugging Face team.

## Model description

The BEiT model is a Vision Transformer (ViT), which is a transformer encoder model (BERT-like). In contrast to the original ViT model, BEiT is pretrained on a large collection of images in a self-supervised fashion, namely ImageNet-21k, at a resolution of 224x224 pixels. The pre-training objective for the model is to predict visual tokens from the encoder of OpenAI's DALL-E's VQ-VAE, based on masked patches.
Next, the model was fine-tuned in a supervised fashion on ImageNet (also referred to as ILSVRC2012), a dataset comprising 1 million images and 1,000 classes, also at resolution 224x224.

Images are presented to the model as a sequence of fixed-size patches (resolution 16x16), which are linearly embedded. Contrary to the original ViT models, BEiT models use relative position embeddings (similar to T5) instead of absolute position embeddings, and perform classification of images by mean-pooling the final hidden states of the patches, instead of placing a linear layer on top of the final hidden state of the [CLS] token.

By pre-training the model, it learns an inner representation of images that can then be used to extract features useful for downstream tasks: if you have a dataset of labeled images, for instance, you can train a standard classifier by placing a linear layer on top of the pre-trained encoder. One typically places a linear layer on top of the [CLS] token, as the last hidden state of this token can be seen as a representation of an entire image. Alternatively, one can mean-pool the final hidden states of the patch embeddings, and place a linear layer on top of that.
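As an illustration of the mean-pooling variant described above, a minimal linear-probe sketch (not part of the original model card; the 10-class head and the random input are placeholders):

```python
import torch
from transformers import BeitModel

encoder = BeitModel.from_pretrained('microsoft/beit-base-patch16-224-pt22k-ft22k')
classifier = torch.nn.Linear(encoder.config.hidden_size, 10)  # placeholder: 10 labels

pixel_values = torch.randn(1, 3, 224, 224)  # stand-in for a preprocessed image
with torch.no_grad():
    hidden = encoder(pixel_values).last_hidden_state  # (1, 1 + num_patches, hidden)
features = hidden[:, 1:].mean(dim=1)  # mean-pool the patch tokens, skipping [CLS]
logits = classifier(features)
```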
## Intended uses & limitations

You can use the raw model for image classification. See the [model hub](https://huggingface.co/models?search=microsoft/beit) to look for fine-tuned versions on a task that interests you.

### How to use

Here is how to use this model to classify an image of the COCO 2017 dataset into one of the 21,841 ImageNet-22k classes:

```python
from transformers import BeitImageProcessor, BeitForImageClassification
from PIL import Image
import requests

url = 'http://images.cocodataset.org/val2017/000000039769.jpg'
image = Image.open(requests.get(url, stream=True).raw)

processor = BeitImageProcessor.from_pretrained('microsoft/beit-base-patch16-224-pt22k-ft22k')
model = BeitForImageClassification.from_pretrained('microsoft/beit-base-patch16-224-pt22k-ft22k')

inputs = processor(images=image, return_tensors="pt")
outputs = model(**inputs)
logits = outputs.logits
# model predicts one of the 21,841 ImageNet-22k classes
predicted_class_idx = logits.argmax(-1).item()
print("Predicted class:", model.config.id2label[predicted_class_idx])
```

Currently, both the feature extractor and model support PyTorch.

## Training data

The BEiT model was pretrained on [ImageNet-21k](http://www.image-net.org/), a dataset consisting of 14 million images and 21k classes, and fine-tuned on the same dataset.

## Training procedure

### Preprocessing

The exact details of preprocessing of images during training/validation can be found [here](https://github.com/microsoft/unilm/blob/master/beit/datasets.py).

Images are resized/rescaled to the same resolution (224x224) and normalized across the RGB channels with mean (0.5, 0.5, 0.5) and standard deviation (0.5, 0.5, 0.5).
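A torchvision equivalent of that normalization, as a rough sketch (the preprocessor_config.json in this repo encodes the same mean/std; this is an illustration, not the exact training pipeline):

```python
from torchvision import transforms

preprocess = transforms.Compose([
    transforms.Resize((224, 224)),
    transforms.ToTensor(),  # scales pixel values to [0, 1]
    transforms.Normalize(mean=(0.5, 0.5, 0.5), std=(0.5, 0.5, 0.5)),
])
```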
### Pretraining

For all pre-training related hyperparameters, we refer to page 15 of the [original paper](https://arxiv.org/abs/2106.08254).

## Evaluation results

For evaluation results on several image classification benchmarks, we refer to tables 1 and 2 of the original paper. Note that for fine-tuning, the best results are obtained with a higher resolution. Of course, increasing the model size will result in better performance.

### BibTeX entry and citation info

```bibtex
@article{DBLP:journals/corr/abs-2106-08254,
  author    = {Hangbo Bao and
               Li Dong and
               Furu Wei},
  title     = {BEiT: {BERT} Pre-Training of Image Transformers},
  journal   = {CoRR},
  volume    = {abs/2106.08254},
  year      = {2021},
  url       = {https://arxiv.org/abs/2106.08254},
  archivePrefix = {arXiv},
  eprint    = {2106.08254},
  timestamp = {Tue, 29 Jun 2021 16:55:04 +0200},
  biburl    = {https://dblp.org/rec/journals/corr/abs-2106-08254.bib},
  bibsource = {dblp computer science bibliography, https://dblp.org}
}
```

```bibtex
@inproceedings{deng2009imagenet,
  title={Imagenet: A large-scale hierarchical image database},
  author={Deng, Jia and Dong, Wei and Socher, Richard and Li, Li-Jia and Li, Kai and Fei-Fei, Li},
  booktitle={2009 IEEE conference on computer vision and pattern recognition},
  pages={248--255},
  year={2009},
  organization={Ieee}
}
```
43111  microsoft_beit_base_patch16_224_pt22k_ft22k/config.json  Normal file
File diff suppressed because it is too large. Load Diff
BIN  microsoft_beit_base_patch16_224_pt22k_ft22k/flax_model.msgpack  Normal file
Binary file not shown.
19  microsoft_beit_base_patch16_224_pt22k_ft22k/preprocessor_config.json  Normal file
@@ -0,0 +1,19 @@
{
  "crop_size": 224,
  "do_center_crop": false,
  "do_normalize": true,
  "do_resize": true,
  "feature_extractor_type": "BeitFeatureExtractor",
  "image_mean": [
    0.5,
    0.5,
    0.5
  ],
  "image_std": [
    0.5,
    0.5,
    0.5
  ],
  "resample": 2,
  "size": 224
}
BIN  microsoft_beit_base_patch16_224_pt22k_ft22k/pytorch_model.bin  Normal file
Binary file not shown.
166  model_test_caltech_http.py  Normal file
@@ -0,0 +1,166 @@
import torch
import time
import os
import multiprocessing
from PIL import Image
from transformers import AutoImageProcessor, AutoModelForImageClassification
from flask import Flask, request, jsonify
from io import BytesIO

# Limit CPU usage to 4 cores
os.environ["OMP_NUM_THREADS"] = "4"
os.environ["MKL_NUM_THREADS"] = "4"
os.environ["NUMEXPR_NUM_THREADS"] = "4"
os.environ["OPENBLAS_NUM_THREADS"] = "4"
os.environ["VECLIB_MAXIMUM_THREADS"] = "4"
torch.set_num_threads(4)  # limit PyTorch's CPU thread count

# Device configuration
device_cuda = torch.device("cuda" if torch.cuda.is_available() else "cpu")
device_cpu = torch.device("cpu")
print(f"CUDA device: {device_cuda}, CPU device: {device_cpu}")
print(f"CPU thread count: {torch.get_num_threads()}")

class ImageClassifier:
    def __init__(self, model_path: str):
        self.processor = AutoImageProcessor.from_pretrained(model_path)

        # Load separate model instances for GPU and CPU
        if device_cuda.type == "cuda":
            self.model_cuda = AutoModelForImageClassification.from_pretrained(model_path).to(device_cuda)
        else:
            self.model_cuda = None  # no CUDA available, skip loading

        self.model_cpu = AutoModelForImageClassification.from_pretrained(model_path).to(device_cpu)

        # Keep the id2label mapping
        self.id2label = self.model_cpu.config.id2label

    def _predict_with_model(self, image, model, device) -> dict:
        """Run prediction on the given model and device, timed individually."""
        try:
            # Record the start time
            start_time = time.perf_counter()  # high-resolution timer

            # Preprocess the image and move it to the target device
            inputs = self.processor(images=image, return_tensors="pt").to(device)

            with torch.no_grad():
                outputs = model(**inputs)

            logits = outputs.logits
            probs = torch.nn.functional.softmax(logits, dim=1)
            max_prob, max_idx = probs.max(dim=1)
            class_idx = max_idx.item()

            # Processing time in seconds, rounded to 6 decimal places
            processing_time = round(time.perf_counter() - start_time, 6)

            return {
                "class_id": class_idx,
                "class_name": self.id2label[class_idx],
                "confidence": float(max_prob.item()),
                "device_used": str(device),
                "processing_time": processing_time  # processing time
            }
        except Exception as e:
            return {
                "class_id": -1,
                "class_name": "error",
                "confidence": 0.0,
                "device_used": str(device),
                "processing_time": 0.0,
                "error": str(e)
            }

    def predict_single_image(self, image) -> dict:
        """Predict a single image on both the GPU and CPU models."""
        results = {"status": "success"}

        # GPU prediction (if available)
        if self.model_cuda is not None:
            cuda_result = self._predict_with_model(image, self.model_cuda, device_cuda)
        else:
            cuda_result = {
                "class_id": -1,
                "class_name": "error",
                "confidence": 0.0,
                "device_used": str(device_cuda),
                "processing_time": 0.0,
                "error": "CUDA device unavailable; CUDA model not loaded"
            }
        results["cuda_prediction"] = cuda_result

        # CPU prediction (limited to 4 cores above)
        cpu_result = self._predict_with_model(image, self.model_cpu, device_cpu)
        results["cpu_prediction"] = cpu_result

        return results

# Initialize the service
app = Flask(__name__)
MODEL_PATH = os.environ.get("MODEL_PATH", "/model")  # model path (env var or default)
classifier = ImageClassifier(MODEL_PATH)

@app.route('/v1/private/s782b4996', methods=['POST'])
def predict_single():
    """Accept a single image and return predictions with processing times."""
    if 'image' not in request.files:
        return jsonify({
            "status": "error",
            "cuda_prediction": {
                "class_id": -1,
                "class_name": "error",
                "confidence": 0.0,
                "device_used": str(device_cuda),
                "processing_time": 0.0,
                "error": "no image in the request"
            },
            "cpu_prediction": {
                "class_id": -1,
                "class_name": "error",
                "confidence": 0.0,
                "device_used": str(device_cpu),
                "processing_time": 0.0,
                "error": "no image in the request"
            }
        }), 400

    image_file = request.files['image']
    try:
        image = Image.open(BytesIO(image_file.read())).convert("RGB")
        result = classifier.predict_single_image(image)
        return jsonify(result)
    except Exception as e:
        return jsonify({
            "status": "error",
            "cuda_prediction": {
                "class_id": -1,
                "class_name": "error",
                "confidence": 0.0,
                "device_used": str(device_cuda),
                "processing_time": 0.0,
                "error": str(e)
            },
            "cpu_prediction": {
                "class_id": -1,
                "class_name": "error",
                "confidence": 0.0,
                "device_used": str(device_cpu),
                "processing_time": 0.0,
                "error": str(e)
            }
        }), 500

@app.route('/health', methods=['GET'])
def health_check():
    return jsonify({
        "status": "healthy",
        "cuda_available": device_cuda.type == "cuda",
        "cuda_device": str(device_cuda),
        "cpu_device": str(device_cpu),
        "cpu_threads": torch.get_num_threads()  # report the CPU thread count
    }), 200

if __name__ == "__main__":
    app.run(host='0.0.0.0', port=80, debug=False)
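
# A quick manual check of this service (a sketch, assuming the container from the
# README is running with host port 10086 mapped to 80; image path is an example):
#
#   import requests
#   print(requests.get("http://localhost:10086/health").json())
#   with open("026_0010.jpg", "rb") as f:
#       r = requests.post("http://localhost:10086/v1/private/s782b4996",
#                         files={"image": f}, timeout=30)
#   print(r.json())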
13  requirements.txt  Normal file
@@ -0,0 +1,13 @@
requests
ruamel.yaml
regex
pyyaml
websocket-client==0.44.0
pydantic==2.6.4
pydantic_core==2.16.3
Levenshtein
numpy
websockets
fabric
vmplatform==0.0.4
flask
719  run_callback.py  Normal file
@@ -0,0 +1,719 @@
import json
import os
import sys
import time
import tempfile
import zipfile
import threading
from collections import defaultdict
from typing import Dict, List

import yaml
from pydantic import ValidationError

from schemas.dataset import QueryData
from utils.client_callback import ClientCallback, EvaluateResult, StopException
from utils.logger import log
from utils.service import register_sut
from utils.update_submit import change_product_available
from utils.file import dump_json, load_yaml, unzip_dir, load_json, write_file, dump_yaml
from utils.leaderboard import change_product_unavailable


lck = threading.Lock()

# Environment variables set by the leaderboard
DATASET_FILEPATH = os.environ["DATASET_FILEPATH"]
RESULT_FILEPATH = os.environ["RESULT_FILEPATH"]

DETAILED_CASES_FILEPATH = os.environ["DETAILED_CASES_FILEPATH"]
SUBMIT_CONFIG_FILEPATH = os.environ["SUBMIT_CONFIG_FILEPATH"]
BENCHMARK_NAME = os.environ["BENCHMARK_NAME"]
TEST_CONCURRENCY = int(os.getenv('TEST_CONCURRENCY', 1))
THRESHOLD_OMCER = float(os.getenv('THRESHOLD_OMCER', 0.8))

log.info(f"DATASET_FILEPATH: {DATASET_FILEPATH}")
workspace_path = "/tmp/workspace"


# Environment variables set by kubernetes
MY_POD_IP = os.environ["MY_POD_IP"]

# constants
RESOURCE_NAME = BENCHMARK_NAME

# Environment variables set by judge_flow_config
LANG = os.getenv("lang")
SUT_CPU = os.getenv("SUT_CPU", "2")
SUT_MEMORY = os.getenv("SUT_MEMORY", "4Gi")
SUT_VGPU = os.getenv("SUT_VGPU", "1")
#SUT_VGPU_MEM = os.getenv("SUT_VGPU_MEM", str(1843 * int(SUT_VGPU)))
#SUT_VGPU_CORES = os.getenv("SUT_VGPU_CORES", str(8 * int(SUT_VGPU)))
SUT_VGPU_ACCELERATOR = os.getenv("SUT_VGPU_ACCELERATOR", "iluvatar-BI-V100")
RESOURCE_TYPE = os.getenv("RESOURCE_TYPE", "vgpu")
assert RESOURCE_TYPE in [
    "cpu",
    "vgpu",
], "benchmark judge_flow_config error: RESOURCE_TYPE should be cpu or vgpu"


unzip_dir(DATASET_FILEPATH, workspace_path)

def get_sut_url_kubernetes():
    with open(SUBMIT_CONFIG_FILEPATH, "r") as f:
        submit_config = yaml.safe_load(f)
    assert isinstance(submit_config, dict)

    submit_config.setdefault("values", {})

    submit_config["values"]["containers"] = [
        {
            "name": "corex-container",
            "image": "harbor.4pd.io/lab-platform/inf/python:3.9",  # image
            "command": ["sleep"],  # replace with your model's launch command (python interpreter)
            "args": ["3600"],      # replace with your model's arguments (run the inference script)

            # Mount a volume if needed
            #"volumeMounts": [
            #    {
            #        "name": "model-volume",
            #        "mountPath": "/model"  # mount at /model
            #    }
            #]
        }
    ]

    """
    # Volume configuration
    submit_config["values"]["volumes"] = [
        {
            "name": "model-volume",
            "persistentVolumeClaim": {
                "claimName": "sid-model-pvc"  # use the existing PVC
            }
        }
    ]
    """

    """
    # Inject specified cpu and memory
    resource = {
        "cpu": SUT_CPU,
        "memory": SUT_MEMORY,
    }
    """
    submit_config["values"]["resources"] = {
        "requests": {},
        "limits": {},
    }

    limits = submit_config["values"]["resources"]["limits"]
    requests = submit_config["values"]["resources"]["requests"]


    # Replace the nvidia resource key with iluvatar.ai/gpu
    vgpu_resource = {
        "iluvatar.ai/gpu": SUT_VGPU,  # your GPU resource key
        # Add other resources (e.g. GPU memory) per your K8s setup, for example:
        # "iluvatar.ai/gpumem": SUT_VGPU_MEM,
    }
    limits.update(vgpu_resource)
    requests.update(vgpu_resource)
    # Node selector: replace with your accelerator label
    submit_config["values"]["nodeSelector"] = {
        "contest.4pd.io/accelerator": "iluvatar-BI-V100"  # your node label
    }
    # Tolerations: replace with your tolerations configuration


    log.info(f"submit_config: {submit_config}")
    log.info(f"RESOURCE_NAME: {RESOURCE_NAME}")

    return register_sut(submit_config, RESOURCE_NAME).replace(
        "ws://", "http://"
    )


def get_sut_url():
    return get_sut_url_kubernetes()

#SUT_URL = get_sut_url()
#os.environ["SUT_URL"] = SUT_URL

#############################################################################

import requests
import base64

def gen_req_body(apiname, APPId, file_path=None, featureId=None, featureInfo=None, dstFeatureId=None):
    """
    Build the request body.
    :param apiname: API name
    :param APPId: app id
    :param file_path: path of the audio file
    :return:
    """
    if apiname == 'createFeature':

        with open(file_path, "rb") as f:
            audioBytes = f.read()
        body = {
            "header": {"app_id": APPId, "status": 3},
            "parameter": {
                "s782b4996": {
                    "func": "createFeature",
                    "groupId": "test_voiceprint_e",
                    "featureId": featureId,
                    "featureInfo": featureInfo,
                    "createFeatureRes": {"encoding": "utf8", "compress": "raw", "format": "json"},
                }
            },
            "payload": {
                "resource": {
                    "encoding": "lame",
                    "sample_rate": 16000,
                    "channels": 1,
                    "bit_depth": 16,
                    "status": 3,
                    "audio": str(base64.b64encode(audioBytes), 'UTF-8'),
                }
            },
        }
    elif apiname == 'createGroup':

        body = {
            "header": {"app_id": APPId, "status": 3},
            "parameter": {
                "s782b4996": {
                    "func": "createGroup",
                    "groupId": "test_voiceprint_e",
                    "groupName": "vip_user",
                    "groupInfo": "store_vip_user_voiceprint",
                    "createGroupRes": {"encoding": "utf8", "compress": "raw", "format": "json"},
                }
            },
        }
    elif apiname == 'deleteFeature':

        body = {
            "header": {"app_id": APPId, "status": 3},
            "parameter": {
                "s782b4996": {
                    "func": "deleteFeature",
                    "groupId": "iFLYTEK_examples_groupId",
                    "featureId": "iFLYTEK_examples_featureId",
                    "deleteFeatureRes": {"encoding": "utf8", "compress": "raw", "format": "json"},
                }
            },
        }
    elif apiname == 'queryFeatureList':

        body = {
            "header": {"app_id": APPId, "status": 3},
            "parameter": {
                "s782b4996": {
                    "func": "queryFeatureList",
                    "groupId": "user_voiceprint_2",
                    "queryFeatureListRes": {"encoding": "utf8", "compress": "raw", "format": "json"},
                }
            },
        }
    elif apiname == 'searchFea':

        with open(file_path, "rb") as f:
            audioBytes = f.read()
        body = {
            "header": {"app_id": APPId, "status": 3},
            "parameter": {
                "s782b4996": {
                    "func": "searchFea",
                    "groupId": "test_voiceprint_e",
                    "topK": 1,
                    "searchFeaRes": {"encoding": "utf8", "compress": "raw", "format": "json"},
                }
            },
            "payload": {
                "resource": {
                    "encoding": "lame",
                    "sample_rate": 16000,
                    "channels": 1,
                    "bit_depth": 16,
                    "status": 3,
                    "audio": str(base64.b64encode(audioBytes), 'UTF-8'),
                }
            },
        }
    elif apiname == 'searchScoreFea':

        with open(file_path, "rb") as f:
            audioBytes = f.read()
        body = {
            "header": {"app_id": APPId, "status": 3},
            "parameter": {
                "s782b4996": {
                    "func": "searchScoreFea",
                    "groupId": "test_voiceprint_e",
                    "dstFeatureId": dstFeatureId,
                    "searchScoreFeaRes": {"encoding": "utf8", "compress": "raw", "format": "json"},
                }
            },
            "payload": {
                "resource": {
                    "encoding": "lame",
                    "sample_rate": 16000,
                    "channels": 1,
                    "bit_depth": 16,
                    "status": 3,
                    "audio": str(base64.b64encode(audioBytes), 'UTF-8'),
                }
            },
        }
    elif apiname == 'updateFeature':

        with open(file_path, "rb") as f:
            audioBytes = f.read()
        body = {
            "header": {"app_id": APPId, "status": 3},
            "parameter": {
                "s782b4996": {
                    "func": "updateFeature",
                    "groupId": "iFLYTEK_examples_groupId",
                    "featureId": "iFLYTEK_examples_featureId",
                    "featureInfo": "iFLYTEK_examples_featureInfo_update",
                    "updateFeatureRes": {"encoding": "utf8", "compress": "raw", "format": "json"},
                }
            },
            "payload": {
                "resource": {
                    "encoding": "lame",
                    "sample_rate": 16000,
                    "channels": 1,
                    "bit_depth": 16,
                    "status": 3,
                    "audio": str(base64.b64encode(audioBytes), 'UTF-8'),
                }
            },
        }
    elif apiname == 'deleteGroup':
        body = {
            "header": {"app_id": APPId, "status": 3},
            "parameter": {
                "s782b4996": {
                    "func": "deleteGroup",
                    "groupId": "iFLYTEK_examples_groupId",
                    "deleteGroupRes": {"encoding": "utf8", "compress": "raw", "format": "json"},
                }
            },
        }
    else:
        raise Exception(
            "apiname is not one of [createFeature, createGroup, deleteFeature, queryFeatureList, searchFea, searchScoreFea, updateFeature], please check")
    return body


log.info("Requesting the SUT service URL")
# Fetch the SUT service URL
sut_url = get_sut_url()
print(f"SUT_URL obtained: {sut_url}")  # debug output
log.info(f"SUT service URL obtained: {sut_url}")

from urllib.parse import urlparse

# Global variable
text_decoded = None

################################ newly added ################################
def req_url(api_name, APPId, file_path=None, featureId=None, featureInfo=None, dstFeatureId=None):
    """
    Send the request.
    :param APPId: APPID
    :param file_path: file path used in the body
    :return:
    """

    global text_decoded

    body = gen_req_body(apiname=api_name, APPId=APPId, file_path=file_path, featureId=featureId, featureInfo=featureInfo, dstFeatureId=dstFeatureId)
    #request_url = 'https://ai-cloud.4paradigm.com:9443/sid/v1/private/s782b4996'

    #request_url = 'https://sut:80/sid/v1/private/s782b4996'

    #headers = {'content-type': "application/json", 'host': 'ai-cloud.4paradigm.com', 'appid': APPId}

    parsed_url = urlparse(sut_url)
    headers = {'content-type': "application/json", 'host': parsed_url.hostname, 'appid': APPId}

    # 1. First check the service health endpoint
    response = requests.get(f"{sut_url}/health")
    print(response.status_code, response.text)


    # Request headers (note: this overwrites the headers built above)
    headers = {"Content-Type": "application/json"}
    # Request body (note: this overwrites the body built above; optionally caps the image count)
    body = {"limit": 20}  # optional: limit the total number of images processed

    # Send the POST request
    response = requests.post(
        f"{sut_url}/v1/private/s782b4996",
        data=json.dumps(body),
        headers=headers
    )

    # Parse the response
    if response.status_code == 200:
        result = response.json()
        print("Evaluation results:")
        print(f"Accuracy: {result['metrics']['accuracy']}%")
        print(f"Average recall: {result['metrics']['average_recall']}%")
        print(f"Total images processed: {result['metrics']['total_images']}")
    else:
        print(f"Request failed, status code: {response.status_code}")
        print(f"Error message: {response.text}")


    # Basic-auth credentials
    auth = ('llm', 'Rmf4#LcG(iFZrjU;2J')
    #response = requests.post(request_url, data=json.dumps(body), headers=headers, auth=auth)

    #response = requests.post(sut_url + "/predict", data=json.dumps(body), headers=headers, auth=auth)
    #response = requests.post(f"{sut_url}/sid/v1/private/s782b4996", data=json.dumps(body), headers=headers, auth=auth)
    """
    response = requests.post(f"{sut_url}/v1/private/s782b4996", data=json.dumps(body), headers=headers)
    """

    #print("HTTP status code:", response.status_code)
    #print("Raw response body:", response.text)
    #print(f"Request URL: {sut_url + '/v1/private/s782b4996'}")
    #print(f"Request headers: {headers}")
    #print(f"Request body: {body}")

    #tempResult = json.loads(response.content.decode('utf-8'))
    #print(tempResult)

    """
    # Base64-decode the text field
    if 'payload' in tempResult and 'updateFeatureRes' in tempResult['payload']:
        text_encoded = tempResult['payload']['updateFeatureRes']['text']
        text_decoded = base64.b64decode(text_encoded).decode('utf-8')
        print(f"Base64-decoded text field: {text_decoded}")
    """

    #text_encoded = tempResult['payload']['updateFeatureRes']['text']
    #text_decoded = base64.b64decode(text_encoded).decode('utf-8')
    #print(f"Base64-decoded text field: {text_decoded}")


    # Write the response JSON to the result file
    result = response.json()
    with open(RESULT_FILEPATH, "w") as f:
        json.dump(result, f, indent=4, ensure_ascii=False)
    print(f"Result written to {RESULT_FILEPATH}")

submit_config_filepath = os.getenv("SUBMIT_CONFIG_FILEPATH", "./tests/resources/submit_config")
result_filepath = os.getenv("RESULT_FILEPATH", "./out/result")
bad_cases_filepath = os.getenv("BAD_CASES_FILEPATH", "./out/badcase")
#detail_cases_filepath = os.getenv("DETAILED_CASES_FILEPATH", "./out/detailcase.jsonl")

from typing import Any, Dict, List

def result2file(
    result: Dict[str, Any],
    detail_cases: List[Dict[str, Any]] = None
):
    assert result_filepath is not None
    assert bad_cases_filepath is not None
    #assert detailed_cases_filepath is not None

    if result is not None:
        with open(result_filepath, "w") as f:
            json.dump(result, f, indent=4, ensure_ascii=False)
    #if LOCAL_TEST:
    #    logger.info(f'result:\n {json.dumps(result, indent=4)}')
    """
    if detail_cases is not None:
        with open(detailed_cases_filepath, "w") as f:
            json.dump(detail_cases, f, indent=4, ensure_ascii=False)
        if LOCAL_TEST:
            logger.info(f'result:\n {json.dumps(detail_cases, indent=4)}')
    """


def test_image_prediction(sut_url, image_path):
    """Send a single image to the service for prediction."""
    url = f"{sut_url}/v1/private/s782b4996"

    try:
        with open(image_path, 'rb') as f:
            files = {'image': f}
            response = requests.post(url, files=files, timeout=30)

        result = response.json()
        if result.get('status') != 'success':
            return None, f"Server-side error: {result.get('message')}"

        return result, None
    except Exception as e:
        return None, f"Request error: {str(e)}"


import random
import time
#from tqdm import tqdm
import os
import requests

if __name__ == '__main__':

    print(f"\n===== main: start calling the API =====")
    # 1. First check the service health endpoint

    print(f"\n===== Service health check =====")
    response = requests.get(f"{sut_url}/health")
    print(response.status_code, response.text)


    ###############################################################################################
    dataset_root = "/tmp/workspace/256ObjectCategoriesNew"  # dataset root directory
    samples_per_class = 3  # number of samples drawn per class
    image_extensions = ('.jpg', '.jpeg', '.png', '.bmp', '.gif')  # supported image formats

    # Result counters
    total_samples = 0
    #correct_predictions = 0

    # GPU stats
    gpu_true_positives = 0
    gpu_false_positives = 0
    gpu_false_negatives = 0
    gpu_total_processing_time = 0.0

    # CPU stats
    cpu_true_positives = 0
    cpu_false_positives = 0
    cpu_false_negatives = 0
    cpu_total_processing_time = 0.0


    # Walk over all class folders
    for folder_name in os.listdir(dataset_root):
        folder_path = os.path.join(dataset_root, folder_name)

        # Skip non-directories
        if not os.path.isdir(folder_path):
            continue

        # Extract the class name (the "name" part of "index.name")
        try:
            class_name = folder_name.split('.', 1)[1].strip().lower()
        except IndexError:
            print(f"Warning: folder {folder_name} has an unexpected name format, skipping")
            continue

        # Collect all images in the folder
        image_files = []
        for file in os.listdir(folder_path):
            file_path = os.path.join(folder_path, file)
            if os.path.isfile(file_path) and file.lower().endswith(image_extensions):
                image_files.append(file_path)

        # Randomly sample the requested number of images (or all if fewer)
        selected_images = random.sample(
            image_files,
            min(samples_per_class, len(image_files))
        )

        for img_path in selected_images:
            total_samples += 1

            # Get the prediction
            prediction, error = test_image_prediction(sut_url, img_path)

            # Log what test_image_prediction returned
            print(f"test_image_prediction returned prediction: {prediction}")
            print(f"test_image_prediction returned error: {error}")

            if error:
                print(f"Failed to process image {img_path}: {error}")
                continue


            # Parse the GPU prediction
            gpu_pred = prediction.get('cuda_prediction', {})
            gpu_pred_class = gpu_pred.get('class_name', '').lower()
            gpu_processing_time = gpu_pred.get('processing_time', 0.0)

            # Parse the CPU prediction
            cpu_pred = prediction.get('cpu_prediction', {})
            cpu_pred_class = cpu_pred.get('class_name', '').lower()
            cpu_processing_time = cpu_pred.get('processing_time', 0.0)

            # Check whether the GPU prediction is correct
            gpu_is_correct = class_name in gpu_pred_class
            if gpu_is_correct:
                gpu_true_positives += 1
            else:
                gpu_false_positives += 1
                gpu_false_negatives += 1

            # Check whether the CPU prediction is correct
            cpu_is_correct = class_name in cpu_pred_class
            if cpu_is_correct:
                cpu_true_positives += 1
            else:
                cpu_false_positives += 1
                cpu_false_negatives += 1

            # Accumulate processing time
            gpu_total_processing_time += gpu_processing_time
            cpu_total_processing_time += cpu_processing_time

            # Print per-image details
            print(f"Image: {os.path.basename(img_path)} | truth: {class_name}")
            print(f"GPU prediction: {gpu_pred_class} | {'correct' if gpu_is_correct else 'wrong'} | time: {gpu_processing_time:.6f}s")
            print(f"CPU prediction: {cpu_pred_class} | {'correct' if cpu_is_correct else 'wrong'} | time: {cpu_processing_time:.6f}s")
            print("-" * 50)


    # Initialize the result dict
    result = {
        # GPU metrics
        "gpu_accuracy": 0.0,
        "gpu_recall": 0.0,
        "gpu_running_time": round(gpu_total_processing_time, 6),
        "gpu_throughput": 0.0,

        # CPU metrics
        "cpu_accuracy": 0.0,
        "cpu_recall": 0.0,
        "cpu_running_time": round(cpu_total_processing_time, 6),
        "cpu_throughput": 0.0
    }

    # Compute GPU metrics
    # Note: false negatives are incremented together with false positives above,
    # so recall as computed here equals accuracy.
    gpu_accuracy = gpu_true_positives / total_samples * 100
    gpu_recall_denominator = gpu_true_positives + gpu_false_negatives
    gpu_recall = gpu_true_positives / gpu_recall_denominator * 100 if gpu_recall_denominator > 0 else 0
    gpu_throughput = total_samples / gpu_total_processing_time if gpu_total_processing_time > 1e-6 else 0

    # Compute CPU metrics
    cpu_accuracy = cpu_true_positives / total_samples * 100
    cpu_recall_denominator = cpu_true_positives + cpu_false_negatives
    cpu_recall = cpu_true_positives / cpu_recall_denominator * 100 if cpu_recall_denominator > 0 else 0
    cpu_throughput = total_samples / cpu_total_processing_time if cpu_total_processing_time > 1e-6 else 0

    # Fill in the result dict
    result.update({
        "gpu_accuracy": round(gpu_accuracy, 6),
        "gpu_recall": round(gpu_recall, 6),
        "gpu_throughput": round(gpu_throughput, 6),

        "cpu_accuracy": round(cpu_accuracy, 6),
        "cpu_recall": round(cpu_recall, 6),
        "cpu_throughput": round(cpu_throughput, 6)
    })


    # Print the final statistics
    print("\n" + "="*50)
    print(f"Total samples: {total_samples}")
    print("\nGPU metrics:")
    print(f"Accuracy: {result['gpu_accuracy']:.4f}%")
    print(f"Recall: {result['gpu_recall']:.4f}%")
    print(f"Total running time: {result['gpu_running_time']:.6f}s")
    print(f"Throughput: {result['gpu_throughput']:.2f} images/s")

    print("\nCPU metrics:")
    print(f"Accuracy: {result['cpu_accuracy']:.4f}%")
    print(f"Recall: {result['cpu_recall']:.4f}%")
    print(f"Total running time: {result['cpu_running_time']:.6f}s")
    print(f"Throughput: {result['cpu_throughput']:.2f} images/s")
    print("="*50)


    #result = {}
    #result['accuracy_1_1'] = 3
    result2file(result)

    if abs(gpu_accuracy - cpu_accuracy) > 3:
        log.error("GPU and CPU accuracy differ by more than 3%; the model results are incorrect")
        change_product_unavailable()


    exit_code = 0

0  utils/__init__.py  Normal file
57  utils/asr_ter.py  Normal file
@@ -0,0 +1,57 @@
# copy from
# https://gitlab.4pd.io/scene_lab/leaderboard/judge_flows/foundamental_capability/blob/master/utils/asr_ter.py


def calc_ter_speechio(pred, ref, language="zh"):
    assert language == "zh", "Unsupported language %s" % language
    assert ref is not None and ref != "", "Reference script cannot be empty"
    if language == "zh":
        from .speechio import error_rate_zh as error_rate
        from .speechio import textnorm_zh as textnorm

        normalizer = textnorm.TextNorm(
            to_banjiao=True,
            to_upper=True,
            to_lower=False,
            remove_fillers=True,
            remove_erhua=True,
            check_chars=False,
            remove_space=False,
            cc_mode="",
        )
        norm_pred = normalizer(pred if pred is not None else "")
        norm_ref = normalizer(ref)
        tokenizer = "char"
        alignment, score = error_rate.EditDistance(
            error_rate.tokenize_text(norm_ref, tokenizer),
            error_rate.tokenize_text(norm_pred, tokenizer),
        )
        c, s, i, d = error_rate.CountEdits(alignment)
        ter = error_rate.ComputeTokenErrorRate(c, s, i, d) / 100.0
        return {"ter": ter, "err_token_cnt": s + d + i, "ref_all_token_cnt": s + d + c}
    assert False, "Bug, not reachable"


def calc_ter_wjs(pred, ref, language="zh"):
    assert language == "zh", "Unsupported language %s" % language
    assert ref is not None and ref != "", "Reference script cannot be empty"
    from . import wjs_asr_wer

    ignore_words = set()
    case_sensitive = False
    split = None
    calculator = wjs_asr_wer.Calculator()
    norm_pred = wjs_asr_wer.normalize(
        wjs_asr_wer.characterize(pred if pred is not None else ""),
        ignore_words,
        case_sensitive,
        split,
    )
    norm_ref = wjs_asr_wer.normalize(wjs_asr_wer.characterize(ref), ignore_words, case_sensitive, split)
    result = calculator.calculate(norm_pred, norm_ref)
    ter = ((result["ins"] + result["sub"] + result["del"]) * 1.0 / result["all"]) if result["all"] != 0 else 1.0
    return {
        "ter": ter,
        "err_token_cnt": result["ins"] + result["sub"] + result["del"],
        "ref_all_token_cnt": result["all"],
    }
224  utils/client.py  Normal file
@@ -0,0 +1,224 @@
import json
import os
import threading
import time
import traceback
from copy import deepcopy
from typing import Any, List

import websocket
from pydantic_core import ValidationError
from websocket import create_connection

from schemas.context import ASRContext
from schemas.stream import StreamDataModel, StreamResultModel
from utils.logger import logger

IN_TEST = os.getenv("SUBMIT_CONFIG_FILEPATH", None) is None


class Client:
    def __init__(self, sut_url: str, context: ASRContext) -> None:
        # base_url = "ws://127.0.0.1:5003"
        self.base_url = sut_url + "/recognition"
        logger.info(f"{self.base_url}")
        self.context: ASRContext = deepcopy(context)
        # if not os.getenv("DATASET_FILEPATH", ""):
        #     self.base_url = "wss://speech.4paradigm.com/aibuds/api/v1/recognition"
        #     self.base_url = "ws://localhost:5003/recognition"
        self.connect_num = 0
        self.exception = False
        self.close_time = 10**50
        self.send_time: List[float] = []
        self.recv_time: List[float] = []
        self.predict_data: List[Any] = []
        self.success = True

    def action(self):
        # Give up if all 5 initialization attempts fail
        connect_success = False
        for i in range(5):
            try:
                self._connect_init()
                connect_success = True
                break
            except Exception as e:
                logger.error(f"Connection attempt {i+1} failed: {e}")
                time.sleep(int(os.getenv("connect_sleep", 10)))
        if not connect_success:
            exit(-1)
        self.trecv = threading.Thread(target=self._recv)
        self.trecv.start()
        self._send()
        self._close()
        return self._gen_result()

    def _connect_init(self):
        end_time = time.time() + float(os.getenv("end_time", 2))
        success = False
        try:
            self.ws = create_connection(self.base_url)
            self.ws.send(json.dumps(self._gen_init_data()))
            while time.time() < end_time and not success:
                data = self.ws.recv()
                logger.info(f"data {data}")
                if len(data) == 0:
                    time.sleep(1)
                    continue
                if isinstance(data, str):
                    try:
                        data = json.loads(data)
                    except Exception:
                        raise Exception("Init phase: data is not a JSON string, aborting")
                if isinstance(data, dict):
                    success = data.get("success", False)
                    if not success:
                        logger.error(f"Initialization failed, response was {data}, aborting")
                    else:
                        break
                logger.error("Init phase: data is not a JSON string, aborting")
                exit(-1)
        except (websocket.WebSocketConnectionClosedException, TimeoutError):
            # note: the original `except A or B` form only caught the first class
            raise Exception("Init phase: connection interrupted, aborting")
            # exit(-1)
        except ConnectionRefusedError:
            raise Exception("Init phase: connection failed, retrying in 10s, at most 5 times")
            # logger.error("Init phase: connection failed, retrying in 10s, at most 5 times")
            # self.connect_num += 1
            # if self.connect_num <= 4:
            #     time.sleep(int(os.getenv("connect_sleep", 10)))
            #     self._connect_init()
            #     success = True
            # else:
            #     logger.error("Init phase: connection failed too many times")
            #     exit(-1)
        if not success:
            # logger.error("Init phase: no data returned within 60s, aborting")
            raise Exception("Init phase: no data returned within 60s, aborting")
        else:
            logger.info("Connection established")
            self.connect_num = 0

    def _send(self):
        send_ts = float(os.getenv("send_interval", 60))
        if not self.success:
            return

        with open(self.context.file_path, "rb") as fp:
            wav_data = fp.read()
            meta_length = wav_data.index(b"data") + 8

        try:
            with open(self.context.file_path, "rb") as fp:
                # Skip the wav header
                fp.read(meta_length)
                # Send time of the previous audio chunk
                last_send_time = -1
                # Body
                while True:
                    now_time = time.perf_counter()
                    if last_send_time == -1:
                        chunk = fp.read(int(self.context.chunk_size))
                    else:
                        interval_cnt = max(
                            int((now_time - last_send_time) / self.context.wait_time),
                            1,
                        )
                        chunk = fp.read(int(self.context.chunk_size * interval_cnt))
                    if not chunk:
                        break
                    send_time_start = time.perf_counter()
                    self.ws.send(chunk, websocket.ABNF.OPCODE_BINARY)
                    self.send_time.append(send_time_start)
                    last_send_time = send_time_start
                    send_time_end = time.perf_counter()
                    if send_time_end - send_time_start > send_ts:
                        logger.error(f"Send latency exceeded {send_ts}s, stopping this audio clip")
                        break
                    if (sleep_time := self.context.wait_time + now_time - send_time_end) > 0:
                        time.sleep(sleep_time)
            logger.info("Finished sending this audio clip")
            self.ws.send(json.dumps({"end": True}))
            logger.info("Closing the connection in 2s.")
        except BrokenPipeError:
            logger.error("Send failed: the system under test crashed")
        except Exception as e:
            logger.error(f"Exception: {e}")
            logger.error(f"{traceback.print_exc()}")
            logger.error("Failed to send data")
            self.success = False
        # self.close_time = time.perf_counter() + int(os.getenv("api_timeout", 2))
        self.close_time = time.perf_counter() + 20 * 60

    def _recv(self):
        try:
            while self.ws.connected and self.success:
                recv_data = self.ws.recv()
                if isinstance(recv_data, str):
                    if recv_data := str(recv_data):
                        self.recv_time.append(time.perf_counter())
                        # Close only after the final merged result arrives
                        recognition_results = StreamResultModel(**json.loads(recv_data)).recognition_results
                        if (
                            recognition_results.final_result
                            and recognition_results.start_time == 0
                            and recognition_results.end_time == 0
                            and recognition_results.para_seq == 0
                        ):
                            self.success = False
                        else:
                            self.predict_data.append(recv_data)
                        # if recv_data.recognition_results.final_result and (IN_TEST or os.getenv('test')):
                        #     logger.info(f"recv_data {recv_data}")
                else:
                    self.success = False
                    raise Exception("Received result is not a string")
        except websocket.WebSocketConnectionClosedException:
            logger.error("WebSocketConnectionClosedException")
        except ValidationError as e:
            logger.error("Received result does not match the expected schema")
            logger.error(f"Exception is {e}")
            os._exit(1)
        except OSError:
            pass
        except Exception:
            logger.error(f"{traceback.print_exc()}")
            logger.error("Error while handling data returned by the system under test")
            self.success = False

    def _close(self):
        while time.perf_counter() < self.close_time and self.success:
            # while not self.success:
            time.sleep(1)
        try:
            self.ws.close()
        except Exception as e:
            print(e)
            pass

    def _gen_result(self) -> dict:
        if not self.predict_data:
            logger.error("No data returned at all")
        self.predict_data = [StreamResultModel(**json.loads(data)).recognition_results for data in self.predict_data]
        # for item in self.predict_data:
        #     if item.final_result and (IN_TEST or os.getenv('test')):
        #         logger.info(f"recv_data {item}")

        return {
            "fail": not self.predict_data,
            "send_time": self.send_time,
            "recv_time": self.recv_time,
            "predict_data": self.predict_data,
        }

    def _gen_init_data(self) -> dict:
        return {
            "parameter": {
                "lang": self.context.lang,
                "sample_rate": self.context.sample_rate,
                "channel": self.context.channel,
                "format": self.context.audio_format,
                "bits": self.context.bits,
                "enable_words": self.context.enable_words,
            }
        }
277
utils/client_async.py
Normal file
277
utils/client_async.py
Normal file
@@ -0,0 +1,277 @@
|
||||
import asyncio
|
||||
import json
|
||||
import os
|
||||
import time
|
||||
import traceback
|
||||
from copy import deepcopy
|
||||
from enum import Enum
|
||||
from typing import Any, List
|
||||
|
||||
import websockets
|
||||
from pydantic_core import ValidationError
|
||||
|
||||
from schemas.context import ASRContext
|
||||
from schemas.stream import StreamResultModel, StreamWordsModel
|
||||
from utils.logger import logger
|
||||
|
||||
IN_TEST = os.getenv("SUBMIT_CONFIG_FILEPATH", None) is None
|
||||
|
||||
|
||||
class STATUS_DATA(str, Enum):
|
||||
WAITING_FIRST_INIT = "waiting_first_init"
|
||||
FIRST_FAIL = "fail"
|
||||
WAITING_SECOND_INIT = "waiting_second_init"
|
||||
SECOND_INIT = "second_fail"
|
||||
WAITING_THIRD_INIT = "waiting_third_init"
|
||||
THIRD_INIT = "third_fail"
|
||||
SUCCESS = "success"
|
||||
CLOSED = "closed"
|
||||
|
||||
|
||||
class ClientAsync:
|
||||
def __init__(self, sut_url: str, context: ASRContext, idx: int) -> None:
|
||||
# base_url = "ws://127.0.0.1:5003"
|
||||
self.base_url = sut_url + "/recognition"
|
||||
self.context: ASRContext = deepcopy(context)
|
||||
self.idx = idx
|
||||
# if not os.getenv("DATASET_FILEPATH", ""):
|
||||
# self.base_url = "wss://speech.4paradigm.com/aibuds/api/v1/recognition"
|
||||
# self.base_url = "ws://localhost:5003/recognition"
|
||||
self.fail_count = 0
|
||||
self.close_time = 10**50
|
||||
self.send_time: List[float] = []
|
||||
self.recv_time: List[float] = []
|
||||
self.predict_data: List[Any] = []
|
||||
|
||||
async def _sender(
|
||||
self, websocket: websockets.WebSocketClientProtocol, send_queue: asyncio.Queue, recv_queue: asyncio.Queue
|
||||
):
|
||||
# 设置 websocket 缓冲区大小
|
||||
websocket.transport.set_write_buffer_limits(1024 * 1024 * 1024)
|
||||
|
||||
# 发送初始化数据
|
||||
await websocket.send(json.dumps(self._gen_init_data()))
|
||||
await send_queue.put(STATUS_DATA.WAITING_FIRST_INIT)
|
||||
connect_status = await recv_queue.get()
|
||||
if connect_status == STATUS_DATA.FIRST_FAIL:
|
||||
return
|
||||
|
||||
# 开始发送音频
|
||||
with open(self.context.file_path, "rb") as fp:
|
||||
wav_data = fp.read()
|
||||
meta_length = wav_data.index(b"data") + 8
|
||||
try:
|
||||
with open(self.context.file_path, "rb") as fp:
|
||||
# 去掉 wav 文件的头信息
|
||||
fp.read(meta_length)
|
||||
wav_time = 0.0
|
||||
label_id = 0
|
||||
char_contains_rate_checktime = []
|
||||
char_contains_rate_checktime_id = 0
|
||||
while True:
|
||||
now_time = time.perf_counter()
|
||||
chunk = fp.read(int(self.context.chunk_size))
|
||||
if not chunk:
|
||||
break
|
||||
wav_time += self.context.wait_time
|
||||
try:
|
||||
self.send_time.append(time.perf_counter())
|
||||
await asyncio.wait_for(websocket.send(chunk), timeout=0.08)
|
||||
except asyncio.exceptions.TimeoutError:
|
||||
pass
|
||||
while label_id < len(self.context.labels) and wav_time >= self.context.labels[label_id].start:
|
||||
char_contains_rate_checktime.append(now_time + 3.0)
|
||||
label_id += 1
|
||||
predict_text_len = sum(map(lambda x: len(x.text), self.predict_data))
|
||||
while char_contains_rate_checktime_id < len(char_contains_rate_checktime) and \
|
||||
char_contains_rate_checktime[char_contains_rate_checktime_id] <= now_time:
|
||||
label_text_len = sum(
|
||||
map(lambda x: len(x.answer),
|
||||
self.context.labels[:char_contains_rate_checktime_id+1]))
|
||||
if predict_text_len / self.context.char_contains_rate < label_text_len:
|
||||
self.context.fail_char_contains_rate_num += 1
|
||||
char_contains_rate_checktime_id += 1
|
||||
await asyncio.sleep(max(0, self.context.wait_time - (time.perf_counter() - now_time)))
|
||||
await websocket.send(json.dumps({"end": True}))
|
||||
logger.info(f"第 {self.idx} 条数据,当条语音数据发送完成")
|
||||
logger.info(f"第 {self.idx} 条数据,3s 后关闭双向连接.")
|
||||
self.close_time = time.perf_counter() + 3
|
||||
except websockets.exceptions.ConnectionClosedError:
|
||||
logger.error(f"第 {self.idx} 条数据发送过程中,连接断开")
|
||||
except Exception:
|
||||
logger.error(f"{traceback.print_exc()}")
|
||||
logger.error(f"第 {self.idx} 条数据,发送数据失败")
|
||||
|
||||
async def _recv(
|
||||
self, websocket: websockets.WebSocketClientProtocol, send_queue: asyncio.Queue, recv_queue: asyncio.Queue
|
||||
):
|
||||
await recv_queue.get()
|
||||
try:
|
||||
await asyncio.wait_for(websocket.recv(), timeout=2)
|
||||
except asyncio.exceptions.TimeoutError:
|
||||
await send_queue.put(STATUS_DATA.FIRST_FAIL)
|
||||
logger.info(f"第 {self.idx} 条数据,初始化阶段, 2s 没收到 success 返回,超时了")
|
||||
self.fail_count += 1
|
||||
return
|
||||
except Exception as e:
|
||||
await send_queue.put(STATUS_DATA.FIRST_FAIL)
|
||||
logger.error(f"第 {self.idx} 条数据,初始化阶段, 收到异常:{e}")
|
||||
self.fail_count += 1
|
||||
return
|
||||
else:
|
||||
await send_queue.put(STATUS_DATA.SUCCESS)
|
||||
|
||||
# 开始接收语音识别结果
|
||||
try:
|
||||
while websocket.open:
|
||||
# 接收数据
|
||||
recv_data = await websocket.recv()
|
||||
if isinstance(recv_data, str):
|
||||
self.recv_time.append(time.perf_counter())
|
||||
recv_data = str(recv_data)
|
||||
recv_data = json.loads(recv_data)
|
||||
result = StreamResultModel(**recv_data)
|
||||
recognition_results = result.asr_results
|
||||
if (
|
||||
recognition_results.final_result
|
||||
and not recognition_results.language
|
||||
and recognition_results.start_time == 0
|
||||
and recognition_results.end_time == 0
|
||||
and recognition_results.para_seq == 0
|
||||
):
|
||||
pass
|
||||
else:
|
||||
self.predict_data.append(recognition_results)
|
||||
else:
|
||||
raise Exception("返回的结果不是字符串形式")
|
||||
except websockets.exceptions.ConnectionClosedOK:
|
||||
pass
|
||||
except websockets.exceptions.ConnectionClosedError:
|
||||
pass
|
||||
except ValidationError as e:
|
||||
logger.error(f"第 {self.idx} 条数据,返回的结果不符合格式")
|
||||
logger.error(f"Exception is {e}")
|
||||
os._exit(1)
|
||||
except OSError:
|
||||
pass
|
||||
except Exception:
|
||||
logger.error(f"{traceback.print_exc()}")
|
||||
logger.error(f"第 {self.idx} 条数据,处理被测服务返回数据时出错")
|
||||
|
||||
async def _action(self):
|
||||
logger.info(f"第 {self.idx} 条数据开始测试")
|
||||
|
||||
while self.fail_count < 3:
|
||||
|
||||
send_queue = asyncio.Queue()
|
||||
recv_queue = asyncio.Queue()
|
||||
|
||||
self.send_time: List[float] = []
|
||||
self.recv_time: List[float] = []
|
||||
self.predict_data: List[Any] = []
|
||||
|
||||
async with websockets.connect(self.base_url) as websocket:
|
||||
send_task = asyncio.create_task(self._sender(websocket, send_queue, recv_queue))
|
||||
recv_task = asyncio.create_task(self._recv(websocket, recv_queue, send_queue))
|
||||
|
||||
await asyncio.gather(send_task)
|
||||
await asyncio.sleep(3)
|
||||
|
||||
await asyncio.gather(recv_task)
|
||||
|
||||
if self.send_time:
|
||||
break
|
||||
else:
|
||||
self.fail_count += 1
|
||||
logger.info(f"第 {self.idx} 条数据,初始化阶段, 第 {self.fail_count} 次失败, 1s 后重试")
|
||||
await asyncio.sleep(1)  # non-blocking sleep; time.sleep would stall the event loop
|
||||
|
||||
def action(self):
|
||||
asyncio.run(self._action())
|
||||
return self._gen_result()
|
||||
|
||||
def _gen_result(self) -> ASRContext:
|
||||
if not self.predict_data:
|
||||
logger.error(f"第 {self.idx} 条数据,没有任何数据返回")
|
||||
self.context.append_preds(self.predict_data, self.send_time, self.recv_time)
|
||||
self.context.fail = not self.predict_data
|
||||
|
||||
punctuation_words: List[StreamWordsModel] = []
|
||||
for pred in self.predict_data:
|
||||
punctuations = [",", ".", "!", "?"]
|
||||
if pred.language == "zh":
|
||||
punctuations = [",", "。", "!", "?"]
|
||||
elif pred.language == "ja":
|
||||
punctuations = ["、", "。", "!", "?"]
|
||||
elif pred.language in ("ar", "fa"):
|
||||
punctuations = ["،", ".", "!", "؟"]
|
||||
elif pred.language == "el":
|
||||
punctuations = [",", ".", "!", ";"]
|
||||
elif pred.language == "ti":
|
||||
punctuations = ["།"]
|
||||
|
||||
for word in pred.words:
|
||||
if word.text in punctuations:
|
||||
punctuation_words.append(word)
|
||||
start_times = list(map(lambda x: x.start_time, punctuation_words))
|
||||
start_times = sorted(start_times)
|
||||
end_times = list(map(lambda x: x.end_time, punctuation_words))
|
||||
end_times = sorted(end_times)
|
||||
|
||||
self.context.punctuation_num = len(self.context.labels)
|
||||
label_n = len(self.context.labels)
|
||||
for i, label in enumerate(self.context.labels):
|
||||
label_left = (label.end - 0.7)
|
||||
label_right = (label.end + 0.7)
|
||||
if i < label_n - 1:
|
||||
label_left = label.end
|
||||
label_right = self.context.labels[i+1].start
|
||||
|
||||
exist = False
|
||||
|
||||
def upper_bound(x: float, lst: List[float]) -> int:
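"""First index with lst[i] >= x, or -1 (duplicates the helper in evaluator_plus)."""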
|
||||
ans = -1
|
||||
left, right = 0, len(lst) - 1
|
||||
while left <= right:
|
||||
mid = (left + right) // 2
|
||||
if lst[mid] >= x:
|
||||
ans = mid
|
||||
right = mid - 1
|
||||
else:
|
||||
left = mid + 1
|
||||
return ans
|
||||
|
||||
def lower_bound(x: float, lst: List[float]) -> int:
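"""Last index with lst[i] <= x, or -1 (duplicates the helper in evaluator_plus)."""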
|
||||
ans = -1
|
||||
left, right = 0, len(lst) - 1
|
||||
while left <= right:
|
||||
mid = (left + right) // 2
|
||||
if lst[mid] <= x:
|
||||
ans = mid
|
||||
left = mid + 1
|
||||
else:
|
||||
right = mid - 1
|
||||
return ans
|
||||
|
||||
left_in_pred = upper_bound(label_left, start_times)
|
||||
if left_in_pred != -1 and start_times[left_in_pred] <= label_right:
|
||||
exist = True
|
||||
right_in_pred = lower_bound(label_right, end_times)
|
||||
if right_in_pred != -1 and end_times[right_in_pred] >= label_left:
|
||||
exist = True
|
||||
|
||||
if exist:
|
||||
self.context.pred_punctuation_num += 1
|
||||
return self.context
|
||||
|
||||
def _gen_init_data(self) -> dict:
|
||||
return {
|
||||
"parameter": {
|
||||
"lang": None,
|
||||
"sample_rate": self.context.sample_rate,
|
||||
"channel": self.context.channel,
|
||||
"format": self.context.audio_format,
|
||||
"bits": self.context.bits,
|
||||
"enable_words": self.context.enable_words,
|
||||
}
|
||||
}
|
||||
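The sender above locates the `data` chunk instead of assuming a fixed 44-byte WAV header. A minimal standalone sketch of that offset computation, assuming a standard RIFF/WAV container (the function and file name are hypothetical, not part of this diff):

```python
def pcm_payload_offset(wav_path: str) -> int:
    """Byte offset where raw PCM starts: offset of the `data` chunk id,
    plus 4 bytes of chunk id and 4 bytes of chunk size."""
    with open(wav_path, "rb") as fp:
        head = fp.read(4096)  # the data chunk id normally sits well within 4 KiB
    return head.index(b"data") + 8  # raises ValueError for non-WAV input

# offset = pcm_payload_offset("sample.wav")  # hypothetical file
```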
409 utils/client_callback.py Normal file
@@ -0,0 +1,409 @@
|
||||
import logging
|
||||
import os
|
||||
import threading
|
||||
import time
|
||||
from typing import Dict, List, Optional
|
||||
|
||||
import requests
|
||||
from flask import Flask, abort, request
|
||||
from pydantic import BaseModel, Field, ValidationError, field_validator
|
||||
|
||||
from schemas.dataset import QueryData
|
||||
from schemas.stream import StreamDataModel
|
||||
from utils.evaluator_plus import evaluate_editops, evaluate_punctuation
|
||||
|
||||
from .logger import log
|
||||
|
||||
MY_POD_IP = os.environ["MY_POD_IP"]
|
||||
|
||||
|
||||
class StopException(Exception): ...
|
||||
|
||||
|
||||
class EvaluateResult(BaseModel):
|
||||
lang: str
|
||||
cer: float
|
||||
align_start: Dict[int, int] = Field(
|
||||
description="句首字对齐时间差值(ms) -> 对齐数"
|
||||
)
|
||||
align_end: Dict[int, int] = Field(
|
||||
description="句尾字对齐时间差值(ms) -> 对齐数"
|
||||
)
|
||||
first_word_distance_sum: float = Field(description="句首字距离总和(s)")
|
||||
last_word_distance_sum: float = Field(description="句尾字距离总和(s)")
|
||||
rtf: float = Field(description="实时率 RTF(处理耗时 / 音频时长)")
|
||||
first_receive_delay: float = Field(description="首包接收延迟(s)")
|
||||
query_count: int = Field(description="音频数")
|
||||
voice_count: int = Field(description="句子数")
|
||||
pred_punctuation_num: int = Field(description="预测标点数")
|
||||
label_punctuation_num: int = Field(description="标注标点数")
|
||||
pred_sentence_punctuation_num: int = Field(description="预测句子标点数")
|
||||
label_setence_punctuation_num: int = Field(description="标注句子标点数")
|
||||
preds: List[StreamDataModel] = Field(description="预测结果")
|
||||
label: QueryData = Field(description="标注结果")
|
||||
|
||||
|
||||
class ResultModel(BaseModel):
|
||||
taskId: str
|
||||
status: str
|
||||
message: str = Field("")
|
||||
recognition_results: Optional[StreamDataModel] = Field(None)
|
||||
|
||||
@field_validator("recognition_results", mode="after")
|
||||
def convert_to_seconds(cls, v: Optional[StreamDataModel], values):
|
||||
# 在这里处理除以1000的逻辑
|
||||
if v is None:
|
||||
return v
|
||||
v.end_time = v.end_time / 1000
|
||||
v.start_time = v.start_time / 1000
|
||||
for word in v.words:
|
||||
word.start_time /= 1000
|
||||
word.end_time /= 1000
|
||||
return v
|
||||
|
||||
|
||||
class ClientCallback:
|
||||
def __init__(self, sut_url: str, port: int):
|
||||
self.sut_url = sut_url  # sut_url: ASR 服务的 URL(如 http://asr-service:8080)
self.port = port  # port: 当前客户端监听的端口(用于接收回调)
|
||||
|
||||
#创建 Flask 应用并注册路由
|
||||
self.app = Flask(__name__)
|
||||
self.app.add_url_rule(
|
||||
"/api/asr/batch-callback/<taskId>",
|
||||
view_func=self.asr_callback,
|
||||
methods=["POST"],
|
||||
)
|
||||
self.app.add_url_rule(
|
||||
"/api/asr-runner/report",
|
||||
view_func=self.heartbeat,
|
||||
methods=["POST"],
|
||||
)
|
||||
"""
|
||||
路由 1:/api/asr/batch-callback/<taskId>
|
||||
接收 ASR 服务的识别结果回调(self.asr_callback 处理)。
|
||||
taskId 是路径参数,用于标识具体任务。
|
||||
路由 2:/api/asr-runner/report
|
||||
接收 ASR 服务的心跳检测请求(self.heartbeat 处理)。
|
||||
"""
|
||||
|
||||
logging.getLogger("werkzeug").disabled = True
|
||||
threading.Thread(
|
||||
target=self.app.run, args=("0.0.0.0", port), daemon=True
|
||||
).start()
|
||||
self.mutex = threading.Lock()
|
||||
self.finished = threading.Event()
|
||||
self.product_available = True
|
||||
|
||||
self.reset()
|
||||
|
||||
def reset(self):
|
||||
self.begin_time = None
|
||||
self.end_time = None
|
||||
self.first_receive_time = None
|
||||
self.last_heartbeat_time = None
|
||||
self.app_on = False
|
||||
self.para_seq = 0
|
||||
self.finished.clear()
|
||||
self.error: Optional[str] = None
|
||||
self.last_recognition_result: Optional[StreamDataModel] = None
|
||||
self.recognition_results: List[StreamDataModel] = []
|
||||
|
||||
def asr_callback(self, taskId: str):
|
||||
if self.app_on is False:
|
||||
abort(400)
|
||||
body = request.get_json(silent=True) # 静默解析JSON,失败时返回None
|
||||
if body is None:
|
||||
abort(404)
|
||||
try:
|
||||
result = ResultModel.model_validate(body) #将回调的 JSON 数据解析为 ResultModel 对象,确保结构符合预期。
|
||||
except ValidationError as e:
|
||||
log.error("asr_callback: 结果格式错误: %s", e)
|
||||
abort(404)
|
||||
|
||||
#处理任务完成状态(FINISHED)
|
||||
if result.status == "FINISHED":
|
||||
with self.mutex:
|
||||
self.stop()
|
||||
return "ok"
|
||||
#处理非运行状态(非 RUNNING)
|
||||
if result.status != "RUNNING":
|
||||
log.error(
|
||||
"asr_callback: 结果状态错误: %s, message: %s",
|
||||
result.status,
|
||||
result.message,
|
||||
)
|
||||
abort(404)
|
||||
|
||||
recognition_result = result.recognition_results
|
||||
if recognition_result is None:
|
||||
log.error("asr_callback: 结果中没有recognition_results字段")
|
||||
abort(404)
|
||||
|
||||
with self.mutex:
|
||||
if not self.app_on:
|
||||
log.error("asr_callback: 应用已结束")
|
||||
abort(400)
|
||||
|
||||
if recognition_result.para_seq < self.para_seq:
|
||||
error = "asr_callback: 结果中para_seq小于上一次的: %d < %d" % (
|
||||
recognition_result.para_seq,
|
||||
self.para_seq,
|
||||
)
|
||||
log.error(error)
|
||||
if self.error is None:
|
||||
self.error = error
|
||||
self.stop()
|
||||
abort(404)
|
||||
if recognition_result.para_seq > self.para_seq + 1:
|
||||
error = (
    "asr_callback: 结果中para_seq大于上一次的+1, "
    "说明存在para_seq = %d没有final_result为True确认"
    % (self.para_seq + 1,)
)
|
||||
log.error(error)
|
||||
if self.error is None:
|
||||
self.error = error
|
||||
self.stop()
|
||||
abort(404)
|
||||
if (
|
||||
self.last_recognition_result is not None
|
||||
and recognition_result.start_time
|
||||
< self.last_recognition_result.end_time
|
||||
):
|
||||
error = "asr_callback: 结果中start_time小于上一次的end_time: %s < %s" % (
|
||||
recognition_result.start_time,
|
||||
self.last_recognition_result.end_time,
|
||||
)
|
||||
log.error(error)
|
||||
if self.error is None:
|
||||
self.error = error
|
||||
self.stop()
|
||||
abort(404)
|
||||
|
||||
self.recognition_results.append(recognition_result)
|
||||
if recognition_result.final_result is True:
|
||||
self.para_seq = recognition_result.para_seq
|
||||
if self.last_recognition_result is None:
|
||||
self.first_receive_time = time.time()
|
||||
self.last_recognition_result = recognition_result
|
||||
|
||||
return "ok"
|
||||
|
||||
"""
|
||||
def heartbeat(self):
|
||||
if self.app_on is False:
|
||||
abort(400)
|
||||
body = request.get_json(silent=True)
|
||||
if body is None:
|
||||
abort(404)
|
||||
status = body.get("status")
|
||||
if status != "RUNNING":
|
||||
message = body.get("message", "")
|
||||
if message:
|
||||
message = ", message: " + message
|
||||
log.error("heartbeat: 状态错误: %s%s", status, message)
|
||||
return "ok"
|
||||
|
||||
with self.mutex:
|
||||
self.last_heartbeat_time = time.time()
|
||||
return "ok"
|
||||
|
||||
"""
|
||||
|
||||
def predict(
|
||||
self,
|
||||
language: Optional[str],
|
||||
audio_file: str,
|
||||
audio_duration: float,
|
||||
task_id: str,
|
||||
):
|
||||
#使用互斥锁确保线程安全
|
||||
with self.mutex:
|
||||
if self.app_on:
|
||||
log.error("上一音频尚未完成处理,流程出现异常")
|
||||
raise StopException()
|
||||
self.reset()
|
||||
self.app_on = True
|
||||
|
||||
#请求URL:self.sut_url + "/predict"(如 http://localhost:8080/predict)
|
||||
# keep the file handle scoped so it is closed after the request (avoids a handle leak)
with open(audio_file, "rb") as audio_fp:
    resp = requests.post(
        self.sut_url + "/predict",
        data={
            "language": language,
            "taskId": task_id,
            "progressCallbackUrl": "http://%s:%d/api/asr/batch-callback/%s"
            % (MY_POD_IP, self.port, task_id),
            "heartbeatUrl": "http://%s:%d/api/asr-runner/report" % (MY_POD_IP, self.port),
        },
        files={"file": (audio_file, audio_fp)},
        timeout=60,
    )
|
||||
|
||||
#响应处理
|
||||
if resp.status_code != 200:
|
||||
log.error("/predict接口返回http code %s", resp.status_code)
|
||||
raise StopException()
|
||||
resp.raise_for_status()
|
||||
|
||||
status = resp.json().get("status")
|
||||
if status != "OK":
|
||||
log.error("/predict接口返回非OK状态: %s", status)
|
||||
raise StopException()
|
||||
#辅助线程
|
||||
threading.Thread(
|
||||
target=self.dead_line_check, args=(audio_duration,), daemon=True
|
||||
).start()
|
||||
threading.Thread(target=self.heartbeat_check, daemon=True).start()
|
||||
|
||||
def dead_line_check(self, audio_duration: float):
|
||||
begin_time = time.time()
|
||||
self.begin_time = begin_time
|
||||
|
||||
# 初始化 10s 延迟检测
|
||||
self.sleep_to(begin_time + 10)
|
||||
with self.mutex:
|
||||
if self.last_recognition_result is None:
|
||||
error = "首包延迟内未收到返回"
|
||||
log.error(error)
|
||||
if self.error is None:
|
||||
self.error = error
|
||||
self.stop()
|
||||
return
|
||||
|
||||
# 第一次30s检测
|
||||
next_checktime = begin_time + 30
|
||||
ddl = begin_time + max((audio_duration / 6) + 10, 30)
|
||||
while time.time() < ddl:
|
||||
self.sleep_to(next_checktime)
|
||||
with self.mutex:
|
||||
if self.finished.is_set():
|
||||
return
|
||||
if self.last_recognition_result is None:
|
||||
error = "检测追赶线过程中获取最后一次识别结果异常"
|
||||
log.error(error)
|
||||
if self.error is None:
|
||||
self.error = error
|
||||
self.stop()
|
||||
return
|
||||
last_end_time = self.last_recognition_result.end_time
|
||||
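# death-chase line: after the 30 s grace period the service must keep recognizing
# at least 5.4 s of audio per elapsed wall-clock second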
expect_end_time = (next_checktime - begin_time - 30) * 5.4
|
||||
if last_end_time < expect_end_time:
|
||||
log.warning(
|
||||
"识别时间位置 %s 被死亡追赶线 %s 已追上,将置为产品不可用",
|
||||
last_end_time,
|
||||
expect_end_time,
|
||||
)
|
||||
self.product_available = False
|
||||
self.sleep_to(ddl)
|
||||
break
|
||||
next_checktime = last_end_time / 5.4 + begin_time + 30 + 1
|
||||
next_checktime = min(next_checktime, ddl)
|
||||
with self.mutex:
|
||||
if self.finished.is_set():
|
||||
return
|
||||
|
||||
log.warning("识别速度rtf低于1/6, 将置为产品不可用")
|
||||
self.product_available = False
|
||||
self.sleep_to(begin_time + max((audio_duration / 3) + 10, 30))
|
||||
with self.mutex:
|
||||
if self.finished.is_set():
|
||||
return
|
||||
error = "处理时间超过ddl %s " % (ddl - begin_time)
|
||||
log.error(error)
|
||||
if self.error is None:
|
||||
self.error = error
|
||||
self.stop()
|
||||
return
|
||||
|
||||
def heartbeat_check(self):
|
||||
self.last_heartbeat_time = time.time()
|
||||
while True:
|
||||
with self.mutex:
|
||||
if self.finished.is_set():
|
||||
return
|
||||
if time.time() - self.last_heartbeat_time > 30:
|
||||
error = "asr_runner 心跳超时 %s" % (
|
||||
time.time() - self.last_heartbeat_time
|
||||
)
|
||||
log.error(error)
|
||||
if self.error is None:
|
||||
self.error = error
|
||||
self.stop()
|
||||
return
|
||||
time.sleep(5)
|
||||
|
||||
def sleep_to(self, to: float):
|
||||
seconds = to - time.time()
|
||||
if seconds <= 0:
|
||||
return
|
||||
time.sleep(seconds)
|
||||
|
||||
def stop(self):
|
||||
self.end_time = time.time()
|
||||
self.finished.set()
|
||||
self.app_on = False
|
||||
|
||||
def evaluate(self, query_data: QueryData):
|
||||
log.info("开始评估")
|
||||
if (
|
||||
self.begin_time is None
|
||||
or self.end_time is None
|
||||
or self.first_receive_time is None
|
||||
):
|
||||
if self.begin_time is None:
|
||||
log.error("评估流程异常 无开始时间")
|
||||
if self.end_time is None:
|
||||
log.error("评估流程异常 无结束时间")
|
||||
if self.first_receive_time is None:
|
||||
log.error("评估流程异常 无首次接收时间")
|
||||
raise StopException()
|
||||
rtf = max(self.end_time - self.begin_time - 10, 0) / query_data.duration
|
||||
first_receive_delay = max(self.first_receive_time - self.begin_time, 0)
|
||||
query_count = 1
|
||||
voice_count = len(query_data.voice)
|
||||
preds = self.recognition_results
|
||||
self.recognition_results = list(
|
||||
filter(lambda x: x.final_result, self.recognition_results)
|
||||
)
|
||||
(
|
||||
pred_punctuation_num,
|
||||
label_punctuation_num,
|
||||
pred_sentence_punctuation_num,
|
||||
label_setence_punctuation_num,
|
||||
) = evaluate_punctuation(query_data, self.recognition_results)
|
||||
|
||||
(
|
||||
cer,
|
||||
_,
|
||||
align_start,
|
||||
align_end,
|
||||
first_word_distance_sum,
|
||||
last_word_distance_sum,
|
||||
) = evaluate_editops(query_data, self.recognition_results)
|
||||
|
||||
if align_start[300] / voice_count < 0.8:
|
||||
log.warning(
|
||||
"评估结果首字300ms对齐率 %s < 0.8, 将置为产品不可用",
|
||||
align_start[300] / voice_count,
|
||||
)
|
||||
self.product_available = False
|
||||
|
||||
return EvaluateResult(
|
||||
lang=query_data.lang,
|
||||
cer=cer,
|
||||
align_start=align_start,
|
||||
align_end=align_end,
|
||||
first_word_distance_sum=first_word_distance_sum,
|
||||
last_word_distance_sum=last_word_distance_sum,
|
||||
rtf=rtf,
|
||||
first_receive_delay=first_receive_delay,
|
||||
query_count=query_count,
|
||||
voice_count=voice_count,
|
||||
pred_punctuation_num=pred_punctuation_num,
|
||||
label_punctuation_num=label_punctuation_num,
|
||||
pred_sentence_punctuation_num=pred_sentence_punctuation_num,
|
||||
label_setence_punctuation_num=label_setence_punctuation_num,
|
||||
preds=preds,
|
||||
label=query_data,
|
||||
)
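For orientation, a minimal driver sketch for `ClientCallback`; the SUT URL, port, file paths, and the `query_data` object are assumptions, not part of this diff:

```python
callback = ClientCallback(sut_url="http://asr-service:8080", port=18080)  # hypothetical endpoint
callback.predict(
    language="zh",
    audio_file="/data/sample.wav",   # hypothetical path
    audio_duration=60.0,
    task_id="task-001",
)
callback.finished.wait()             # set by stop() on FINISHED, timeout, or error
if callback.error is None:
    result = callback.evaluate(query_data)  # query_data: a QueryData label object
```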
|
||||
445 utils/evaluate.py Normal file
@@ -0,0 +1,445 @@
|
||||
import os
|
||||
import subprocess
|
||||
from collections import defaultdict
|
||||
from typing import Dict, List
|
||||
|
||||
from utils import asr_ter
from utils.logger import logger
from utils.tokenizer import Tokenizer  # assumed source of the Tokenizer used by _sentence_final_index
|
||||
|
||||
log_mid_result = int(os.getenv("log", "0")) == 1
|
||||
|
||||
|
||||
class AsrEvaluator:
|
||||
def __init__(self) -> None:
|
||||
self.query_count = 0 # query 数目(语音数目)
|
||||
self.voice_count = 0 # 有开始和结束时间的语音条数(用于 RTF 计算)
|
||||
self.cut_punc = [] # 切分标点符号,需要注意切分的时候根据列表中的顺序进行切分,比如 ... 应该放到 . 之前。
|
||||
# cer 属性
|
||||
self.one_minus_cer = 0 # 每个 query 的 1 - cer 和
|
||||
self.token_count = 0 # 每个 query 的字数/词数和
|
||||
# 句子切分率属性
|
||||
self.miss_count = 0 # 每个 query miss-count 和
|
||||
self.more_count = 0 # 每个 query more-count 和
|
||||
self.cut_count = 0 # 每个 query cut-count 和
|
||||
self.rate = 0  # 每个 query 的 cut-rate 和
# rtf accumulators read by gen_result(); initialized here so gen_result() cannot hit AttributeError
self.rtf = 0
self.rtf_end = 0
# detail case
self.result = []
|
||||
|
||||
def evaluate(self, eval_result):
|
||||
pass
|
||||
|
||||
def post_evaluate(self):
|
||||
pass
|
||||
|
||||
def gen_result(self) -> Dict:
|
||||
output_result = dict()
|
||||
output_result["query_count"] = self.query_count
|
||||
output_result["voice_count"] = self.voice_count
|
||||
output_result["token_cnt"] = self.token_count
|
||||
output_result["one_minus_cer"] = self.one_minus_cer
|
||||
output_result["one_minus_cer_metrics"] = self.one_minus_cer / self.query_count
|
||||
output_result["miss_count"] = self.miss_count
|
||||
output_result["more_count"] = self.more_count
|
||||
output_result["cut_count"] = self.cut_count
|
||||
output_result["cut_rate"] = self.rate
|
||||
output_result["cut_rate_metrics"] = self.rate / self.query_count
|
||||
output_result["rtf"] = self.rtf
|
||||
output_result["rtf_end"] = self.rtf_end
|
||||
output_result["rtf_metrics"] = self.rtf / self.voice_count
|
||||
output_result["rtf_end_metrics"] = self.rtf_end / self.voice_count
|
||||
|
||||
detail_case = self.result
|
||||
return output_result, detail_case
|
||||
|
||||
def _get_predict_final_sentences(self, predict_data: List[Dict]) -> List[str]:
|
||||
"""
|
||||
获取 predict data 数据,然后将其中 final 的句子拿出来,放到列表里。
|
||||
"""
|
||||
return [
|
||||
item["recoginition_results"]["text"]
|
||||
for item in predict_data
|
||||
if item["recoginition_results"]["final_result"]
|
||||
]
|
||||
|
||||
def _sentence_final_index(self, sentences: List[str], tokens: List[str], tokenizer="word") -> List[int]:
|
||||
"""
|
||||
获取 sentence 结束的字对应的 token 索引值。
|
||||
"""
|
||||
token_index_list = []
|
||||
token_idx = 0
|
||||
for sentence in sentences:
|
||||
for token in Tokenizer.tokenize(sentence, tokenizer):
|
||||
if token not in tokens:
|
||||
continue
|
||||
while tokens[token_idx] != token:
|
||||
token_idx += 1
|
||||
token_index_list.append(token_idx)
|
||||
return token_index_list
|
||||
|
||||
def _voice_to_cut_sentence(self, voice_sentences: List[str]) -> Dict:
|
||||
"""
|
||||
将数据集的语音片段转换为最小切分单元列表。
|
||||
使用 cut_punc 中的所有 punc 进行依次切分,最后去除掉完全空的内容
|
||||
示例:
|
||||
["你好,你好呀", "你好,我在写抽象的代码逻辑"]
|
||||
->
|
||||
cut_sentences: ["你好", "你好呀", "你好", "我在写抽象的代码逻辑"]
|
||||
cut_sentence_index_list: [1, 3] ("你好呀" 对应 1-idx, "我在写抽象的代码逻辑" 对应 3-idx)
|
||||
"""
|
||||
voice_sentences_result = defaultdict(list)
|
||||
for voice_sentence in voice_sentences:
|
||||
sentence_list = [voice_sentence]
|
||||
sentence_tmp_list = []
|
||||
for punc in self.cut_punc:
|
||||
for sentence in sentence_list:
|
||||
sentence_tmp_list.extend(sentence.split(punc))
|
||||
sentence_list, sentence_tmp_list = sentence_tmp_list, []
|
||||
sentence_list = [item for item in sentence_list if item]
|
||||
# 切分后的句子单元
|
||||
voice_sentences_result["cut_sentences"].extend(sentence_list)
|
||||
# 每个语音单元最后一个字对应的句子单元的索引
|
||||
voice_sentences_result["cut_sentence_index_list"].append(len(voice_sentences_result["cut_sentences"]) - 1)
|
||||
return voice_sentences_result
|
||||
|
||||
def _voice_bytes_index(self, timestamp, sample_rate=16000, bit_depth=16, channels=1):
|
||||
"""
|
||||
timestamp: 时间, 单位秒
|
||||
"""
|
||||
bytes_per_sample = bit_depth // 8
|
||||
return timestamp * sample_rate * bytes_per_sample * channels
|
||||
|
||||
|
||||
class AsrZhEvaluator(AsrEvaluator):
|
||||
"""
|
||||
中文的评估方式
|
||||
"""
|
||||
|
||||
def __init__(self):
|
||||
super().__init__()
|
||||
self.cut_zh_punc = ["······", "......", "。", ",", "?", "!", ";", ":"]
|
||||
self.cut_en_punc = ["...", ".", ",", "?", "!", ";", ":"]
|
||||
self.cut_punc = self.cut_zh_punc + self.cut_en_punc
|
||||
|
||||
def evaluate(self, eval_result) -> Dict:
|
||||
self.query_count += 1
|
||||
self.voice_count += len(eval_result["voice"])
|
||||
|
||||
# 获取,标注结果 & 语音单元(非句子单元)
|
||||
label_voice_sentences = [item["answer"] for item in eval_result["voice"]]
|
||||
# print("label_voice_sentences", label_voice_sentences)
|
||||
# 获取,标注结果 & 语音单元 -> 句子单元的转换情况
|
||||
voice_to_cut_info = self._voice_to_cut_sentence(label_voice_sentences)
|
||||
# print("voice_to_cut_info", voice_to_cut_info)
|
||||
# 获取,标注结果 & 句子单元
|
||||
label_sentences = voice_to_cut_info["cut_sentences"]
|
||||
# 获取,标注结果 & 语音单元 -> 句子单元的映射关系,每个语音单元最后一个字对应的句子单元的索引
|
||||
cut_sentence_index_list = voice_to_cut_info["cut_sentence_index_list"]
|
||||
# 标注结果 & 句子单元 & norm 操作
|
||||
label_sentences = [self._sentence_norm(sentence) for sentence in label_sentences]
|
||||
if log_mid_result:
|
||||
logger.info(f"label_sentences {label_sentences}")
|
||||
# print("label_sentences", label_sentences)
|
||||
|
||||
# 预测结果 & 句子单元
|
||||
predict_sentences_raw = self._get_predict_final_sentences(eval_result["predict_data"])
|
||||
# print("predict_sentences_raw", predict_sentences_raw)
|
||||
# 预测结果 & 句子单元 & norm 操作
|
||||
predict_sentences = [self._sentence_norm(sentence) for sentence in predict_sentences_raw]
|
||||
if log_mid_result:
|
||||
logger.info(f"predict_sentences {predict_sentences}")
|
||||
# print("predict_sentences", predict_sentences)
|
||||
|
||||
# 基于最小编辑距离进行 token 匹配,获得匹配后的 token 列表
|
||||
label_tokens, predict_tokens = self._sentence_transfer("".join(label_sentences), "".join(predict_sentences))
|
||||
|
||||
# cer 计算
|
||||
cer_info = self.cer(label_sentences, predict_sentences)
|
||||
if log_mid_result:
|
||||
logger.info(f"cer_info {cer_info}")
|
||||
# print("cer_info", cer_info)
|
||||
self.one_minus_cer += cer_info["one_minus_cer"]
|
||||
self.token_count += cer_info["token_count"]
|
||||
|
||||
# 句子切分准召率
|
||||
cut_info = self.cut_rate(label_sentences, predict_sentences, label_tokens, predict_tokens)
|
||||
|
||||
if log_mid_result:
|
||||
logger.info(f"{cut_info['miss_count']}, {cut_info['more_count']}, {cut_info['rate']}")
|
||||
# print("cut_info", cut_info)
|
||||
# print(cut_info["miss_count"], cut_info["more_count"], cut_info["rate"])
|
||||
self.miss_count += cut_info["miss_count"]
|
||||
self.more_count += cut_info["more_count"]
|
||||
self.cut_count += cut_info["cut_count"]
|
||||
self.rate += cut_info["rate"]
|
||||
|
||||
self.result.append(
|
||||
{
|
||||
"label_tokens": label_tokens,
|
||||
"predict_tokens": predict_tokens,
|
||||
"one_minus_cer": cer_info["one_minus_cer"],
|
||||
"token_count": cer_info["one_minus_cer"],
|
||||
"miss_count": cut_info["miss_count"],
|
||||
"more_count": cut_info["more_count"],
|
||||
"cut_count": cut_info["cut_count"],
|
||||
"rate": cut_info["rate"],
|
||||
}
|
||||
)
|
||||
|
||||
def cer(self, label_sentences, predict_sentences):
|
||||
pred_str = ''.join(predict_sentences) if predict_sentences is not None else ''
|
||||
label_str = ''.join(label_sentences)
|
||||
r = asr_ter.calc_ter_speechio(pred_str, label_str)
|
||||
one_minus_cer = max(1.0 - r['ter'], 0)
|
||||
token_count = r['ref_all_token_cnt']
|
||||
return {"one_minus_cer": one_minus_cer, "token_count": token_count}
|
||||
|
||||
def cut_rate(self, label_sentences, predict_sentences, label_tokens, predict_tokens):
|
||||
label_final_index_list = set(self._sentence_final_index(label_sentences, label_tokens))
|
||||
pred_final_index_list = set(self._sentence_final_index(predict_sentences, predict_tokens))
|
||||
label_sentence_count = len(label_final_index_list)
|
||||
miss_count = len(label_final_index_list - pred_final_index_list)
|
||||
more_count = len(pred_final_index_list - label_final_index_list)
|
||||
rate = max(1 - (miss_count + more_count * 2) / label_sentence_count, 0)
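# spurious cuts (more_count) are penalized twice as hard as missed cuts (miss_count)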
|
||||
return {
|
||||
"miss_count": miss_count,
|
||||
"more_count": more_count,
|
||||
"cut_count": label_sentence_count,
|
||||
"rate": rate,
|
||||
"label_final_index_list": label_final_index_list,
|
||||
"pred_final_index_list": pred_final_index_list,
|
||||
}
|
||||
|
||||
def _sentence_norm(self, sentence, tokenizer="word"):
|
||||
"""
|
||||
对句子进行 norm 操作
|
||||
"""
|
||||
from utils.speechio import textnorm_zh as textnorm
|
||||
|
||||
if tokenizer == "word":
|
||||
normalizer = textnorm.TextNorm(
|
||||
to_banjiao=True,
|
||||
to_upper=True,
|
||||
to_lower=False,
|
||||
remove_fillers=True,
|
||||
remove_erhua=False, # 这里同批量识别不同,改成了 False
|
||||
check_chars=False,
|
||||
remove_space=False,
|
||||
cc_mode="",
|
||||
)
|
||||
return normalizer(sentence)
|
||||
else:
|
||||
logger.error("tokenizer error, not support.")
|
||||
|
||||
def _sentence_transfer(self, label_sentence: str, predict_sentence: str, tokenizer="char"):
|
||||
"""
|
||||
基于最小编辑距离,将 label 和 predict 进行字的位置匹配,并生成转换后的结果
|
||||
args:
|
||||
label: "今天的通话质量不错呀昨天的呢"
|
||||
predict: "今天的通话质量不错昨天呢星期"
|
||||
tokenizer: 分词方式
|
||||
return:
|
||||
label: ["今", "天", "的", "通", "话", "质", "量", "不", "错", "呀", "昨", "天", "的", "呢", None, None]
|
||||
predict: ["今", "天", "的", "通", "话", "质", "量", "不", "错", None, "昨", "天", None, "呢", "星", "期"]
|
||||
"""
|
||||
from utils.speechio import error_rate_zh as error_rate
|
||||
|
||||
if tokenizer == "char":
|
||||
alignment, score = error_rate.EditDistance(
|
||||
error_rate.tokenize_text(label_sentence, tokenizer),
|
||||
error_rate.tokenize_text(predict_sentence, tokenizer),
|
||||
)
|
||||
label_tokens, pred_tokens = [], []
|
||||
for align in alignment:
|
||||
# print(align.__dict__)
|
||||
label_tokens.append(align.ref)
|
||||
pred_tokens.append(align.hyp)
|
||||
return (label_tokens, pred_tokens)
|
||||
else:
|
||||
logger.error("tokenizer 出错了,暂时不支持其它的")
|
||||
|
||||
def _pred_data_transfer(self, predict_data, recv_time):
|
||||
"""
|
||||
predict_data = [
|
||||
{"recoginition_results": {"text": "1", "final_result": False, "para_seq": 0}},
|
||||
{"recoginition_results": {"text": "12", "final_result": False, "para_seq": 0}},
|
||||
{"recoginition_results": {"text": "123", "final_result": True, "para_seq": 0}},
|
||||
{"recoginition_results": {"text": "4", "final_result": False, "para_seq": 0}},
|
||||
{"recoginition_results": {"text": "45", "final_result": False, "para_seq": 0}},
|
||||
{"recoginition_results": {"text": "456", "final_result": True, "para_seq": 0}},
|
||||
]
|
||||
recv_time = [1, 3, 5, 6, 7, 8]
|
||||
|
||||
->
|
||||
|
||||
[
|
||||
[{'text': '1', 'time': 1}, {'text': '12', 'time': 3}, {'text': '123', 'time': 5}],
|
||||
[{'text': '4', 'time': 6}, {'text': '45', 'time': 7}, {'text': '456', 'time': 8}],
|
||||
]
|
||||
"""
|
||||
pred_sentence_info = []
|
||||
pred_sentence_index = 0
|
||||
for predict_item, recv_time_item in zip(predict_data, recv_time):
|
||||
if len(pred_sentence_info) == pred_sentence_index:
|
||||
pred_sentence_info.append([])
|
||||
pred_sentence_info[pred_sentence_index].append(
|
||||
{
|
||||
"text": predict_item["recoginition_results"]["text"],
|
||||
"time": recv_time_item,
|
||||
}
|
||||
)
|
||||
if predict_item["recoginition_results"]["final_result"]:
|
||||
pred_sentence_index += 1
|
||||
return pred_sentence_info
|
||||
|
||||
|
||||
class AsrEnEvaluator(AsrEvaluator):
|
||||
"""
|
||||
英文的评估方式
|
||||
"""
|
||||
|
||||
def evaluate(self, eval_result) -> Dict:
|
||||
self.query_count += 1
|
||||
self.voice_count += len(eval_result["voice"])
|
||||
|
||||
# 获取,标注结果 & 语音单元(非句子单元)
|
||||
label_voice_sentences = [item["answer"] for item in eval_result["voice"]]
|
||||
# print("label_voice_sentences", label_voice_sentences)
|
||||
# 获取,标注结果 & 语音单元 -> 句子单元的转换情况
|
||||
voice_to_cut_info = self._voice_to_cut_sentence(label_voice_sentences)
|
||||
# print("voice_to_cut_info", voice_to_cut_info)
|
||||
# 获取,标注结果 & 句子单元
|
||||
label_sentences = voice_to_cut_info["cut_sentences"]
|
||||
# 获取,标注结果 & 语音单元 -> 句子单元的映射关系,每个语音单元最后一个字对应的句子单元的索引
|
||||
cut_sentence_index_list = voice_to_cut_info["cut_sentence_index_list"]
|
||||
# 标注结果 & 句子单元 & norm 操作
|
||||
label_sentences = self._sentence_list_norm(label_sentences)
|
||||
# [self._sentence_norm(sentence) for sentence in label_sentences]
|
||||
# print("label_sentences", label_sentences)
|
||||
if log_mid_result:
|
||||
logger.info(f"label_sentences {label_sentences}")
|
||||
|
||||
# 预测结果 & 句子单元
|
||||
predict_sentences_raw = self._get_predict_final_sentences(eval_result["predict_data"])
|
||||
# print("predict_sentences_raw", predict_sentences_raw)
|
||||
# 预测结果 & 句子单元 & norm 操作
|
||||
predict_sentences = self._sentence_list_norm(predict_sentences_raw)
|
||||
# [self._sentence_norm(sentence) for sentence in predict_sentences_raw]
|
||||
# print("predict_sentences", predict_sentences)
|
||||
if log_mid_result:
|
||||
logger.info(f"predict_sentences {predict_sentences}")
|
||||
|
||||
label_tokens, predict_tokens = self._sentence_transfer(" ".join(label_sentences), " ".join(predict_sentences))
|
||||
# print(label_tokens)
|
||||
# print(predict_tokens)
|
||||
|
||||
# cer 计算
|
||||
cer_info = self.cer(label_tokens, predict_tokens)
|
||||
# print("cer_info", cer_info)
|
||||
if log_mid_result:
|
||||
logger.info(f"cer_info {cer_info}")
|
||||
self.one_minus_cer += cer_info["one_minus_cer"]
|
||||
self.token_count += cer_info["token_count"]
|
||||
|
||||
# 句子切分准召率
|
||||
cut_info = self.cut_rate(label_sentences, predict_sentences, label_tokens, predict_tokens)
|
||||
# print(cut_info["miss_count"], cut_info["more_count"], cut_info["rate"])
|
||||
# print("cut_info", cut_info)
|
||||
if log_mid_result:
|
||||
logger.info(f"{cut_info['miss_count']}, {cut_info['more_count']}, {cut_info['rate']}")
|
||||
self.miss_count += cut_info["miss_count"]
|
||||
self.more_count += cut_info["more_count"]
|
||||
self.cut_count += cut_info["cut_count"]
|
||||
self.rate += cut_info["rate"]
|
||||
|
||||
self.result.append(
|
||||
{
|
||||
"label_tokens": label_tokens,
|
||||
"predict_tokens": predict_tokens,
|
||||
"one_minus_cer": cer_info["one_minus_cer"],
|
||||
"token_count": cer_info["one_minus_cer"],
|
||||
"miss_count": cut_info["miss_count"],
|
||||
"more_count": cut_info["more_count"],
|
||||
"cut_count": cut_info["cut_count"],
|
||||
"rate": cut_info["rate"],
|
||||
}
|
||||
)
|
||||
|
||||
def cer(self, label_tokens, predict_tokens):
|
||||
s, d, i, c = 0, 0, 0, 0
|
||||
for label_token, predict_token in zip(label_tokens, predict_tokens):
|
||||
if label_token == predict_token:
|
||||
c += 1
|
||||
elif predict_token is None:
|
||||
d += 1
|
||||
elif label_token is None:
|
||||
i += 1
|
||||
else:
|
||||
s += 1
|
||||
cer = (s + d + i) / (s + d + c)
|
||||
one_minus_cer = max(1.0 - cer, 0)
|
||||
token_count = s + d + c
|
||||
return {"one_minus_cer": one_minus_cer, "token_count": token_count}
|
||||
|
||||
def cut_rate(self, label_sentences, predict_sentences, label_tokens, predict_tokens):
|
||||
label_final_index_list = set(self._sentence_final_index(label_sentences, label_tokens, "whitespace"))
|
||||
pred_final_index_list = set(self._sentence_final_index(predict_sentences, predict_tokens, "whitespace"))
|
||||
label_sentence_count = len(label_final_index_list)
|
||||
miss_count = len(label_final_index_list - pred_final_index_list)
|
||||
more_count = len(pred_final_index_list - label_final_index_list)
|
||||
rate = max(1 - (miss_count + more_count * 2) / label_sentence_count, 0)
|
||||
return {
|
||||
"miss_count": miss_count,
|
||||
"more_count": more_count,
|
||||
"cut_count": label_sentence_count,
|
||||
"rate": rate,
|
||||
"label_final_index_list": label_final_index_list,
|
||||
"pred_final_index_list": pred_final_index_list,
|
||||
}
|
||||
|
||||
def _sentence_list_norm(self, sentence_list, tokenizer="whitespace"):
|
||||
pwd = os.path.dirname(os.path.dirname(os.path.abspath(__file__)))
|
||||
with open('./predict.txt', 'w', encoding='utf-8') as fp:
|
||||
for idx, sentence in enumerate(sentence_list):
|
||||
fp.write('%s\t%s\n' % (idx, sentence))
|
||||
subprocess.run(
|
||||
f'PYTHONPATH={pwd}/utils/speechio python {pwd}/utils/speechio/textnorm_en.py --has_key --to_upper ./predict.txt ./predict_norm.txt',
|
||||
shell=True,
|
||||
check=True,
|
||||
)
|
||||
sentence_norm = []
|
||||
with open('./predict_norm.txt', 'r', encoding='utf-8') as fp:
|
||||
for line in fp.readlines():
|
||||
line_split_result = line.strip().split('\t', 1)
|
||||
if len(line_split_result) >= 2:
|
||||
sentence_norm.append(line_split_result[1])
|
||||
# 有可能没有 norm 后就没了
|
||||
return sentence_norm
|
||||
|
||||
def _sentence_transfer(self, label_sentence: str, predict_sentence: str, tokenizer="whitespace"):
|
||||
"""
|
||||
基于最小编辑距离,将 label 和 predict 进行字的位置匹配,并生成转换后的结果
|
||||
args:
|
||||
label: "HELLO WORLD ARE U OK YEP"
|
||||
predict: "HELLO WORLD U ARE U OK YEP"
|
||||
tokenizer: 分词方式
|
||||
return:
|
||||
label: ["HELLO", "WORLD", None, "ARE", "U", "OK", "YEP"]
|
||||
predict: ["HELLO", "WORLD", "U", "ARE", "U", "OK", "YEP"]
|
||||
"""
|
||||
from utils.speechio import error_rate_zh as error_rate
|
||||
|
||||
if tokenizer == "whitespace":
|
||||
alignment, score = error_rate.EditDistance(
|
||||
error_rate.tokenize_text(label_sentence, tokenizer),
|
||||
error_rate.tokenize_text(predict_sentence, tokenizer),
|
||||
)
|
||||
label_tokens, pred_tokens = [], []
|
||||
for align in alignment:
|
||||
label_tokens.append(align.ref)
|
||||
pred_tokens.append(align.hyp)
|
||||
return (label_tokens, pred_tokens)
|
||||
else:
|
||||
logger.error("tokenizer 出错了,暂时不支持其它的")
|
||||
|
||||
def post_evaluate(self) -> Dict:
|
||||
pass
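To make the sentence-cut metric concrete, a small worked example of the `cut_rate` formula used by both evaluators (pure arithmetic, independent of the classes above):

```python
# label has 4 sentence boundaries; the prediction misses 1 and invents 2 extra ones
miss_count, more_count, label_sentence_count = 1, 2, 4
rate = max(1 - (miss_count + more_count * 2) / label_sentence_count, 0)
assert rate == 0  # 1 - 5/4 is negative, so the rate floors at 0
```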
|
||||
195 utils/evaluator.py Normal file
@@ -0,0 +1,195 @@
|
||||
# coding: utf-8
|
||||
|
||||
import os
|
||||
from collections import Counter, defaultdict
|
||||
from itertools import chain
|
||||
from typing import List
|
||||
|
||||
from schemas.context import ASRContext
|
||||
from utils.logger import logger
|
||||
from utils.metrics import cer, cut_rate, cut_sentence, first_delay
|
||||
from utils.metrics import mean_on_counter, patch_unique_token_count
|
||||
from utils.metrics import revision_delay, text_align, token_mapping
|
||||
from utils.metrics import var_on_counter
|
||||
from utils.tokenizer import TOKENIZER_MAPPING, Tokenizer
|
||||
from utils.update_submit import change_product_available
|
||||
|
||||
IN_TEST = os.getenv("SUBMIT_CONFIG_FILEPATH") is None  # the previous ", 1" default made this always False
|
||||
|
||||
|
||||
class BaseEvaluator:
|
||||
def __init__(self) -> None:
|
||||
self.query_count = 0 # query 数目(语音数目)
|
||||
self.voice_count = 0
|
||||
self.fail_count = 0 # 失败数目
|
||||
# 首字延迟
|
||||
self.first_delay_sum = 0
|
||||
self.first_delay_cnt = 0
|
||||
# 修正延迟
|
||||
self.revision_delay_sum = 0
|
||||
self.revision_delay_cnt = 0
|
||||
# patch token 信息
|
||||
self.patch_unique_cnt_counter = Counter()
|
||||
# text align count
|
||||
self.start_time_align_count = 0
|
||||
self.end_time_align_count = 0
|
||||
self.start_end_count = 0
|
||||
# 1-cer
|
||||
self.one_minus_cer = 0
|
||||
self.token_count = 0
|
||||
# 1-cer language
|
||||
self.one_minus_cer_lang = defaultdict(int)
|
||||
self.query_count_lang = defaultdict(int)
|
||||
# sentence-cut
|
||||
self.miss_count = 0
|
||||
self.more_count = 0
|
||||
self.sentence_count = 0
|
||||
self.cut_rate = 0
|
||||
# detail-case
|
||||
self.context = ASRContext()
|
||||
# 时延
|
||||
self.send_interval = []
|
||||
self.last_recv_interval = []
|
||||
# 字含量不达标数
|
||||
self.fail_char_contains_rate_num = 0
|
||||
# 标点符号
|
||||
self.punctuation_num = 0
|
||||
self.pred_punctuation_num = 0
|
||||
|
||||
def evaluate(self, context: ASRContext):
|
||||
self.query_count += 1
|
||||
self.query_count_lang[context.lang] += 1
|
||||
|
||||
voice_count = len(context.labels)
|
||||
self.voice_count += voice_count
|
||||
|
||||
self.punctuation_num += context.punctuation_num
|
||||
self.pred_punctuation_num += context.pred_punctuation_num
|
||||
|
||||
if not context.fail:
|
||||
# 首字延迟
|
||||
first_delay_sum, first_delay_cnt = first_delay(context)
|
||||
self.first_delay_sum += first_delay_sum
|
||||
self.first_delay_cnt += first_delay_cnt
|
||||
|
||||
# 修正延迟
|
||||
revision_delay_sum, revision_delay_cnt = revision_delay(context)
|
||||
self.revision_delay_sum += revision_delay_sum
|
||||
self.revision_delay_cnt += revision_delay_cnt
|
||||
|
||||
# patch token 信息
|
||||
counter = patch_unique_token_count(context)
|
||||
self.patch_unique_cnt_counter += counter
|
||||
else:
|
||||
self.fail_count += 1
|
||||
|
||||
self.fail_char_contains_rate_num += context.fail_char_contains_rate_num
|
||||
|
||||
# text align count
|
||||
start_time_align_count, end_time_align_count, start_end_count = text_align(context)
|
||||
self.start_time_align_count += start_time_align_count
|
||||
self.end_time_align_count += end_time_align_count
|
||||
self.start_end_count += start_end_count
|
||||
|
||||
# cer, wer
|
||||
sentences_gt: List[str] = [item.answer for item in context.labels]
|
||||
sentences_dt: List[str] = [
|
||||
item.recognition_results.text for item in context.preds if item.recognition_results.final_result
|
||||
]
|
||||
if IN_TEST:
|
||||
print(sentences_gt)
|
||||
print(sentences_dt)
|
||||
|
||||
sentences_gt: List[str] = cut_sentence(sentences_gt, TOKENIZER_MAPPING.get(context.lang))
|
||||
sentences_dt: List[str] = cut_sentence(sentences_dt, TOKENIZER_MAPPING.get(context.lang))
|
||||
if IN_TEST:
|
||||
print(sentences_gt)
|
||||
print(sentences_dt)
|
||||
|
||||
# norm & tokenize
|
||||
tokens_gt: List[List[str]] = Tokenizer.norm_and_tokenize(sentences_gt, context.lang)
|
||||
tokens_dt: List[List[str]] = Tokenizer.norm_and_tokenize(sentences_dt, context.lang)
|
||||
if IN_TEST:
|
||||
print(tokens_gt)
|
||||
print(tokens_dt)
|
||||
|
||||
# cer
|
||||
tokens_gt_mapping, tokens_dt_mapping = token_mapping(list(chain(*tokens_gt)), list(chain(*tokens_dt)))
|
||||
one_minue_cer, token_count = cer(tokens_gt_mapping, tokens_dt_mapping)
|
||||
self.one_minus_cer += one_minue_cer
|
||||
self.token_count += token_count
|
||||
self.one_minus_cer_lang[context.lang] += one_minue_cer
|
||||
|
||||
# cut-rate
|
||||
rate, sentence_cnt, miss_cnt, more_cnt = cut_rate(tokens_gt, tokens_dt, tokens_gt_mapping, tokens_dt_mapping)
|
||||
self.cut_rate += rate
|
||||
self.sentence_count += sentence_cnt
|
||||
self.miss_count += miss_cnt
|
||||
self.more_count += more_cnt
|
||||
|
||||
# detail-case
|
||||
self.context = context
|
||||
|
||||
# 时延
|
||||
if self.context.send_time_start_end and self.context.recv_time_start_end:
|
||||
send_interval = self.context.send_time_start_end[1] - self.context.send_time_start_end[0]
|
||||
recv_interval = self.context.recv_time_start_end[1] - self.context.send_time_start_end[0]
|
||||
self.send_interval.append(send_interval)
|
||||
self.last_recv_interval.append(recv_interval)
|
||||
logger.info(
|
||||
f"""第一次发送时间{self.context.send_time_start_end[0]}, \
|
||||
最后一次发送时间{self.context.send_time_start_end[-1]}, \
|
||||
发送间隔 {send_interval},
|
||||
最后一次接收时间{self.context.recv_time_start_end[-1]}, \
|
||||
接收间隔 {recv_interval}
|
||||
"""
|
||||
)
|
||||
|
||||
def post_evaluate(self):
|
||||
pass
|
||||
|
||||
def gen_result(self):
|
||||
result = {
|
||||
"query_count": self.query_count,
|
||||
"voice_count": self.voice_count,
|
||||
"pred_voice_count": self.first_delay_cnt,
|
||||
"first_delay_mean": self.first_delay_sum / self.first_delay_cnt if self.first_delay_cnt > 0 else 10,
|
||||
"revision_delay_mean": (
|
||||
self.revision_delay_sum / self.revision_delay_cnt if self.revision_delay_cnt > 0 else 10
|
||||
),
|
||||
"patch_token_mean": mean_on_counter(self.patch_unique_cnt_counter),
|
||||
"patch_token_var": var_on_counter(self.patch_unique_cnt_counter),
|
||||
"start_time_align_count": self.start_time_align_count,
|
||||
"end_time_align_count": self.end_time_align_count,
|
||||
"start_time_align_rate": self.start_time_align_count / self.sentence_count,
|
||||
"end_time_align_rate": self.end_time_align_count / self.sentence_count,
|
||||
"start_end_count": self.start_end_count,
|
||||
"one_minus_cer": self.one_minus_cer / self.query_count,
|
||||
"token_count": self.token_count,
|
||||
"miss_count": self.miss_count,
|
||||
"more_count": self.more_count,
|
||||
"sentence_count": self.sentence_count,
|
||||
"cut_rate": self.cut_rate / self.query_count,
|
||||
"fail_count": self.fail_count,
|
||||
"send_interval": self.send_interval,
|
||||
"last_recv_interval": self.last_recv_interval,
|
||||
"fail_char_contains_rate_num": self.fail_char_contains_rate_num,
|
||||
"punctuation_rate": self.pred_punctuation_num / self.punctuation_num,
|
||||
}
|
||||
for lang in self.one_minus_cer_lang:
|
||||
result["one_minus_cer_" + lang] = \
|
||||
self.one_minus_cer_lang[lang] / self.query_count_lang[lang]
|
||||
|
||||
if (
|
||||
result["first_delay_mean"]
|
||||
> float(os.getenv("FIRST_DELAY_THRESHOLD", "5"))
|
||||
or
|
||||
self.fail_char_contains_rate_num / self.voice_count > 0.1
|
||||
# or
|
||||
# result["punctuation_rate"] < 0.8
|
||||
):
|
||||
change_product_available()
|
||||
return result
|
||||
|
||||
def gen_detail_case(self):
|
||||
return self.context
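A sketch of the driving loop implied by this class; the `contexts` list is an assumption (the runner that produces it is not part of this diff):

```python
evaluator = BaseEvaluator()
for context in contexts:            # contexts: List[ASRContext] from the stream clients
    evaluator.evaluate(context)
evaluator.post_evaluate()
report = evaluator.gen_result()       # aggregated metrics dict
detail = evaluator.gen_detail_case()  # last evaluated context
```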
|
||||
293 utils/evaluator_plus.py Normal file
@@ -0,0 +1,293 @@
|
||||
from collections import defaultdict
|
||||
from copy import deepcopy
|
||||
from itertools import chain
|
||||
from typing import Dict, List, Tuple
|
||||
|
||||
import Levenshtein
|
||||
|
||||
from schemas.dataset import QueryData
|
||||
from schemas.stream import StreamDataModel, StreamWordsModel
|
||||
from utils.metrics import Tokenizer
|
||||
from utils.metrics_plus import replace_general_punc
|
||||
from utils.tokenizer import TOKENIZER_MAPPING
|
||||
|
||||
|
||||
def evaluate_editops(
|
||||
query_data: QueryData, recognition_results: List[StreamDataModel]
|
||||
) -> Tuple[float, int, Dict[int, int], Dict[int, int], float, float]:
|
||||
"""返回cer 句子总数 首字对齐情况 尾字对齐情况 首字时间差值和 尾字时间差值和
|
||||
对齐情况为 时间差值->对齐数"""
|
||||
recognition_results = deepcopy(recognition_results)
|
||||
lang = query_data.lang
|
||||
voices = query_data.voice
|
||||
sentences_pred = [
|
||||
recognition_result.text for recognition_result in recognition_results
|
||||
]
|
||||
sentences_label = [item.answer for item in voices]
|
||||
|
||||
tokenizer_type = TOKENIZER_MAPPING[lang]
|
||||
sentences_pred = replace_general_punc(sentences_pred, tokenizer_type)
|
||||
sentences_label = replace_general_punc(sentences_label, tokenizer_type)
|
||||
|
||||
# norm & tokenize
|
||||
tokens_pred = Tokenizer.norm_and_tokenize(sentences_pred, lang)
|
||||
tokens_label = Tokenizer.norm_and_tokenize(sentences_label, lang)
|
||||
|
||||
normed_words = []
|
||||
for recognition_result in recognition_results:
|
||||
words = list(map(lambda x: x.text, recognition_result.words))
|
||||
normed_words.extend(words)
|
||||
normed_words = replace_general_punc(normed_words, tokenizer_type)
|
||||
normed_words = Tokenizer.norm(normed_words, lang)
|
||||
|
||||
# 预测中的结果进行相同的norm和tokenize操作
|
||||
normed_word_index = 0
|
||||
for recognition_result in recognition_results:
|
||||
next_index = normed_word_index + len(recognition_result.words)
|
||||
tokens_words = Tokenizer.tokenize(
|
||||
normed_words[normed_word_index:next_index], lang
|
||||
)
|
||||
normed_word_index = next_index
|
||||
stream_words: List[StreamWordsModel] = []
|
||||
# 将原words进行norm和tokenize操作后赋值为对应原word的时间
|
||||
for raw_stream_word, tokens_word in zip(
|
||||
recognition_result.words, tokens_words
|
||||
):
|
||||
for word in tokens_word:
|
||||
stream_words.append(
|
||||
StreamWordsModel(
|
||||
text=word,
|
||||
start_time=raw_stream_word.start_time,
|
||||
end_time=raw_stream_word.end_time,
|
||||
)
|
||||
)
|
||||
recognition_result.words = stream_words
|
||||
|
||||
# 将words对应上对分词后的词,从而使得分词后的词有时间
|
||||
pred_word_time: List[StreamWordsModel] = []
|
||||
for token_pred, recognition_result in zip(tokens_pred, recognition_results):
|
||||
word_index = 0
|
||||
for word in recognition_result.words:
|
||||
try:
|
||||
token_index = token_pred.index(word.text, word_index)
|
||||
for i in range(word_index, token_index + 1):
|
||||
pred_word_time.append(
|
||||
StreamWordsModel(
|
||||
text=token_pred[i],
|
||||
start_time=word.start_time,
|
||||
end_time=word.end_time,
|
||||
)
|
||||
)
|
||||
word_index = token_index + 1
|
||||
except ValueError:
|
||||
pass
|
||||
if len(recognition_result.words) > 0:
|
||||
word = recognition_result.words[-1]
|
||||
start_time = word.start_time
|
||||
end_time = word.end_time
|
||||
else:
|
||||
start_time = recognition_result.start_time
|
||||
end_time = recognition_result.end_time
|
||||
for i in range(word_index, len(token_pred)):
|
||||
pred_word_time.append(
|
||||
StreamWordsModel(
|
||||
text=token_pred[i],
|
||||
start_time=start_time,
|
||||
end_time=end_time,
|
||||
)
|
||||
)
|
||||
|
||||
# 记录label每句话的首字尾字对应分词后的位置
|
||||
index = 0
|
||||
label_firstword_index: List[int] = []
|
||||
label_lastword_index: List[int] = []
|
||||
for token_label in tokens_label:
|
||||
label_firstword_index.append(index)
|
||||
index += len(token_label)
|
||||
label_lastword_index.append(index - 1)
|
||||
|
||||
# cer
|
||||
flat_tokens_pred = list(chain(*tokens_pred))
|
||||
flat_tokens_label = list(chain(*tokens_label))
|
||||
ops = Levenshtein.editops(flat_tokens_pred, flat_tokens_label)
|
||||
insert = len(list(filter(lambda x: x[0] == "insert", ops)))
|
||||
delete = len(list(filter(lambda x: x[0] == "delete", ops)))
|
||||
replace = len(list(filter(lambda x: x[0] == "replace", ops)))
|
||||
cer = (insert + delete + replace) / len(flat_tokens_label)
|
||||
|
||||
# 计算每个token在编辑后的下标位置
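# worked example: pred=[A,B], label=[A,C,B] yields one "insert" op before pred index 1,
# so pred_indexs=[0, 2], label_indexs=[0, 1, 2], and matching tokens share aligned positions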
|
||||
pred_offset = [0] * (len(flat_tokens_pred) + 1)
|
||||
label_offset = [0] * (len(flat_tokens_label) + 1)
|
||||
for op in ops:
|
||||
if op[0] == "insert":
|
||||
pred_offset[op[1]] += 1
|
||||
elif op[0] == "delete":
|
||||
label_offset[op[2]] += 1
|
||||
pred_indexs = [pred_offset[0]]
|
||||
for i in range(1, len(flat_tokens_pred)):
|
||||
pred_indexs.append(pred_indexs[i - 1] + pred_offset[i] + 1)
|
||||
label_indexs = [label_offset[0]]
|
||||
for i in range(1, len(flat_tokens_label)):
|
||||
label_indexs.append(label_indexs[i - 1] + label_offset[i] + 1)
|
||||
|
||||
# 计算每个label中首字和尾字对应的时间
|
||||
align_start = {100: 0, 200: 0, 300: 0, 500: 0}
|
||||
align_end = {100: 0, 200: 0, 300: 0, 500: 0}
|
||||
first_word_distance_sum = 0.0
|
||||
last_word_distance_sum = 0.0
|
||||
for firstword_index, lastword_index, voice in zip(
|
||||
label_firstword_index, label_lastword_index, voices
|
||||
):
|
||||
label_index = label_indexs[firstword_index]
|
||||
label_in_pred_index = upper_bound(label_index, pred_indexs)
|
||||
if label_in_pred_index != -1:
|
||||
distance = abs(
|
||||
voice.start - pred_word_time[label_in_pred_index].start_time
|
||||
)
|
||||
if label_in_pred_index > 0:
|
||||
distance = min(
|
||||
distance,
|
||||
abs(
|
||||
voice.start
|
||||
- pred_word_time[label_in_pred_index - 1].start_time
|
||||
),
|
||||
)
|
||||
else:
|
||||
distance = abs(voice.start - pred_word_time[-1].start_time)
|
||||
for limit in align_start.keys():
|
||||
if distance <= limit / 1000:
|
||||
align_start[limit] += 1
|
||||
first_word_distance_sum += distance
|
||||
|
||||
label_index = label_indexs[lastword_index]
|
||||
label_in_pred_index = lower_bound(label_index, pred_indexs)
|
||||
if label_in_pred_index != -1:
|
||||
distance = abs(
|
||||
voice.end - pred_word_time[label_in_pred_index].end_time
|
||||
)
|
||||
if label_in_pred_index < len(pred_word_time) - 1:
|
||||
distance = min(
|
||||
distance,
|
||||
abs(
|
||||
voice.end
|
||||
- pred_word_time[label_in_pred_index + 1].end_time
|
||||
),
|
||||
)
|
||||
else:
|
||||
distance = abs(voice.end - pred_word_time[0].end_time)
|
||||
for limit in align_end.keys():
|
||||
if distance <= limit / 1000:
|
||||
align_end[limit] += 1
|
||||
last_word_distance_sum += distance
|
||||
return (
|
||||
cer,
|
||||
len(voices),
|
||||
align_start,
|
||||
align_end,
|
||||
first_word_distance_sum,
|
||||
last_word_distance_sum,
|
||||
)
|
||||
|
||||
|
||||
def evaluate_punctuation(
|
||||
query_data: QueryData, recognition_results: List[StreamDataModel]
|
||||
) -> Tuple[int, int, int, int]:
|
||||
"""评估标点符号指标 返回预测中标点数 label中标点数 预测中句子标点数 label中句子标点数"""
|
||||
punctuation_mapping = defaultdict(lambda: [",", ".", "!", "?"])
|
||||
punctuation_mapping.update(
|
||||
{
|
||||
"zh": [",", "。", "!", "?"],
|
||||
"ja": ["、", "。", "!", "?"],
|
||||
"ar": ["،", ".", "!", "؟"],
|
||||
"fa": ["،", ".", "!", "؟"],
|
||||
"el": [",", ".", "!", ";"],
|
||||
"ti": ["།"],
|
||||
"th": [" ", ",", ".", "!", "?"],
|
||||
}
|
||||
)
|
||||
|
||||
punctuation_words: List[StreamWordsModel] = []
|
||||
for recognition_result in recognition_results:
|
||||
punctuations = punctuation_mapping[query_data.lang]
|
||||
for word in recognition_result.words:
|
||||
for char in word.text:
|
||||
if char in punctuations:
|
||||
punctuation_words.append(word)
|
||||
break
|
||||
punctuation_start_times = list(
|
||||
map(lambda x: x.start_time, punctuation_words)
|
||||
)
|
||||
punctuation_start_times = sorted(punctuation_start_times)
|
||||
punctuation_end_times = list(map(lambda x: x.end_time, punctuation_words))
|
||||
punctuation_end_times = sorted(punctuation_end_times)
|
||||
|
||||
voices = query_data.voice
|
||||
label_len = len(voices)
|
||||
pred_punctuation_num = len(punctuation_words)
|
||||
label_punctuation_num = 0
|
||||
for label_voice in voices:
|
||||
punctuations = punctuation_mapping[query_data.lang]
|
||||
for char in label_voice.answer:
|
||||
if char in punctuations:
|
||||
label_punctuation_num += 1
|
||||
|
||||
pred_sentence_punctuation_num = 0
|
||||
label_setence_punctuation_num = label_len
|
||||
for i, label_voice in enumerate(voices):
|
||||
if i < label_len - 1:
|
||||
label_left = label_voice.end
|
||||
label_right = voices[i + 1].start
|
||||
else:
|
||||
label_left = label_voice.end - 0.7
|
||||
label_right = label_voice.end + 0.7
|
||||
|
||||
left_in_pred = upper_bound(label_left, punctuation_start_times)
|
||||
exist = False
|
||||
if (
|
||||
left_in_pred != -1
|
||||
and punctuation_start_times[left_in_pred] <= label_right
|
||||
):
|
||||
exist = True
|
||||
right_in_pred = lower_bound(label_right, punctuation_end_times)
|
||||
if (
|
||||
right_in_pred != -1
|
||||
and punctuation_end_times[right_in_pred] >= label_left
|
||||
):
|
||||
exist = True
|
||||
|
||||
if exist:
|
||||
pred_sentence_punctuation_num += 1
|
||||
return (
|
||||
pred_punctuation_num,
|
||||
label_punctuation_num,
|
||||
pred_sentence_punctuation_num,
|
||||
label_setence_punctuation_num,
|
||||
)
|
||||
|
||||
|
||||
def upper_bound(x: float, lst: List[float]) -> int:
|
||||
"""第一个 >= x 的元素的下标 没有返回-1"""
|
||||
ans = -1
|
||||
left, right = 0, len(lst) - 1
|
||||
while left <= right:
|
||||
mid = (left + right) // 2
|
||||
if lst[mid] >= x:
|
||||
ans = mid
|
||||
right = mid - 1
|
||||
else:
|
||||
left = mid + 1
|
||||
return ans
|
||||
|
||||
|
||||
def lower_bound(x: float, lst: List[float]) -> int:
|
||||
"""最后一个 <= x 的元素的下标 没有返回-1"""
|
||||
ans = -1
|
||||
left, right = 0, len(lst) - 1
|
||||
while left <= right:
|
||||
mid = (left + right) // 2
|
||||
if lst[mid] <= x:
|
||||
ans = mid
|
||||
left = mid + 1
|
||||
else:
|
||||
right = mid - 1
|
||||
return ans
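The two binary searches above match the standard-library `bisect` helpers; a cross-check sketch (the hand-rolled versions stay as the source of truth):

```python
import bisect
from typing import List

def upper_bound_bisect(x: float, lst: List[float]) -> int:
    """First index with lst[i] >= x, or -1; same contract as upper_bound."""
    i = bisect.bisect_left(lst, x)
    return i if i < len(lst) else -1

def lower_bound_bisect(x: float, lst: List[float]) -> int:
    """Last index with lst[i] <= x, or -1; same contract as lower_bound."""
    return bisect.bisect_right(lst, x) - 1  # bisect_right returns 0 when all elements exceed x
```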
|
||||
151 utils/file.py Normal file
@@ -0,0 +1,151 @@
|
||||
import json
|
||||
import os
|
||||
import shutil
|
||||
import tarfile
|
||||
import tempfile
|
||||
import zipfile
|
||||
from typing import Any
|
||||
|
||||
import yaml
|
||||
|
||||
|
||||
def load_json(path: str, raise_for_invalid: bool = False) -> Any:
|
||||
"""读取path json文件转为对象"""
|
||||
with open(path, "r", encoding="utf-8") as f:
|
||||
if raise_for_invalid:
|
||||
|
||||
def parse_constant(s: str):
|
||||
raise ValueError("非法json字符: %s" % s)
|
||||
|
||||
return json.load(f, parse_constant=parse_constant)
|
||||
return json.load(f)
|
||||
|
||||
|
||||
def dump_json(path: str, obj: Any):
|
||||
"""将obj对象以json形式写入path文件"""
|
||||
with open(path, "w", encoding="utf-8") as f:
|
||||
json.dump(obj, f, ensure_ascii=False, indent=4)
|
||||
|
||||
|
||||
def load_yaml(path: str) -> Any:
|
||||
"""读取path yaml文件转为对象"""
|
||||
with open(path, "r", encoding="utf-8") as f:
|
||||
return yaml.full_load(f)
|
||||
|
||||
|
||||
def dump_yaml(path: str, obj: Any):
|
||||
"""将obj对象以yaml形式写入path文件"""
|
||||
with open(path, "w", encoding="utf-8") as f:
|
||||
yaml.dump(obj, f, indent=2, allow_unicode=True, sort_keys=False, line_break="\n")
|
||||
|
||||
|
||||
def dumps_yaml(obj: Any) -> str:
|
||||
"""将obj对象以yaml形式导出为字符串"""
|
||||
return yaml.dump(obj, indent=2, allow_unicode=True, sort_keys=False, line_break="\n")
|
||||
|
||||
|
||||
def read_file(path: str) -> str:
|
||||
"""读取文件为str"""
|
||||
with open(path, "r") as f:
|
||||
return f.read()
|
||||
|
||||
|
||||
def write_bfile(path: str, data: bytes):
|
||||
"""将bytes data写入path文件"""
|
||||
with open(path, "wb") as f:
|
||||
f.write(data)
|
||||
|
||||
|
||||
def write_file(path: str, data: str):
|
||||
"""将str data写入path文件"""
|
||||
with open(path, "w") as f:
|
||||
f.write(data)
|
||||
|
||||
|
||||
def tail_file(path: str, tail: int) -> str:
|
||||
"""倍增获取文件path最后tail行"""
|
||||
block = 1024
|
||||
with open(path, "rb") as f:
|
||||
f.seek(0, 2)
|
||||
filesize = f.tell()
|
||||
while True:
|
||||
if filesize < block:
|
||||
block = filesize
|
||||
f.seek(filesize - block, 0)
|
||||
lines = f.readlines()
|
||||
if len(lines) > tail or filesize <= block:
|
||||
return "".join(line.decode() for line in lines[-tail:])
|
||||
block *= 2
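# usage sketch (hypothetical path):
# print(tail_file("run.log", 20))  # last 20 lines of the log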


def zip_dir(zip_path: str, dirname: str):
    """Pack the directory dirname into the zip archive at zip_path."""
    with zipfile.ZipFile(zip_path, "w") as ziper:
        for path, _, files in os.walk(dirname):
            for file in files:
                ziper.write(
                    os.path.join(path, file), os.path.join(path.removeprefix(dirname), file), zipfile.ZIP_DEFLATED
                )


def zip_files(name: str, zipfile_paths: list):
    """Pack zipfile_paths=list[(archive name, file path)] into the zip archive `name`."""
    with zipfile.ZipFile(name, "w") as ziper:
        for arcname, zipfile_path in zipfile_paths:
            ziper.write(zipfile_path, arcname, zipfile.ZIP_DEFLATED)


def zip_strs(name: str, zipfile_strs: list):
    """Pack zipfile_strs=list[(file name, content)] into the zip archive `name`."""
    with zipfile.ZipFile(name, "w") as ziper:
        for filename, content in zipfile_strs:
            ziper.writestr(filename, content)


def zip_zipers(name: str, ziper_paths: list):
    """Pack ziper_paths=list[(name inside archive, archive/file location)] into the zip archive `name`."""
    temp_dirname = tempfile.mkdtemp(prefix=name, dir=os.path.dirname(name))
    os.makedirs(temp_dirname, exist_ok=True)
    for subname, ziper_path in ziper_paths:
        sub_dirname = os.path.join(temp_dirname, subname)
        if not os.path.exists(ziper_path):
            continue
        if zipfile.is_zipfile(ziper_path):
            # zip archive: extract it
            os.makedirs(sub_dirname, exist_ok=True)
            unzip_dir(ziper_path, sub_dirname)
        elif os.path.isfile(ziper_path):
            # plain file
            shutil.copyfile(ziper_path, sub_dirname)
        else:
            # directory
            shutil.copytree(ziper_path, sub_dirname)
    zip_dir(name, temp_dirname)
    shutil.rmtree(temp_dirname)


def unzip_dir(zip_path: str, dirname: str, catch_exc: bool = True):
    """Extract zip_path into dirname."""
    with zipfile.ZipFile(zip_path, "r") as ziper:
        try:
            ziper.extractall(dirname)
        except Exception as e:
            if catch_exc:
                write_file(os.path.join(dirname, "unzip_error.log"), "%r" % e)
                shutil.copyfile(zip_path, os.path.join(dirname, os.path.basename(zip_path)))
            else:
                raise e


def tar_dir(zip_path: str, dirname: str):
    """Compress dirname into the gzipped tarball at zip_path."""
    with tarfile.open(zip_path, "w:gz") as ziper:
        for path, _, files in os.walk(dirname):
            for file in files:
                ziper.add(os.path.join(path, file), os.path.join(path.removeprefix(dirname), file))


def untar_dir(zip_path: str, dirname: str):
    """Extract zip_path into dirname."""
    with tarfile.open(zip_path) as ziper:
        ziper.extractall(dirname)
331 utils/helm.py Normal file
@@ -0,0 +1,331 @@
# -*- coding: utf-8 -*-
import copy
import io
import json
import os
import re
import tarfile
import time
from collections import defaultdict
from typing import Any, Dict, Optional

import requests
from ruamel.yaml import YAML

from utils.logger import logger

sut_chart_root = os.path.join(os.path.dirname(os.path.dirname(os.path.abspath(__file__))), "helm-chart", "sut")
headers = (
    {'Authorization': 'Bearer ' + os.getenv("LEADERBOARD_API_TOKEN")} if os.getenv("LEADERBOARD_API_TOKEN") else None
)
# int factory so that `pull_num[pod_name] += 1` works on first access
pull_num: defaultdict = defaultdict(int)
JOB_ID = int(os.getenv("JOB_ID", "-1"))
LOAD_SUT_URL = os.getenv("LOAD_SUT_URL")
GET_JOB_SUT_INFO_URL = os.getenv("GET_JOB_SUT_INFO_URL")


def apply_env_to_values(values, envs):
    if "env" not in values:
        values["env"] = []
    old_key_list = [x["name"] for x in values["env"]]
    for k, v in envs.items():
        try:
            idx = old_key_list.index(k)
            values["env"][idx]["value"] = v
        except ValueError:
            values["env"].append({"name": k, "value": v})
    return values


def merge_values(base_value, incr_value):
    if isinstance(base_value, dict) and isinstance(incr_value, dict):
        for k in incr_value:
            base_value[k] = merge_values(base_value[k], incr_value[k]) if k in base_value else incr_value[k]
    elif isinstance(base_value, list) and isinstance(incr_value, list):
        base_value.extend(incr_value)
    else:
        base_value = incr_value
    return base_value
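
# Behavior sketch: dicts merge recursively, lists concatenate, and any other
# pair is overwritten by the increment. For example:
#   merge_values({"a": {"x": 1}, "l": [1]}, {"a": {"y": 2}, "l": [2]})
#   # -> {"a": {"x": 1, "y": 2}, "l": [1, 2]}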


def gen_chart_tarball(docker_image):
    """Resolve the docker image to its digest and build a helm chart tarball from it; raises on failure.

    Args:
        docker_image (str): docker image

    Returns:
        tuple[BytesIO, dict]: [helm chart tarball file object, values content]
    """
    # load values template
    with open(os.path.join(sut_chart_root, "values.yaml.tmpl")) as fp:
        yaml = YAML(typ="rt")
        values = yaml.load(fp)
    # update docker_image
    get_image_hash_url = os.getenv("GET_IMAGE_HASH_URL", None)
    logger.info(f"get_image_hash_url: {get_image_hash_url}")
    if get_image_hash_url is not None:
        # convert tag to hash for docker_image
        #docker_image = "harbor-contest.4pd.io/zhoushasha/speaker_identification:wo_model_v0"
        docker_image = "harbor-contest.4pd.io/zhoushasha/image_classification:wo_model_v3"
        resp = requests.get(get_image_hash_url, headers=headers, params={"image": docker_image}, timeout=600)

        logger.info(f"resp.text: {resp.text}")
        assert resp.status_code == 200, "Convert tag to hash for docker image failed, API retcode %d" % resp.status_code
        resp = resp.json()
        assert resp["success"], "Convert tag to hash for docker image failed, response: %s" % str(resp)
        token = resp["data"]["image"].rsplit(":", 2)
        assert len(token) == 3, "Invalid docker image %s" % resp["data"]["image"]
        values["image"]["repository"] = token[0]
        values["image"]["tag"] = ":".join(token[1:])
    else:
        token = docker_image.rsplit(":", 1)
        if len(token) != 2:
            raise RuntimeError("Invalid docker image %s" % docker_image)
        values["image"]["repository"] = token[0]
        values["image"]["tag"] = token[1]
    # output values.yaml
    with open(os.path.join(sut_chart_root, "values.yaml"), "w") as fp:
        yaml = YAML(typ="rt")
        yaml.dump(values, fp)
    # tarball
    tarfp = io.BytesIO()
    with tarfile.open(fileobj=tarfp, mode="w:gz") as tar:
        tar.add(sut_chart_root, arcname=os.path.basename(sut_chart_root), recursive=True)
    tarfp.seek(0)
    logger.debug(f"Generated chart using values: {values}")
    return tarfp, values


def deploy_chart(
    name_suffix,
    readiness_timeout,
    chart_str=None,
    chart_fileobj=None,
    extra_values=None,
    restart_count_limit=3,
    pullimage_count_limit=3,
):
    """Deploy the SUT; raises on failure.

    Args:
        name_suffix (str): distinguishes SUTs when one job deploys several of them
        readiness_timeout (int): readiness timeout in seconds
        chart_str (str, optional): chart url; when not None, chart_fileobj is ignored. Defaults to None.
        chart_fileobj (BytesIO, optional): helm chart tarball file object, used when chart_str is None. Defaults to None.
        extra_values (dict, optional): extra helm values. Defaults to None.
        restart_count_limit (int, optional): SUT restart limit; exceeding it raises. Defaults to 3.
        pullimage_count_limit (int, optional): image pull attempt limit; exceeding it raises. Defaults to 3.

    Returns:
        tuple[str, str]: [k8s domain name for reaching the service, name to pass to unload_sut]
    """
    logger.info(f"Deploying SUT application for JOB {JOB_ID}, name_suffix {name_suffix}, extra_values {extra_values}")
    # deploy
    payload = {
        "job_id": JOB_ID,
        "resource_name": name_suffix,
        "priorityclassname": os.environ.get("priorityclassname"),
    }
    extra_values = {} if not extra_values else extra_values
    payload["values"] = json.dumps(extra_values, ensure_ascii=False)
    if chart_str is not None:
        payload["helm_chart"] = chart_str
        resp = requests.post(LOAD_SUT_URL, data=payload, headers=headers, timeout=600)
    else:
        assert chart_fileobj is not None, "Either chart_str or chart_fileobj should be set"

        logger.info(f"LOAD_SUT_URL: {LOAD_SUT_URL}")
        logger.info(f"payload: {payload}")
        logger.info(f"headers: {headers}")

        resp = requests.post(
            LOAD_SUT_URL,
            data=payload,
            headers=headers,
            files=[("helm_chart_file", (name_suffix + ".tgz", chart_fileobj))],
            timeout=600,
        )

    if resp.status_code != 200:
        raise RuntimeError("Failed to deploy application status_code %d %s" % (resp.status_code, resp.text))
    resp = resp.json()
    if not resp["success"]:
        logger.error("Failed to deploy application response %r", resp)
    service_name = resp["data"]["service_name"]
    sut_name = resp["data"]["sut_name"]
    logger.info(f"SUT application deployed with service_name {service_name}")
    # waiting for application ready
    running_at = None
    retry_count = 0
    while True:
        retry_interval = 10
        retry_count += 1
        if retry_count % 20 == 19:
            logger.info(f"Waiting {retry_interval} seconds to check whether SUT application {service_name} is ready...")
            logger.info("This message is logged again every 20 retries.")
        time.sleep(retry_interval)
        check_result, running_at = check_sut_ready_from_resp(
            service_name,
            running_at,
            readiness_timeout,
            restart_count_limit,
            pullimage_count_limit,
        )
        if check_result:
            break

    logger.info(f"SUT application for JOB {JOB_ID} name_suffix {name_suffix} is ready, service_name {service_name}")
    return service_name, sut_name


def check_sut_ready_from_resp(
    service_name,
    running_at,
    readiness_timeout,
    restart_count_limit,
    pullimage_count_limit,
):
    try:
        resp = requests.get(
            f"{GET_JOB_SUT_INFO_URL}/{JOB_ID}",
            headers=headers,
            params={"with_detail": True},
            timeout=600,
        )
    except Exception as e:
        logger.warning(f"Exception occurred while getting SUT application {service_name} status: {e}")
        return False, running_at
    if resp.status_code != 200:
        logger.warning(f"Get SUT application {service_name} status failed with status_code {resp.status_code}")
        return False, running_at
    resp = resp.json()
    if not resp["success"]:
        logger.warning(f"Get SUT application {service_name} status failed with response {resp}")
        return False, running_at
    if len(resp["data"]["sut"]) == 0:
        logger.warning("Empty SUT application status")
        return False, running_at
    resp_data_sut = copy.deepcopy(resp["data"]["sut"])
    for status in resp_data_sut:
        del status["detail"]
    logger.info(f"Got SUT application status: {resp_data_sut}")
    for status in resp["data"]["sut"]:
        if status["phase"] in ["Succeeded", "Failed"]:
            raise RuntimeError(f"Some pods of SUT application {service_name} terminated with status {status}")
        elif status["phase"] in ["Pending", "Unknown"]:
            return False, running_at
        elif status["phase"] != "Running":
            raise RuntimeError(f"Unexpected pod status {status} of SUT application {service_name}")
        if running_at is None:
            running_at = time.time()
        for ct in status["detail"]["status"]["container_statuses"]:
            if ct["restart_count"] > 0:
                logger.info(f"pod {status['pod_name']} restart count = {ct['restart_count']}")
                if ct["restart_count"] > restart_count_limit:
                    raise RuntimeError(f"pod {status['pod_name']} restart too many times(over {restart_count_limit})")
            if (
                ct["state"]["waiting"] is not None
                and "reason" in ct["state"]["waiting"]
                and ct["state"]["waiting"]["reason"] in ["ImagePullBackOff", "ErrImagePull"]
            ):
                pull_num[status["pod_name"]] += 1
                logger.info(
                    "pod %s has %d image pull failures, waiting state: %s"
                    % (status["pod_name"], pull_num[status["pod_name"]], ct["state"]["waiting"])
                )
                if pull_num[status["pod_name"]] > pullimage_count_limit:
                    raise RuntimeError(f"pod {status['pod_name']} cannot pull image")
        if not status["conditions"]["Ready"]:
            if running_at is not None and time.time() - running_at > readiness_timeout:
                raise RuntimeError(f"SUT Application readiness has exceeded readiness_timeout:{readiness_timeout}s")
            return False, running_at
    return True, running_at


def parse_resource(resource):
    if resource == -1:
        return -1
    match = re.match(r"([\d\.]+)([mKMGTPENi]*)", resource)
    value, unit = match.groups()
    value = float(value)
    unit_mapping = {
        "": 1,
        "m": 1e-3,
        "K": 1e3,
        "M": 1e6,
        "G": 1e9,
        "T": 1e12,
        "P": 1e15,
        "E": 1e18,
        "Ki": 2**10,
        "Mi": 2**20,
        "Gi": 2**30,
        "Ti": 2**40,
        "Pi": 2**50,
        "Ei": 2**60,
    }
    if unit not in unit_mapping:
        raise ValueError(f"Unknown resources unit: {unit}")
    return value * unit_mapping[unit]
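
# Examples with common Kubernetes quantities (decimal vs. binary suffixes):
#   parse_resource("500m")  # -> 0.5       (CPU millicores)
#   parse_resource("2G")    # -> 2e9       (decimal gigabytes)
#   parse_resource("2Gi")   # -> 2 * 2**30 (binary gibibytes)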


def limit_resources(resource):
    if "limits" not in resource:
        return resource
    if "cpu" in resource["limits"]:
        cpu_limit = parse_resource(resource["limits"]["cpu"])
        if cpu_limit > 30:
            logger.error("CPU limit exceeded. Adjusting to 30 cores.")
            resource["limits"]["cpu"] = "30"
    if "memory" in resource["limits"]:
        memory_limit = parse_resource(resource["limits"]["memory"])
        if memory_limit > 100 * 2**30:
            logger.error("Memory limit exceeded, adjusting to 100Gi")
            resource["limits"]["memory"] = "100Gi"


def consistent_resources(resource):
    if "limits" not in resource and "requests" not in resource:
        return resource
    elif "limits" in resource:
        resource["requests"] = resource["limits"]
    else:
        resource["limits"] = resource["requests"]
    return resource


def resource_check(values: Dict[str, Any]):
    resources = values.get("resources", {}).get("limits", {})
    if "nvidia.com/gpu" in resources and int(resources["nvidia.com/gpu"]) > 0:
        values["resources"]["limits"]["nvidia.com/gpumem"] = 8192
        values["resources"]["limits"]["nvidia.com/gpucores"] = 10
        values["resources"]["requests"] = values["resources"].get("requests", {})
        if "cpu" not in values["resources"]["requests"] and "cpu" in values["resources"]["limits"]:
            values["resources"]["requests"]["cpu"] = values["resources"]["limits"]["cpu"]
        if "memory" not in values["resources"]["requests"] and "memory" in values["resources"]["limits"]:
            values["resources"]["requests"]["memory"] = values["resources"]["limits"]["memory"]
        values["resources"]["requests"]["nvidia.com/gpu"] = values["resources"]["limits"]["nvidia.com/gpu"]
        values["resources"]["requests"]["nvidia.com/gpumem"] = 8192
        values["resources"]["requests"]["nvidia.com/gpucores"] = 10

        values["nodeSelector"] = values.get("nodeSelector", {})
        if "contest.4pd.io/accelerator" not in values["nodeSelector"]:
            values["nodeSelector"]["contest.4pd.io/accelerator"] = "A100-SXM4-80GBvgpu"
        gpu_type = values["nodeSelector"]["contest.4pd.io/accelerator"]
        gpu_num = resources["nvidia.com/gpu"]
        if gpu_type != "A100-SXM4-80GBvgpu":
            raise RuntimeError("The GPU type must be A100-SXM4-80GBvgpu")
        if gpu_num != 1:
            raise RuntimeError("Exactly one GPU must be requested")
        values["tolerations"] = values.get("tolerations", [])
        values["tolerations"].append(
            {
                "key": "hosttype",
                "operator": "Equal",
                "value": "vgpu",
                "effect": "NoSchedule",
            }
        )
    return values
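
# Effect sketch: given limits {"cpu": "8", "memory": "32Gi", "nvidia.com/gpu": 1},
# resource_check mirrors the limits into requests, pins the vGPU slice to
# gpumem=8192 / gpucores=10, selects A100-SXM4-80GBvgpu nodes, and adds the
# hosttype=vgpu toleration.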
38 utils/leaderboard.py Normal file
@@ -0,0 +1,38 @@
from utils.request import requests_retry_session
import os
import json
import traceback
from utils.logger import logger

lb_headers = {"Content-Type": "application/json"}
if os.getenv("LEADERBOARD_API_TOKEN"):
    lb_headers['Authorization'] = 'Bearer ' + os.getenv("LEADERBOARD_API_TOKEN")


def change_product_unavailable() -> None:
    logger.info("Marking the product as unavailable...")
    submit_id = str(os.getenv("SUBMIT_ID", -1))
    try:
        requests_retry_session().post(
            os.getenv("UPDATE_SUBMIT_URL", "http://contest.4pd.io:8080/submit/update"),
            # the "product_avaliable" key is kept verbatim to match the server-side spelling
            data=json.dumps({submit_id: {"product_avaliable": 0}}),
            headers=lb_headers,
        )
    except Exception as e:
        logger.error(traceback.format_exc())
        logger.error(f"change product available error, {e}")


def mark_evaluating(task_id) -> None:
    logger.info("Reporting the EVALUATING status...")
    job_id = os.getenv('JOB_ID') or "-1"
    url = os.getenv("REGISTER_MARK_TASK_URL", "http://contest.4pd.io:8080/job/register_mark_task") + "/" + job_id
    try:
        requests_retry_session().post(
            url,
            data=json.dumps({"task_id": task_id}),
            headers=lb_headers,
        )
    except Exception as e:
        logger.error(traceback.format_exc())
        logger.error(f"mark evaluating error, {e}")
30 utils/logger.py Normal file
@@ -0,0 +1,30 @@
# -*- coding: utf-8 -*-
import logging
import os

logging.basicConfig(
    format="%(asctime)s %(name)-12s %(levelname)-4s %(message)s",
    datefmt="%Y-%m-%d %H:%M:%S",
    level=os.environ.get("LOGLEVEL", "INFO"),
)
logger = logging.getLogger(__file__)

# another logger

log = logging.getLogger("detailed_logger")

log.propagate = False

level = logging.INFO

log.setLevel(level)

formatter = logging.Formatter(
    "[%(asctime)s] %(levelname)s : %(pathname)s:%(lineno)d - %(message)s",
    "%Y-%m-%d %H:%M:%S",
)

streamHandler = logging.StreamHandler()
streamHandler.setLevel(level)
streamHandler.setFormatter(formatter)
log.addHandler(streamHandler)
320 utils/metrics.py Normal file
@@ -0,0 +1,320 @@
# coding: utf-8

import os
from collections import Counter
from copy import deepcopy
from typing import List, Tuple

import Levenshtein
import numpy as np
from schemas.context import ASRContext
from utils.logger import logger
from utils.tokenizer import Tokenizer, TokenizerType
from utils.update_submit import change_product_available

IN_TEST = os.getenv("SUBMIT_CONFIG_FILEPATH", None) is None


def text_align(context: ASRContext) -> Tuple:
    start_end_count = 0

    label_start_time_list = []
    label_end_time_list = []
    for label_item in context.labels:
        label_start_time_list.append(label_item.start)
        label_end_time_list.append(label_item.end)
    pred_start_time_list = []
    pred_end_time_list = []
    sentence_start = True
    for pred_item in context.preds:
        if sentence_start:
            pred_start_time_list.append(pred_item.recognition_results.start_time)
        if pred_item.recognition_results.final_result:
            pred_end_time_list.append(pred_item.recognition_results.end_time)
        sentence_start = pred_item.recognition_results.final_result
    # check start0 < end0 < start1 < end1 < start2 < end2 ...
    if IN_TEST:
        print(pred_start_time_list)
        print(pred_end_time_list)
    pred_time_list = []
    i, j = 0, 0
    while i < len(pred_start_time_list) and j < len(pred_end_time_list):
        pred_time_list.append(pred_start_time_list[i])
        pred_time_list.append(pred_end_time_list[j])
        i += 1
        j += 1
    if i < len(pred_start_time_list):
        pred_time_list.append(pred_start_time_list[-1])
    for i in range(1, len(pred_time_list)):
        # allow a 600 ms grace interval here
        if pred_time_list[i] < pred_time_list[i - 1] - 0.6:
            logger.error("Recognized start/end times do not satisfy start0 < end0 < start1 < end1 < start2 < end2 ...")
            logger.error(
                f"Start and end times of the currently recognized sentences: \
                start times: {pred_start_time_list}, \
                end times: {pred_end_time_list}"
            )
            start_end_count += 1
            # change_product_available()
    # count alignments within a ±300 ms window around the label times
    start_time_align_count = 0
    end_time_align_count = 0
    for label_start_time in label_start_time_list:
        for pred_start_time in pred_start_time_list:
            if pred_start_time <= label_start_time + 0.3 and pred_start_time >= label_start_time - 0.3:
                start_time_align_count += 1
                break
    for label_end_time in label_end_time_list:
        for pred_end_time in pred_end_time_list:
            if pred_end_time <= label_end_time + 0.3 and pred_end_time >= label_end_time - 0.3:
                end_time_align_count += 1
                break
    logger.info(
        f"start-time aligned count {start_time_align_count}, \
        end-time aligned count {end_time_align_count}, \
        total sentences in the dataset {len(label_start_time_list)}"
    )
    return start_time_align_count, end_time_align_count, start_end_count


def first_delay(context: ASRContext) -> Tuple:
    first_send_time = context.preds[0].send_time
    first_delay_list = []
    sentence_start = True
    for pred_context in context.preds:
        if sentence_start:
            sentence_begin_time = pred_context.recognition_results.start_time
            first_delay_time = pred_context.recv_time - first_send_time - sentence_begin_time
            first_delay_list.append(first_delay_time)
        sentence_start = pred_context.recognition_results.final_result
    if IN_TEST:
        print(f"First-token delays for the current audio: {first_delay_list}")
    logger.info(f"Mean first-token delay for the current audio: {np.mean(first_delay_list)}s")
    return np.sum(first_delay_list), len(first_delay_list)


def revision_delay(context: ASRContext):
    first_send_time = context.preds[0].send_time
    revision_delay_list = []
    for pred_context in context.preds:
        if pred_context.recognition_results.final_result:
            sentence_end_time = pred_context.recognition_results.end_time
            revision_delay_time = pred_context.recv_time - first_send_time - sentence_end_time
            revision_delay_list.append(revision_delay_time)

    if IN_TEST:
        print(revision_delay_list)
    logger.info(f"Mean revision delay for the current audio: {np.mean(revision_delay_list)}s")
    return np.sum(revision_delay_list), len(revision_delay_list)


def patch_unique_token_count(context: ASRContext):
    # print(context.__dict__)
    # tokenize every returned result
    pred_text_list = [pred_context.recognition_results.text for pred_context in context.preds]
    pred_text_tokenized_list = Tokenizer.norm_and_tokenize(pred_text_list, lang=context.lang)
    # print(pred_text_list)
    # print(pred_text_tokenized_list)

    # check whether any token older than 3 s was modified
    ## first recv time of the current sentence
    first_recv_time = None
    ## number of tokens that may no longer be modified
    unmodified_token_cnt = 0
    ## index of the 3 s boundary
    time_token_idx = 0
    ## currently at the start of a sentence
    final_sentence = True

    ## the frozen token range was modified
    is_unmodified_token = False

    for idx, (now_tokens, pred_context) in enumerate(zip(pred_text_tokenized_list, context.preds)):
        ## first response of the current sentence
        if final_sentence:
            first_recv_time = pred_context.recv_time
            unmodified_token_cnt = 0
            time_token_idx = idx
            final_sentence = pred_context.recognition_results.final_result
            continue
        final_sentence = pred_context.recognition_results.final_result
        ## recv time of the current pred
        pred_recv_time = pred_context.recv_time
        ## ignore the first 3 seconds entirely
        if pred_recv_time - first_recv_time < 3:
            continue
        ## derive the longest frozen prefix from the response history
        while time_token_idx < idx:
            context_pred_tmp = context.preds[time_token_idx]
            context_pred_tmp_recv_time = context_pred_tmp.recv_time
            tmp_tokens = pred_text_tokenized_list[time_token_idx]
            if pred_recv_time - context_pred_tmp_recv_time >= 3:
                unmodified_token_cnt = max(unmodified_token_cnt, len(tmp_tokens))
                time_token_idx += 1
            else:
                break
        ## compared with the previous result, only unmodified_token_cnt tokens are frozen
        last_tokens = pred_text_tokenized_list[idx - 1]
        if context.lang in ['ar', 'he']:
            # right-to-left languages compare from the end; note the `continue` skips the check for them
            tokens_check_pre, tokens_check_now = last_tokens[::-1], now_tokens[::-1]
            continue
        else:
            tokens_check_pre, tokens_check_now = last_tokens, now_tokens
        for token_a, token_b in zip(tokens_check_pre[:unmodified_token_cnt], tokens_check_now[:unmodified_token_cnt]):
            if token_a != token_b:
                is_unmodified_token = True
                break

        if is_unmodified_token and int(os.getenv('test', 0)):
            logger.error(
                f"{idx}-{unmodified_token_cnt}-{last_tokens[:unmodified_token_cnt]}-{now_tokens[:unmodified_token_cnt]}"
            )
        if is_unmodified_token:
            break

    if is_unmodified_token:
        logger.error("A frozen (no longer modifiable) text range was modified")
        # change_product_available()
    if int(os.getenv('test', 0)):
        final_result = True
        result_list = []
        for tokens, pred in zip(pred_text_tokenized_list, context.preds):
            if final_result:
                result_list.append([])
            result_list[-1].append((tokens, pred.recv_time - context.preds[0].recv_time))
            final_result = pred.recognition_results.final_result
        for item in result_list:
            logger.info(str(item))

    # record the token count of every patch
    patch_unique_cnt_counter = Counter()
    patch_unique_cnt_in_one_sentence = set()
    for pred_text_tokenized, pred_context in zip(pred_text_tokenized_list, context.preds):
        token_cnt = len(pred_text_tokenized)
        patch_unique_cnt_in_one_sentence.add(token_cnt)
        if pred_context.recognition_results.final_result:
            for unique_cnt in patch_unique_cnt_in_one_sentence:
                patch_unique_cnt_counter[unique_cnt] += 1
            patch_unique_cnt_in_one_sentence.clear()
    if context.preds and not context.preds[-1].recognition_results.final_result:
        for unique_cnt in patch_unique_cnt_in_one_sentence:
            patch_unique_cnt_counter[unique_cnt] += 1
    # print(patch_unique_cnt_counter)
    logger.info(
        f"Mean patch token count for the current audio: {mean_on_counter(patch_unique_cnt_counter)}, \
        patch token variance for the current audio: {var_on_counter(patch_unique_cnt_counter)}"
    )
    return patch_unique_cnt_counter


def mean_on_counter(counter: Counter):
    total_sum = sum(key * count for key, count in counter.items())
    total_count = sum(counter.values())
    return total_sum * 1.0 / total_count


def var_on_counter(counter: Counter):
    total_sum = sum(key * count for key, count in counter.items())
    total_count = sum(counter.values())
    mean = total_sum * 1.0 / total_count
    return sum((key - mean) ** 2 * count for key, count in counter.items()) / total_count
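
# Sanity check: Counter({1: 2, 3: 2}) has mean (1*2 + 3*2) / 4 = 2.0 and
# variance ((1-2)**2 * 2 + (3-2)**2 * 2) / 4 = 1.0:
#   mean_on_counter(Counter({1: 2, 3: 2}))  # -> 2.0
#   var_on_counter(Counter({1: 2, 3: 2}))   # -> 1.0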


def edit_distance(arr1: List, arr2: List):
    operations = Levenshtein.editops(arr1, arr2)
    i = sum([1 for operation in operations if operation[0] == "insert"])
    s = sum([1 for operation in operations if operation[0] == "replace"])
    d = sum([1 for operation in operations if operation[0] == "delete"])
    c = len(arr1) - s - d
    return s, d, i, c
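
# Example: edit_distance(list("kitten"), list("sitting")) finds 2 substitutions
# (k->s, e->i), 1 insertion (g) and 0 deletions via Levenshtein.editops, so
# c = 6 - 2 - 0 = 4 and the result is (2, 0, 1, 4).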


def cer(tokens_gt_mapping: List[str], tokens_dt_mapping: List[str]):
    """Takes the two token sequences aligned by edit distance; returns 1-cer and the token count."""
    insert = sum(1 for item in tokens_gt_mapping if item is None)
    delete = sum(1 for item in tokens_dt_mapping if item is None)
    equal = sum(1 for token_gt, token_dt in zip(tokens_gt_mapping, tokens_dt_mapping) if token_gt == token_dt)
    replace = len(tokens_gt_mapping) - insert - equal

    token_count = replace + equal + delete
    cer_value = (replace + delete + insert) * 1.0 / token_count
    logger.info(f"cer/wer for the current audio: {cer_value}, token count: {token_count}")
    return 1 - cer_value, token_count


def cut_rate(
    tokens_gt: List[List[str]],
    tokens_dt: List[List[str]],
    tokens_gt_mapping: List[str],
    tokens_dt_mapping: List[str],
):
    sentence_final_token_index_gt = sentence_final_token_index(tokens_gt, tokens_gt_mapping)
    sentence_final_token_index_dt = sentence_final_token_index(tokens_dt, tokens_dt_mapping)
    sentence_final_token_index_gt = set(sentence_final_token_index_gt)
    sentence_final_token_index_dt = set(sentence_final_token_index_dt)
    sentence_count_gt = len(sentence_final_token_index_gt)
    miss_count = len(sentence_final_token_index_gt - sentence_final_token_index_dt)
    more_count = len(sentence_final_token_index_dt - sentence_final_token_index_gt)
    rate = max(1 - (miss_count + more_count * 2) / sentence_count_gt, 0)
    return rate, sentence_count_gt, miss_count, more_count


def token_mapping(tokens_gt: List[str], tokens_dt: List[str]) -> Tuple[List[str], List[str]]:
    arr1 = deepcopy(tokens_gt)
    arr2 = deepcopy(tokens_dt)
    operations = Levenshtein.editops(arr1, arr2)
    for op in operations[::-1]:
        if op[0] == "insert":
            arr1.insert(op[1], None)
        elif op[0] == "delete":
            arr2.insert(op[2], None)
    return arr1, arr2
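
# Example: token_mapping(["a", "b", "c"], ["a", "c"]) pads the edit positions
# with None so both sides align index by index:
#   -> (["a", "b", "c"], ["a", None, "c"])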


def sentence_final_token_index(tokens: List[List[str]], tokens_mapping: List[str]) -> List[int]:
    """Get the index of each sentence's final token within the original sequence."""
    token_index_list = []
    token_index = 0
    for token_in_one_sentence in tokens:
        for _ in range(len(token_in_one_sentence)):
            while token_index < len(tokens_mapping) and tokens_mapping[token_index] is None:
                token_index += 1
            token_index += 1
        token_index_list.append(token_index - 1)
    return token_index_list


def cut_sentence(sentences: List[str], tokenizerType: TokenizerType) -> List[str]:
    """use self.cut_punc to cut all sentences, merge them and put them into list"""
    sentence_cut_list = []
    for sentence in sentences:
        sentence_list = [sentence]
        sentence_tmp_list = []
        for punc in [
            "······",
            "......",
            "。",
            ",",
            "?",
            "!",
            ";",
            ":",
            "...",
            ".",
            ",",
            "?",
            "!",
            ";",
            ":",
        ]:
            for sentence in sentence_list:
                sentence_tmp_list.extend(sentence.split(punc))
            sentence_list, sentence_tmp_list = sentence_tmp_list, []
        sentence_list = [item for item in sentence_list if item]

        if tokenizerType == TokenizerType.whitespace:
            sentence_cut_list.append(" ".join(sentence_list))
        else:
            sentence_cut_list.append("".join(sentence_list))

    return sentence_cut_list
50 utils/metrics_plus.py Normal file
@@ -0,0 +1,50 @@
from typing import List

from utils.tokenizer import TokenizerType


def replace_general_punc(
    sentences: List[str], tokenizer: TokenizerType
) -> List[str]:
    """Replaces the original function utils.metrics.cut_sentence"""
    general_puncs = [
        "······",
        "......",
        "。",
        ",",
        "?",
        "!",
        ";",
        ":",
        "...",
        ".",
        ",",
        "?",
        "!",
        ";",
        ":",
    ]
    if tokenizer == TokenizerType.whitespace:
        replacer = " "
    else:
        replacer = ""
    trans = str.maketrans(dict.fromkeys("".join(general_puncs), replacer))
    ret_sentences = [""] * len(sentences)
    for i, sentence in enumerate(sentences):
        sentence = sentence.translate(trans)
        sentence = sentence.strip()
        sentence = sentence.lower()
        ret_sentences[i] = sentence
    return ret_sentences


def distance_point_line(
    point: float, line_start: float, line_end: float
) -> float:
    """Distance from a point to the segment [line_start, line_end]."""
    if line_start <= point <= line_end:
        return 0
    if point < line_start:
        return abs(point - line_start)
    else:
        return abs(point - line_end)
93 utils/pynini/Dockerfile Normal file
@@ -0,0 +1,93 @@
# Dockerfile
# Pierre-André Noël, May 12th 2020
# Copyright © Element AI Inc. All rights reserved.
# Apache License, Version 2.0
#
# This builds `manylinux_2_28_x86_64` Python wheels for `pynini`, wrapping
# all its dependencies.
#
# This Dockerfile uses multi-stage builds; for more information, see:
# https://docs.docker.com/develop/develop-images/multistage-build/
#
# The recommended installation method for Pynini is through Conda-Forge. This gives Linux
# x86-64 users another option: installing a precompiled module from PyPI.
#
#
# To build wheels and run Pynini's tests, run:
#
#   docker build --target=run-tests -t build-pynini-wheels .
#
# To extract the resulting wheels from the Docker image, run:
#
#   docker run --rm -v `pwd`:/io build-pynini-wheels cp -r /wheelhouse /io
#
# Notice that this also generates Cython wheels.
#
# Then, `twine` (https://twine.readthedocs.io/en/latest/) can be used to
# publish the resulting Pynini wheels.

# ******************************************************
# *** All the following images are based on this one ***
# ******************************************************
#from quay.io/pypa/manylinux_2_28_x86_64 AS common

# ***********************************************************************
# *** Image providing all the requirements for building Pynini wheels ***
# ***********************************************************************
FROM harbor.4pd.io/inf/base-python3.8-ubuntu:1.1.0

# The versions we want in the wheels.
ENV FST_VERSION "1.8.3"
ENV PYNINI_VERSION "2.1.6"

# Location of OpenFst and Pynini.
ENV FST_DOWNLOAD_PREFIX "https://www.openfst.org/twiki/pub/FST/FstDownload"
ENV PYNINI_DOWNLOAD_PREFIX "https://www.opengrm.org/twiki/pub/GRM/PyniniDownload"

# Note that our certificates are not known to the version of wget available in this image.

# Gets and unpacks the OpenFst source.
RUN apt update && apt-get install -y wget gcc-9 g++-9 make && ln -s $(which gcc-9) /usr/bin/gcc && ln -s $(which g++-9) /usr/bin/g++
RUN cd /tmp \
    && wget -q --no-check-certificate "${FST_DOWNLOAD_PREFIX}/openfst-${FST_VERSION}.tar.gz" \
    && tar -xzf "openfst-${FST_VERSION}.tar.gz" \
    && rm "openfst-${FST_VERSION}.tar.gz"

# Compiles OpenFst.
RUN cd "/tmp/openfst-${FST_VERSION}" \
    && ./configure --enable-grm \
    && make --jobs 4 install \
    && rm -rd "/tmp/openfst-${FST_VERSION}"

# Gets and unpacks the Pynini source.
RUN mkdir -p /src && cd /src \
    && wget -q --no-check-certificate "${PYNINI_DOWNLOAD_PREFIX}/pynini-${PYNINI_VERSION}.tar.gz" \
    && tar -xzf "pynini-${PYNINI_VERSION}.tar.gz" \
    && rm "pynini-${PYNINI_VERSION}.tar.gz"

# Installs requirements in all our Pythons.
RUN pip install -i https://nexus.4pd.io/repository/pypi-all/simple -r "/src/pynini-${PYNINI_VERSION}/requirements.txt" || exit;

# **********************************************************
# *** Image making pynini wheels (placed in /wheelhouse) ***
# **********************************************************
#FROM wheel-building-env AS build-wheels

# Compiles the wheels to a temporary directory.
RUN pip wheel -i https://nexus.4pd.io/repository/pypi-all/simple -v "/src/pynini-${PYNINI_VERSION}" -w /tmp/wheelhouse/ || exit;

RUN wget ftp://ftp.4pd.io/pub/pico/temp/patchelf-0.18.0-x86_64.tar.gz && tar xzf patchelf-0.18.0-x86_64.tar.gz && rm -f patchelf-0.18.0-x86_64.tar.gz
RUN pip install -i https://nexus.4pd.io/repository/pypi-all/simple auditwheel
# Bundles external shared libraries into the wheels.
# See https://github.com/pypa/manylinux/tree/manylinux2014
RUN for WHL in /tmp/wheelhouse/pynini*.whl; do \
        PATH=$(pwd)/bin:$PATH auditwheel repair --plat manylinux_2_31_x86_64 "${WHL}" -w /wheelhouse/ || exit; \
    done
#RUN mkdir -p /wheelhouse && for WHL in /tmp/wheelhouse/pynini*.whl; do \
#    cp "${WHL}" /wheelhouse/; \
#done

# Removes the non-repaired wheels.
RUN rm -rd /tmp/wheelhouse
17 utils/pynini/README.md Normal file
@@ -0,0 +1,17 @@
# pynini

## Background

SpeechIO's English ASR evaluation tooling depends on the third-party library pynini (https://github.com/kylebgorman/pynini). The library is tightly bound to the OS and gcc versions, so its wheel has to be compiled in the runtime environment. This document describes how to build the pynini wheel.

## Build

```shell
docker build -t build-pynini-wheels .
```

## Extract the wheel

```shell
docker run --rm -v `pwd`:/io build-pynini-wheels cp -r /wheelhouse /io
```
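
## Install the wheel

The extracted wheel can then be installed into the evaluation image. A minimal sketch, assuming the wheel filename follows the versions pinned in the Dockerfile:

```shell
pip install wheelhouse/pynini-2.1.6-*.whl
python -c "import pynini"
```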
40 utils/request.py Normal file
@@ -0,0 +1,40 @@
import requests
from requests.adapters import HTTPAdapter
from requests.packages.urllib3.util.retry import Retry

DEFAULT_TIMEOUT = 2 * 60  # seconds


class TimeoutHTTPAdapter(HTTPAdapter):
    def __init__(self, *args, **kwargs):
        self.timeout = DEFAULT_TIMEOUT
        if "timeout" in kwargs:
            self.timeout = kwargs["timeout"]
            del kwargs["timeout"]
        super().__init__(*args, **kwargs)

    def send(self, request, **kwargs):
        timeout = kwargs.get("timeout")
        if timeout is None:
            kwargs["timeout"] = self.timeout
        return super().send(request, **kwargs)


def requests_retry_session(
    retries=3,
    backoff_factor=1,
    status_forcelist=[500, 502, 504, 404, 403],
    session=None,
):
    session = session or requests.Session()
    retry = Retry(
        total=retries,
        read=retries,
        connect=retries,
        backoff_factor=backoff_factor,
        status_forcelist=status_forcelist,
    )
    adapter = TimeoutHTTPAdapter(max_retries=retry)
    session.mount('http://', adapter)
    session.mount('https://', adapter)
    return session
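
# Usage sketch (hypothetical URL): retry up to 3 times with exponential
# backoff on 403/404/5xx, with the 120s default timeout applied per attempt:
#   session = requests_retry_session()
#   resp = session.get("http://contest.4pd.io:8080/health")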
65 utils/service.py Normal file
@@ -0,0 +1,65 @@
# -*- coding: utf-8 -*-
import os
import sys

from utils.helm import deploy_chart, gen_chart_tarball
from utils.logger import logger


def register_sut(st_config, resource_name, **kwargs):

    job_id = "".join([c for c in str(os.getenv("JOB_ID", -1)) if c.isnumeric()])

    docker_image = "10.255.143.18:5000/speaker_identification:wo_model_v0"
    #if "docker_image" in st_config and st_config["docker_image"]:
    st_config_values = st_config.get("values", {})
    #docker_image = st_config["docker_image"]
    docker_image = "10.255.143.18:5000/speaker_identification:wo_model_v0"
    chart_tar_fp, chart_values = gen_chart_tarball(docker_image)
    sut_service_name, _ = deploy_chart(
        resource_name,
        int(os.getenv("readiness_timeout", 60 * 3)),
        chart_fileobj=chart_tar_fp,
        extra_values=st_config_values,
        restart_count_limit=int(os.getenv('restart_count', 3)),
    )
    chart_tar_fp.close()
    if st_config_values is not None and "service" in st_config_values and "port" in st_config_values["service"]:
        sut_service_port = str(st_config_values["service"]["port"])
    else:
        sut_service_port = str(chart_values["service"]["port"])
    return "ws://{}:{}".format(sut_service_name, sut_service_port)


"""
elif "chart_repo" in st_config:
    logger.info(f"Using the helm-chart config: {st_config}")
    chart_repo = st_config.get("chart_repo", None)
    chart_name = st_config.get("chart_name", None)
    chart_version = st_config.get("chart_version", None)
    if chart_repo is None or chart_name is None or chart_version is None:
        logger.error("chart_repo, chart_name, chart_version can't be none")
    logger.info(f"{chart_repo} {chart_name} {chart_version}")
    chart_str = os.path.join(chart_repo, chart_name) + ':' + chart_version

    st_cfg_values = st_config.get('values', {})
    st_config["values"] = st_cfg_values

    sut_service_name, _ = deploy_chart(
        resource_name,
        600,
        chart_str=chart_str,
        extra_values=st_cfg_values,
    )
    sut_service_name = f"asr-{job_id}"
    if st_cfg_values is not None and 'service' in st_cfg_values and 'port' in st_cfg_values['service']:
        sut_service_port = str(st_cfg_values['service']['port'])
    else:
        sut_service_port = '80'
    return 'ws://%s:%s' % (sut_service_name, sut_service_port)
else:
    logger.error("Invalid config: missing the docker_image attribute")
    #sys.exit(-1)

"""
3 utils/speechio/__init__.py Normal file
@@ -0,0 +1,3 @@
'''
reference: https://github.com/SpeechColab/Leaderboard/tree/f287a992dc359d1c021bfc6ce810e5e36608e057/utils
'''
551 utils/speechio/error_rate_en.py Normal file
@@ -0,0 +1,551 @@
#!/usr/bin/env python3
# coding=utf8
# Copyright 2022 Zhenxiang MA, Jiayu DU (SpeechColab)

import argparse
import csv
import json
import logging
import os
import sys
from typing import Iterable

logging.basicConfig(stream=sys.stderr, level=logging.ERROR, format='[%(levelname)s] %(message)s')

import pynini
from pynini.lib import pynutil


# reference: https://github.com/kylebgorman/pynini/blob/master/pynini/lib/edit_transducer.py
# to import original lib:
#   from pynini.lib.edit_transducer import EditTransducer
class EditTransducer:
    DELETE = "<delete>"
    INSERT = "<insert>"
    SUBSTITUTE = "<substitute>"

    def __init__(
        self,
        symbol_table,
        vocab: Iterable[str],
        insert_cost: float = 1.0,
        delete_cost: float = 1.0,
        substitute_cost: float = 1.0,
        bound: int = 0,
    ):
        # Left factor; note that we divide the edit costs by two because they also
        # will be incurred when traversing the right factor.
        sigma = pynini.union(
            *[pynini.accep(token, token_type=symbol_table) for token in vocab],
        ).optimize()

        insert = pynutil.insert(f"[{self.INSERT}]", weight=insert_cost / 2)
        delete = pynini.cross(sigma, pynini.accep(f"[{self.DELETE}]", weight=delete_cost / 2))
        substitute = pynini.cross(sigma, pynini.accep(f"[{self.SUBSTITUTE}]", weight=substitute_cost / 2))

        edit = pynini.union(insert, delete, substitute).optimize()

        if bound:
            sigma_star = pynini.closure(sigma)
            self._e_i = sigma_star.copy()
            for _ in range(bound):
                self._e_i.concat(edit.ques).concat(sigma_star)
        else:
            self._e_i = edit.union(sigma).closure()

        self._e_i.optimize()

        right_factor_std = EditTransducer._right_factor(self._e_i)
        # right_factor_ext allows 0-cost matching between token's raw form & auxiliary form
        # e.g.: 'I' -> 'I#', 'AM' -> 'AM#'
        right_factor_ext = (
            pynini.union(
                *[
                    pynini.cross(
                        pynini.accep(x, token_type=symbol_table),
                        pynini.accep(x + '#', token_type=symbol_table),
                    )
                    for x in vocab
                ]
            )
            .optimize()
            .closure()
        )
        self._e_o = pynini.union(right_factor_std, right_factor_ext).closure().optimize()

    @staticmethod
    def _right_factor(ifst: pynini.Fst) -> pynini.Fst:
        ofst = pynini.invert(ifst)
        syms = pynini.generated_symbols()
        insert_label = syms.find(EditTransducer.INSERT)
        delete_label = syms.find(EditTransducer.DELETE)
        pairs = [(insert_label, delete_label), (delete_label, insert_label)]
        right_factor = ofst.relabel_pairs(ipairs=pairs)
        return right_factor

    def create_lattice(self, iexpr: pynini.FstLike, oexpr: pynini.FstLike) -> pynini.Fst:
        lattice = (iexpr @ self._e_i) @ (self._e_o @ oexpr)
        EditTransducer.check_wellformed_lattice(lattice)
        return lattice

    @staticmethod
    def check_wellformed_lattice(lattice: pynini.Fst) -> None:
        if lattice.start() == pynini.NO_STATE_ID:
            raise RuntimeError("Edit distance composition lattice is empty.")

    def compute_distance(self, iexpr: pynini.FstLike, oexpr: pynini.FstLike) -> float:
        lattice = self.create_lattice(iexpr, oexpr)
        # The shortest cost from all final states to the start state is
        # equivalent to the cost of the shortest path.
        start = lattice.start()
        return float(pynini.shortestdistance(lattice, reverse=True)[start])

    def compute_alignment(self, iexpr: pynini.FstLike, oexpr: pynini.FstLike) -> pynini.FstLike:
        print(iexpr)
        print(oexpr)
        lattice = self.create_lattice(iexpr, oexpr)
        alignment = pynini.shortestpath(lattice, nshortest=1, unique=True)
        return alignment.optimize()


class ErrorStats:
    def __init__(self):
        self.num_ref_utts = 0
        self.num_hyp_utts = 0
        self.num_eval_utts = 0  # in both ref & hyp
        self.num_hyp_without_ref = 0

        self.C = 0
        self.S = 0
        self.I = 0
        self.D = 0
        self.token_error_rate = 0.0
        self.modified_token_error_rate = 0.0

        self.num_utts_with_error = 0
        self.sentence_error_rate = 0.0

    def to_json(self):
        # return json.dumps(self.__dict__, indent=4)
        return json.dumps(self.__dict__)

    def to_kaldi(self):
        info = (
            F'%WER {self.token_error_rate:.2f} [ {self.S + self.D + self.I} / {self.C + self.S + self.D}, {self.I} ins, {self.D} del, {self.S} sub ]\n'
            F'%SER {self.sentence_error_rate:.2f} [ {self.num_utts_with_error} / {self.num_eval_utts} ]\n'
        )
        return info

    def to_summary(self):
        summary = (
            '==================== Overall Statistics ====================\n'
            F'num_ref_utts: {self.num_ref_utts}\n'
            F'num_hyp_utts: {self.num_hyp_utts}\n'
            F'num_hyp_without_ref: {self.num_hyp_without_ref}\n'
            F'num_eval_utts: {self.num_eval_utts}\n'
            F'sentence_error_rate: {self.sentence_error_rate:.2f}%\n'
            F'token_error_rate: {self.token_error_rate:.2f}%\n'
            F'modified_token_error_rate: {self.modified_token_error_rate:.2f}%\n'
            F'token_stats:\n'
            F'  - tokens:{self.C + self.S + self.D:>7}\n'
            F'  - edits: {self.S + self.I + self.D:>7}\n'
            F'  - cor:   {self.C:>7}\n'
            F'  - sub:   {self.S:>7}\n'
            F'  - ins:   {self.I:>7}\n'
            F'  - del:   {self.D:>7}\n'
            '============================================================\n'
        )
        return summary


class Utterance:
    def __init__(self, uid, text):
        self.uid = uid
        self.text = text


def LoadKaldiArc(filepath):
    utts = {}
    with open(filepath, 'r', encoding='utf8') as f:
        for line in f:
            line = line.strip()
            if line:
                cols = line.split(maxsplit=1)
                assert len(cols) == 2 or len(cols) == 1
                uid = cols[0]
                text = cols[1] if len(cols) == 2 else ''
                if utts.get(uid) is not None:
                    raise RuntimeError(F'Found duplicated utterance id {uid}')
                utts[uid] = Utterance(uid, text)
    return utts


def BreakHyphen(token: str):
    # 'T-SHIRT' should also introduce new words into vocabulary, e.g.:
    #   1. 'T' & 'SHIRT'
    #   2. 'TSHIRT'
    assert '-' in token
    v = token.split('-')
    v.append(token.replace('-', ''))
    return v


def LoadGLM(rel_path):
    '''
    glm.csv:
        I'VE,I HAVE
        GOING TO,GONNA
        ...
        T-SHIRT,T SHIRT,TSHIRT

    glm:
        {
            '<RULE_00000>': ["I'VE", 'I HAVE'],
            '<RULE_00001>': ['GOING TO', 'GONNA'],
            ...
            '<RULE_99999>': ['T-SHIRT', 'T SHIRT', 'TSHIRT'],
        }
    '''
    logging.info(f'Loading GLM from {rel_path} ...')

    abs_path = os.path.dirname(os.path.abspath(__file__)) + '/' + rel_path
    reader = list(csv.reader(open(abs_path, encoding="utf-8"), delimiter=','))

    glm = {}
    for k, rule in enumerate(reader):
        rule_name = f'<RULE_{k:06d}>'
        glm[rule_name] = [phrase.strip() for phrase in rule]
    logging.info(f'  #rule: {len(glm)}')

    return glm


def SymbolEQ(symbol_table, i1, i2):
    return symbol_table.find(i1).strip('#') == symbol_table.find(i2).strip('#')


def PrintSymbolTable(symbol_table: pynini.SymbolTable):
    print('SYMBOL_TABLE:')
    for k in range(symbol_table.num_symbols()):
        sym = symbol_table.find(k)
        assert symbol_table.find(sym) == k  # symbol table's find can be used for bi-directional lookup (id <-> sym)
        print(k, sym)
    print()


def BuildSymbolTable(vocab) -> pynini.SymbolTable:
    logging.info('Building symbol table ...')
    symbol_table = pynini.SymbolTable()
    symbol_table.add_symbol('<epsilon>')

    for w in vocab:
        symbol_table.add_symbol(w)
    logging.info(f'  #symbols: {symbol_table.num_symbols()}')

    # PrintSymbolTable(symbol_table)
    # symbol_table.write_text('symbol_table.txt')
    return symbol_table


def BuildGLMTagger(glm, symbol_table) -> pynini.Fst:
    logging.info('Building GLM tagger ...')
    rule_taggers = []
    for rule_tag, rule in glm.items():
        for phrase in rule:
            rule_taggers.append(
                (
                    pynutil.insert(pynini.accep(rule_tag, token_type=symbol_table))
                    + pynini.accep(phrase, token_type=symbol_table)
                    + pynutil.insert(pynini.accep(rule_tag, token_type=symbol_table))
                )
            )

    alphabet = pynini.union(
        *[pynini.accep(sym, token_type=symbol_table) for k, sym in symbol_table if k != 0]  # non-epsilon
    ).optimize()

    tagger = pynini.cdrewrite(
        pynini.union(*rule_taggers).optimize(), '', '', alphabet.closure()
    ).optimize()  # could be slow with large vocabulary
    return tagger


def TokenWidth(token: str):
    def CharWidth(c):
        return 2 if (c >= '\u4e00') and (c <= '\u9fa5') else 1

    return sum([CharWidth(c) for c in token])


def PrintPrettyAlignment(raw_hyp, edit_ali, ref_ali, hyp_ali, stream=sys.stderr):
    assert len(edit_ali) == len(ref_ali) and len(ref_ali) == len(hyp_ali)

    H = ' HYP# : '
    R = ' REF : '
    E = ' EDIT : '
    for i, e in enumerate(edit_ali):
        h, r = hyp_ali[i], ref_ali[i]
        e = '' if e == 'C' else e  # don't bother printing correct edit-tag

        nr, nh, ne = TokenWidth(r), TokenWidth(h), TokenWidth(e)
        n = max(nr, nh, ne) + 1

        H += h + ' ' * (n - nh)
        R += r + ' ' * (n - nr)
        E += e + ' ' * (n - ne)

    print(F' HYP : {raw_hyp}', file=stream)
    print(H, file=stream)
    print(R, file=stream)
    print(E, file=stream)


def ComputeTokenErrorRate(c, s, i, d):
    assert (s + d + c) != 0
    num_edits = s + d + i
    ref_len = c + s + d
    hyp_len = c + s + i
    return 100.0 * num_edits / ref_len, 100.0 * num_edits / max(ref_len, hyp_len)
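
# Worked example: C=90, S=5, I=10, D=5 gives num_edits = 20, ref_len = 100,
# hyp_len = 105, so TER = 20.00% while the modified TER normalizes by the
# longer side: 20 / 105 = 19.05%, which keeps heavy insertion from pushing
# the rate past 100%.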
|
||||
|
||||
|
||||
def ComputeSentenceErrorRate(num_err_utts, num_utts):
|
||||
assert num_utts != 0
|
||||
return 100.0 * num_err_utts / num_utts
|
||||
|
||||
|
||||
if __name__ == '__main__':
|
||||
parser = argparse.ArgumentParser()
|
||||
parser.add_argument('--logk', type=int, default=500, help='logging interval')
|
||||
parser.add_argument(
|
||||
'--tokenizer', choices=['whitespace', 'char'], default='whitespace', help='whitespace for WER, char for CER'
|
||||
)
|
||||
parser.add_argument('--glm', type=str, default='glm_en.csv', help='glm')
|
||||
parser.add_argument('--ref', type=str, required=True, help='reference kaldi arc file')
|
||||
parser.add_argument('--hyp', type=str, required=True, help='hypothesis kaldi arc file')
|
||||
parser.add_argument('result_file', type=str)
|
||||
args = parser.parse_args()
|
||||
logging.info(args)
|
||||
|
||||
stats = ErrorStats()
|
||||
|
||||
logging.info('Generating tokenizer ...')
|
||||
if args.tokenizer == 'whitespace':
|
||||
|
||||
def word_tokenizer(text):
|
||||
return text.strip().split()
|
||||
|
||||
tokenizer = word_tokenizer
|
||||
elif args.tokenizer == 'char':
|
||||
|
||||
def char_tokenizer(text):
|
||||
return [c for c in text.strip().replace(' ', '')]
|
||||
|
||||
tokenizer = char_tokenizer
|
||||
else:
|
||||
tokenizer = None
|
||||
assert tokenizer
|
||||
|
||||
logging.info('Loading REF & HYP ...')
|
||||
ref_utts = LoadKaldiArc(args.ref)
|
||||
hyp_utts = LoadKaldiArc(args.hyp)
|
||||
|
||||
# check valid utterances in hyp that have matched non-empty reference
|
||||
uids = []
|
||||
for uid in sorted(hyp_utts.keys()):
|
||||
if uid in ref_utts.keys():
|
||||
if ref_utts[uid].text.strip(): # non-empty reference
|
||||
uids.append(uid)
|
||||
else:
|
||||
logging.warning(F'Found {uid} with empty reference, skipping...')
|
||||
else:
|
||||
logging.warning(F'Found {uid} without reference, skipping...')
|
||||
stats.num_hyp_without_ref += 1
|
||||
|
||||
stats.num_hyp_utts = len(hyp_utts)
|
||||
stats.num_ref_utts = len(ref_utts)
|
||||
stats.num_eval_utts = len(uids)
|
||||
logging.info(f' #hyp:{stats.num_hyp_utts}, #ref:{stats.num_ref_utts}, #utts_to_evaluate:{stats.num_eval_utts}')
|
||||
print(f' #hyp:{stats.num_hyp_utts}, #ref:{stats.num_ref_utts}, #utts_to_evaluate:{stats.num_eval_utts}')

tokens = []
for uid in uids:
    ref_tokens = tokenizer(ref_utts[uid].text)
    hyp_tokens = tokenizer(hyp_utts[uid].text)
    for t in ref_tokens + hyp_tokens:
        tokens.append(t)
        if '-' in t:
            tokens.extend(BreakHyphen(t))
vocab_from_utts = list(set(tokens))
logging.info(f' HYP&REF vocab size: {len(vocab_from_utts)}')
print(f' HYP&REF vocab size: {len(vocab_from_utts)}')

assert args.glm
glm = LoadGLM(args.glm)

tokens = []
for rule in glm.values():
    for phrase in rule:
        for t in tokenizer(phrase):
            tokens.append(t)
            if '-' in t:
                tokens.extend(BreakHyphen(t))
vocab_from_glm = list(set(tokens))
logging.info(f' GLM vocab size: {len(vocab_from_glm)}')
print(f' GLM vocab size: {len(vocab_from_glm)}')

vocab = list(set(vocab_from_utts + vocab_from_glm))
logging.info(f'Global vocab size: {len(vocab)}')
print(f'Global vocab size: {len(vocab)}')

symtab = BuildSymbolTable(
    # Normal evaluation vocab + auxiliary form for alternative paths + GLM tags
    vocab
    + [x + '#' for x in vocab]
    + [x for x in glm.keys()]
)
glm_tagger = BuildGLMTagger(glm, symtab)
edit_transducer = EditTransducer(symbol_table=symtab, vocab=vocab)
print(edit_transducer)

logging.info('Evaluating error rate ...')
print('Evaluating error rate ...')
fo = open(args.result_file, 'w+', encoding='utf8')
ndone = 0
for uid in uids:
    ref = ref_utts[uid].text
    raw_hyp = hyp_utts[uid].text

    ref_fst = pynini.accep(' '.join(tokenizer(ref)), token_type=symtab)
    print(ref_fst)
    # print(ref_fst.string(token_type=symtab))

    raw_hyp_fst = pynini.accep(' '.join(tokenizer(raw_hyp)), token_type=symtab)
    # print(raw_hyp_fst.string(token_type=symtab))

    # Say, we have:
    #   RULE_001: "I'M" <-> "I AM"
    #   REF: HEY I AM HERE
    #   HYP: HEY I'M HERE
    #
    # We want to expand HYP with GLM rules (marked with auxiliary #):
    #   HYP#: HEY {I'M | I# AM#} HERE
    # REF is honored to keep its original form.
    #
    # This could be considered as a flexible on-the-fly TN towards HYP.

    # 1. GLM rule tagging:
    #   HEY I'M HERE
    #   ->
    #   HEY <RULE_001> I'M <RULE_001> HERE
    lattice = (raw_hyp_fst @ glm_tagger).optimize()
    tagged_ir = pynini.shortestpath(lattice, nshortest=1, unique=True).string(token_type=symtab)
    # print(tagged_ir)

    # 2. GLM rule expansion:
    #   HEY <RULE_001> I'M <RULE_001> HERE
    #   ->
    #   sausage-like fst: HEY {I'M | I# AM#} HERE
    tokens = tagged_ir.split()
    sausage = pynini.accep('', token_type=symtab)
    i = 0
    while i < len(tokens):  # invariant: tokens[0, i) has been built into fst
        forms = []
        if tokens[i].startswith('<RULE_') and tokens[i].endswith('>'):  # rule segment
            rule_name = tokens[i]
            rule = glm[rule_name]
            # pre-condition: i -> ltag
            raw_form = ''
            for j in range(i + 1, len(tokens)):
                if tokens[j] == rule_name:
                    raw_form = ' '.join(tokens[i + 1 : j])
                    break
            assert raw_form
            # post-condition: i -> ltag, j -> rtag

            forms.append(raw_form)
            for phrase in rule:
                if phrase != raw_form:
                    forms.append(' '.join([x + '#' for x in phrase.split()]))
            i = j + 1
        else:  # normal token segment
            token = tokens[i]
            forms.append(token)
            if '-' in token:  # token with hyphen yields extra forms
                forms.append(' '.join([x + '#' for x in token.split('-')]))  # 'T-SHIRT' -> 'T# SHIRT#'
                forms.append(token.replace('-', '') + '#')  # 'T-SHIRT' -> 'TSHIRT#'
            i += 1

        sausage_segment = pynini.union(*[pynini.accep(x, token_type=symtab) for x in forms]).optimize()
        sausage += sausage_segment
    hyp_fst = sausage.optimize()
    print(hyp_fst)

    # Utterance-level error rate evaluation
    alignment = edit_transducer.compute_alignment(ref_fst, hyp_fst)
    print('alignment', alignment)

    distance = 0.0
    C, S, I, D = 0, 0, 0, 0  # Cor, Sub, Ins, Del
    edit_ali, ref_ali, hyp_ali = [], [], []
    for state in alignment.states():
        for arc in alignment.arcs(state):
            i, o = arc.ilabel, arc.olabel
            if i != 0 and o != 0 and SymbolEQ(symtab, i, o):
                e = 'C'
                r, h = symtab.find(i), symtab.find(o)
                C += 1
                distance += 0.0
            elif i != 0 and o != 0 and not SymbolEQ(symtab, i, o):
                e = 'S'
                r, h = symtab.find(i), symtab.find(o)
                S += 1
                distance += 1.0
            elif i == 0 and o != 0:
                e = 'I'
                r, h = '*', symtab.find(o)
                I += 1
                distance += 1.0
            elif i != 0 and o == 0:
                e = 'D'
                r, h = symtab.find(i), '*'
                D += 1
                distance += 1.0
            else:
                raise RuntimeError

            edit_ali.append(e)
            ref_ali.append(r)
            hyp_ali.append(h)
    # assert(distance == edit_transducer.compute_distance(ref_fst, sausage))

    utt_ter, utt_mter = ComputeTokenErrorRate(C, S, I, D)
    # print(F'{{"uid":{uid}, "score":{-distance}, "TER":{utt_ter:.2f}, "mTER":{utt_mter:.2f}, "cor":{C}, "sub":{S}, "ins":{I}, "del":{D}}}', file=fo)
    # PrintPrettyAlignment(raw_hyp, edit_ali, ref_ali, hyp_ali, fo)

    if utt_ter > 0:
        stats.num_utts_with_error += 1

    stats.C += C
    stats.S += S
    stats.I += I
    stats.D += D

    ndone += 1
    if ndone % args.logk == 0:
        logging.info(f'{ndone} utts evaluated.')
logging.info(f'{ndone} utts evaluated in total.')

# Corpus-level evaluation
stats.token_error_rate, stats.modified_token_error_rate = ComputeTokenErrorRate(stats.C, stats.S, stats.I, stats.D)
stats.sentence_error_rate = ComputeSentenceErrorRate(stats.num_utts_with_error, stats.num_eval_utts)

print(stats.to_json(), file=fo)
# print(stats.to_kaldi())
# print(stats.to_summary(), file=fo)

fo.close()
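To make the GLM expansion above concrete: with the rule "I'M <-> I AM", the hypothesis "HEY I'M HERE" scores zero errors against the reference "HEY I AM HERE", whereas a plain alignment would count one substitution and one deletion. A minimal sketch of the arithmetic (plain Python, independent of the pynini pipeline; the helper below is illustrative only, using the same TER formula as `ComputeTokenErrorRate` in `error_rate_zh.py` below):

```python
def token_error_rate(c, s, i, d):
    # TER = 100 * (S + I + D) / (C + S + D); the denominator is the reference length
    return 100.0 * (s + i + d) / (c + s + d)

# REF: HEY I AM HERE   vs   HYP: HEY I'M HERE
print(token_error_rate(c=2, s=1, i=0, d=1))  # plain alignment -> 50.0
print(token_error_rate(c=4, s=0, i=0, d=0))  # with GLM path "I# AM#" -> 0.0
```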
370
utils/speechio/error_rate_zh.py
Normal file
@@ -0,0 +1,370 @@
#!/usr/bin/env python3
# coding=utf8

# Copyright 2021 Jiayu DU

import sys
import argparse
import json
import logging

logging.basicConfig(stream=sys.stderr, level=logging.INFO, format='[%(levelname)s] %(message)s')

DEBUG = None


def GetEditType(ref_token, hyp_token):
    if ref_token is None and hyp_token is not None:
        return 'I'
    elif ref_token is not None and hyp_token is None:
        return 'D'
    elif ref_token == hyp_token:
        return 'C'
    elif ref_token != hyp_token:
        return 'S'
    else:
        raise RuntimeError


class AlignmentArc:
    def __init__(self, src, dst, ref, hyp):
        self.src = src
        self.dst = dst
        self.ref = ref
        self.hyp = hyp
        self.edit_type = GetEditType(ref, hyp)


def similarity_score_function(ref_token, hyp_token):
    return 0 if (ref_token == hyp_token) else -1.0


def insertion_score_function(token):
    return -1.0


def deletion_score_function(token):
    return -1.0


def EditDistance(
        ref,
        hyp,
        similarity_score_function=similarity_score_function,
        insertion_score_function=insertion_score_function,
        deletion_score_function=deletion_score_function):
    assert len(ref) != 0

    class DPState:
        def __init__(self):
            self.score = -float('inf')
            # backpointer
            self.prev_r = None
            self.prev_h = None

    def print_search_grid(S, R, H, fstream):
        print(file=fstream)
        for r in range(R):
            for h in range(H):
                print(F'[{r},{h}]:{S[r][h].score:4.3f}:({S[r][h].prev_r},{S[r][h].prev_h}) ', end='', file=fstream)
            print(file=fstream)

    R = len(ref) + 1
    H = len(hyp) + 1

    # Construct DP search space, a (R x H) grid
    S = [[] for r in range(R)]
    for r in range(R):
        S[r] = [DPState() for x in range(H)]

    # initialize DP search grid origin, S(r = 0, h = 0)
    S[0][0].score = 0.0
    S[0][0].prev_r = None
    S[0][0].prev_h = None

    # initialize REF axis
    for r in range(1, R):
        S[r][0].score = S[r-1][0].score + deletion_score_function(ref[r-1])
        S[r][0].prev_r = r-1
        S[r][0].prev_h = 0

    # initialize HYP axis
    for h in range(1, H):
        S[0][h].score = S[0][h-1].score + insertion_score_function(hyp[h-1])
        S[0][h].prev_r = 0
        S[0][h].prev_h = h-1

    best_score = S[0][0].score
    best_state = (0, 0)

    for r in range(1, R):
        for h in range(1, H):
            sub_or_cor_score = similarity_score_function(ref[r-1], hyp[h-1])
            new_score = S[r-1][h-1].score + sub_or_cor_score
            if new_score >= S[r][h].score:
                S[r][h].score = new_score
                S[r][h].prev_r = r-1
                S[r][h].prev_h = h-1

            del_score = deletion_score_function(ref[r-1])
            new_score = S[r-1][h].score + del_score
            if new_score >= S[r][h].score:
                S[r][h].score = new_score
                S[r][h].prev_r = r - 1
                S[r][h].prev_h = h

            ins_score = insertion_score_function(hyp[h-1])
            new_score = S[r][h-1].score + ins_score
            if new_score >= S[r][h].score:
                S[r][h].score = new_score
                S[r][h].prev_r = r
                S[r][h].prev_h = h-1

    best_score = S[R-1][H-1].score
    best_state = (R-1, H-1)

    if DEBUG:
        print_search_grid(S, R, H, sys.stderr)

    # Backtracing best alignment path, i.e. a list of arcs
    #   arc = (src, dst, ref, hyp, edit_type)
    #   src/dst = (r, h), where r/h refers to search grid state-id along Ref/Hyp axis
    best_path = []
    r, h = best_state[0], best_state[1]
    prev_r, prev_h = S[r][h].prev_r, S[r][h].prev_h
    score = S[r][h].score
    # loop invariant:
    #   1. (prev_r, prev_h) -> (r, h) is a "forward arc" on best alignment path
    #   2. score is the value of point(r, h) on DP search grid
    while prev_r is not None or prev_h is not None:
        src = (prev_r, prev_h)
        dst = (r, h)
        if (r == prev_r + 1 and h == prev_h + 1):  # Substitution or correct
            arc = AlignmentArc(src, dst, ref[prev_r], hyp[prev_h])
        elif (r == prev_r + 1 and h == prev_h):  # Deletion
            arc = AlignmentArc(src, dst, ref[prev_r], None)
        elif (r == prev_r and h == prev_h + 1):  # Insertion
            arc = AlignmentArc(src, dst, None, hyp[prev_h])
        else:
            raise RuntimeError
        best_path.append(arc)
        r, h = prev_r, prev_h
        prev_r, prev_h = S[r][h].prev_r, S[r][h].prev_h
        score = S[r][h].score

    best_path.reverse()
    return (best_path, best_score)


def PrettyPrintAlignment(alignment, stream=sys.stderr):
    def get_token_str(token):
        if token is None:
            return '*'
        return token

    def is_double_width_char(ch):
        if (ch >= '\u4e00') and (ch <= '\u9fa5'):  # codepoint range for Chinese chars
            return True
        # TODO: support other double-width-char languages such as Japanese, Korean
        else:
            return False

    def display_width(token_str):
        m = 0
        for c in token_str:
            if is_double_width_char(c):
                m += 2
            else:
                m += 1
        return m

    R = ' REF : '
    H = ' HYP : '
    E = ' EDIT : '
    for arc in alignment:
        r = get_token_str(arc.ref)
        h = get_token_str(arc.hyp)
        e = arc.edit_type if arc.edit_type != 'C' else ''

        nr, nh, ne = display_width(r), display_width(h), display_width(e)
        n = max(nr, nh, ne) + 1

        R += r + ' ' * (n-nr)
        H += h + ' ' * (n-nh)
        E += e + ' ' * (n-ne)

    print(R, file=stream)
    print(H, file=stream)
    print(E, file=stream)


def CountEdits(alignment):
    c, s, i, d = 0, 0, 0, 0
    for arc in alignment:
        if arc.edit_type == 'C':
            c += 1
        elif arc.edit_type == 'S':
            s += 1
        elif arc.edit_type == 'I':
            i += 1
        elif arc.edit_type == 'D':
            d += 1
        else:
            raise RuntimeError
    return (c, s, i, d)


def ComputeTokenErrorRate(c, s, i, d):
    return 100.0 * (s + d + i) / (s + d + c)


def ComputeSentenceErrorRate(num_err_utts, num_utts):
    assert num_utts != 0
    return 100.0 * num_err_utts / num_utts


class EvaluationResult:
    def __init__(self):
        self.num_ref_utts = 0
        self.num_hyp_utts = 0
        self.num_eval_utts = 0  # seen in both ref & hyp
        self.num_hyp_without_ref = 0

        self.C = 0
        self.S = 0
        self.I = 0
        self.D = 0
        self.token_error_rate = 0.0

        self.num_utts_with_error = 0
        self.sentence_error_rate = 0.0

    def to_json(self):
        return json.dumps(self.__dict__)

    def to_kaldi(self):
        info = (
            F'%WER {self.token_error_rate:.2f} [ {self.S + self.D + self.I} / {self.C + self.S + self.D}, {self.I} ins, {self.D} del, {self.S} sub ]\n'
            F'%SER {self.sentence_error_rate:.2f} [ {self.num_utts_with_error} / {self.num_eval_utts} ]\n'
        )
        return info

    def to_sclite(self):
        return "TODO"

    def to_espnet(self):
        return "TODO"

    def to_summary(self):
        # return json.dumps(self.__dict__, indent=4)
        summary = (
            '==================== Overall Statistics ====================\n'
            F'num_ref_utts: {self.num_ref_utts}\n'
            F'num_hyp_utts: {self.num_hyp_utts}\n'
            F'num_hyp_without_ref: {self.num_hyp_without_ref}\n'
            F'num_eval_utts: {self.num_eval_utts}\n'
            F'sentence_error_rate: {self.sentence_error_rate:.2f}%\n'
            F'token_error_rate: {self.token_error_rate:.2f}%\n'
            F'token_stats:\n'
            F' - tokens:{self.C + self.S + self.D:>7}\n'
            F' - edits: {self.S + self.I + self.D:>7}\n'
            F' - cor:   {self.C:>7}\n'
            F' - sub:   {self.S:>7}\n'
            F' - ins:   {self.I:>7}\n'
            F' - del:   {self.D:>7}\n'
            '============================================================\n'
        )
        return summary


class Utterance:
    def __init__(self, uid, text):
        self.uid = uid
        self.text = text


def LoadUtterances(filepath, format):
    utts = {}
    if format == 'text':  # utt_id word1 word2 ...
        with open(filepath, 'r', encoding='utf8') as f:
            for line in f:
                line = line.strip()
                if line:
                    cols = line.split(maxsplit=1)
                    assert len(cols) == 2 or len(cols) == 1
                    uid = cols[0]
                    text = cols[1] if len(cols) == 2 else ''
                    if utts.get(uid) is not None:
                        raise RuntimeError(F'Found duplicated utterance id {uid}')
                    utts[uid] = Utterance(uid, text)
    else:
        raise RuntimeError(F'Unsupported text format {format}')
    return utts


def tokenize_text(text, tokenizer):
    if tokenizer == 'whitespace':
        return text.split()
    elif tokenizer == 'char':
        return [ch for ch in ''.join(text.split())]
    else:
        raise RuntimeError(F'ERROR: Unsupported tokenizer {tokenizer}')


if __name__ == '__main__':
    parser = argparse.ArgumentParser()
    # optional
    parser.add_argument('--tokenizer', choices=['whitespace', 'char'], default='whitespace', help='whitespace for WER, char for CER')
    parser.add_argument('--ref-format', choices=['text'], default='text', help='reference format, first col is utt_id, the rest is text')
    parser.add_argument('--hyp-format', choices=['text'], default='text', help='hypothesis format, first col is utt_id, the rest is text')
    # required
    parser.add_argument('--ref', type=str, required=True, help='input reference file')
    parser.add_argument('--hyp', type=str, required=True, help='input hypothesis file')

    parser.add_argument('result_file', type=str)
    args = parser.parse_args()
    logging.info(args)

    ref_utts = LoadUtterances(args.ref, args.ref_format)
    hyp_utts = LoadUtterances(args.hyp, args.hyp_format)

    r = EvaluationResult()

    # check valid utterances in hyp that have matched non-empty reference
    eval_utts = []
    r.num_hyp_without_ref = 0
    for uid in sorted(hyp_utts.keys()):
        if uid in ref_utts.keys():  # TODO: efficiency
            if ref_utts[uid].text.strip():  # non-empty reference
                eval_utts.append(uid)
            else:
                logging.warning(F'Found {uid} with empty reference, skipping...')
        else:
            logging.warning(F'Found {uid} without reference, skipping...')
            r.num_hyp_without_ref += 1

    r.num_hyp_utts = len(hyp_utts)
    r.num_ref_utts = len(ref_utts)
    r.num_eval_utts = len(eval_utts)

    with open(args.result_file, 'w+', encoding='utf8') as fo:
        for uid in eval_utts:
            ref = ref_utts[uid]
            hyp = hyp_utts[uid]

            alignment, score = EditDistance(
                tokenize_text(ref.text, args.tokenizer),
                tokenize_text(hyp.text, args.tokenizer)
            )

            c, s, i, d = CountEdits(alignment)
            utt_ter = ComputeTokenErrorRate(c, s, i, d)

            # utt-level evaluation result
            print(F'{{"uid":{uid}, "score":{score}, "ter":{utt_ter:.2f}, "cor":{c}, "sub":{s}, "ins":{i}, "del":{d}}}', file=fo)
            PrettyPrintAlignment(alignment, fo)

            r.C += c
            r.S += s
            r.I += i
            r.D += d

            if utt_ter > 0:
                r.num_utts_with_error += 1

        # corpus level evaluation result
        r.sentence_error_rate = ComputeSentenceErrorRate(r.num_utts_with_error, r.num_eval_utts)
        r.token_error_rate = ComputeTokenErrorRate(r.C, r.S, r.I, r.D)

        print(r.to_summary(), file=fo)

    print(r.to_json())
    print(r.to_kaldi())
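A quick usage sketch for the module above (all strings and file names here are placeholders). From the shell, `python3 utils/speechio/error_rate_zh.py --tokenizer char --ref ref.txt --hyp hyp.txt result.txt` writes per-utterance alignments plus the overall summary to result.txt; from Python, char tokenization yields CER directly:

```python
# hypothetical 5-char reference vs hypothesis, evaluated at char level (CER)
ali, score = EditDistance(list('今天天气好'), list('今天气很好'))
c, s, i, d = CountEdits(ali)
print(ComputeTokenErrorRate(c, s, i, d))  # 40.0 (2 edits over 5 reference chars)
PrettyPrintAlignment(ali)                 # REF/HYP/EDIT rows on stderr
```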
744
utils/speechio/glm_en.csv
Normal file
@@ -0,0 +1,744 @@
|
||||
I'M,I AM
|
||||
I'LL,I WILL
|
||||
I'D,I HAD
|
||||
I'VE,I HAVE
|
||||
I WOULD'VE,I'D HAVE
|
||||
YOU'RE,YOU ARE
|
||||
YOU'LL,YOU WILL
|
||||
YOU'D,YOU WOULD
|
||||
YOU'VE,YOU HAVE
|
||||
HE'S,HE IS,HE WAS
|
||||
HE'LL,HE WILL
|
||||
HE'D,HE HAD
|
||||
SHE'S,SHE IS,SHE WAS
|
||||
SHE'LL,SHE WILL
|
||||
SHE'D,SHE HAD
|
||||
IT'S,IT IS,IT WAS
|
||||
IT'LL,IT WILL
|
||||
WE'RE,WE ARE,WE WERE
|
||||
WE'LL,WE WILL
|
||||
WE'D,WE WOULD
|
||||
WE'VE,WE HAVE
|
||||
WHO'LL,WHO WILL
|
||||
THEY'RE,THEY ARE
|
||||
THEY'LL,THEY WILL
|
||||
THAT'S,THAT IS,THAT WAS
|
||||
THAT'LL,THAT WILL
|
||||
HERE'S,HERE IS,HERE WAS
|
||||
THERE'S,THERE IS,THERE WAS
|
||||
WHERE'S,WHERE IS,WHERE WAS
|
||||
WHAT'S,WHAT IS,WHAT WAS
|
||||
LET'S,LET US
|
||||
WHO'S,WHO IS
|
||||
ONE'S,ONE IS
|
||||
THERE'LL,THERE WILL
|
||||
SOMEBODY'S,SOMEBODY IS
|
||||
EVERYBODY'S,EVERYBODY IS
|
||||
WOULD'VE,WOULD HAVE
|
||||
CAN'T,CANNOT,CAN NOT
|
||||
HADN'T,HAD NOT
|
||||
HASN'T,HAS NOT
|
||||
HAVEN'T,HAVE NOT
|
||||
ISN'T,IS NOT
|
||||
AREN'T,ARE NOT
|
||||
WON'T,WILL NOT
|
||||
WOULDN'T,WOULD NOT
|
||||
SHOULDN'T,SHOULD NOT
|
||||
DON'T,DO NOT
|
||||
DIDN'T,DID NOT
|
||||
GOTTA,GOT TO
|
||||
GONNA,GOING TO
|
||||
WANNA,WANT TO
|
||||
LEMME,LET ME
|
||||
GIMME,GIVE ME
|
||||
DUNNO,DON'T KNOW
|
||||
GOTCHA,GOT YOU
|
||||
KINDA,KIND OF
|
||||
MYSELF,MY SELF
|
||||
YOURSELF,YOUR SELF
|
||||
HIMSELF,HIM SELF
|
||||
HERSELF,HER SELF
|
||||
ITSELF,IT SELF
|
||||
OURSELVES,OUR SELVES
|
||||
OKAY,OK,O K
|
||||
Y'ALL,YALL,YOU ALL
|
||||
'CAUSE,'COS,CUZ,BECAUSE
|
||||
FUCKIN',FUCKING
|
||||
KILLING,KILLIN'
|
||||
EVERYDAY,EVERY DAY
|
||||
DOCTOR,DR,DR.
|
||||
MRS,MISSES,MISSUS
|
||||
MR,MR.,MISTER
|
||||
SR,SR.,SENIOR
|
||||
JR,JR.,JUNIOR
|
||||
ST,ST.,SAINT
|
||||
VOL,VOL.,VOLUME
|
||||
CM,CENTIMETER,CENTIMETRE
|
||||
MM,MILLIMETER,MILLIMETRE
|
||||
KM,KILOMETER,KILOMETRE
|
||||
KB,KILOBYTES,KILO BYTES,K B
|
||||
MB,MEGABYTES,MEGA BYTES
|
||||
GB,GIGABYTES,GIGA BYTES,G B
|
||||
THOUSAND,THOUSAND AND
|
||||
HUNDRED,HUNDRED AND
|
||||
A HUNDRED,ONE HUNDRED
|
||||
TWO THOUSAND AND,TWENTY,TWO THOUSAND
|
||||
STORYTELLER,STORY TELLER
|
||||
TSHIRT,T SHIRT
|
||||
TSHIRTS,T SHIRTS
|
||||
LEUKAEMIA,LEUKEMIA
|
||||
OESTROGEN,ESTROGEN
|
||||
ACKNOWLEDGMENT,ACKNOWLEDGEMENT
|
||||
JUDGMENT,JUDGEMENT
|
||||
MAMMA,MAMA
|
||||
DINING,DINNING
|
||||
FLACK,FLAK
|
||||
LEARNT,LEARNED
|
||||
BLONDE,BLOND
|
||||
JUMPSTART,JUMP START
|
||||
RIGHTNOW,RIGHT NOW
|
||||
EVERYONE,EVERY ONE
|
||||
NAME'S,NAME IS
|
||||
FAMILY'S,FAMILY IS
|
||||
COMPANY'S,COMPANY HAS
|
||||
GRANDKID,GRAND KID
|
||||
GRANDKIDS,GRAND KIDS
|
||||
MEALTIMES,MEAL TIMES
|
||||
ALRIGHT,ALL RIGHT
|
||||
GROWNUP,GROWN UP
|
||||
GROWNUPS,GROWN UPS
|
||||
SCHOOLDAYS,SCHOOL DAYS
|
||||
SCHOOLCHILDREN,SCHOOL CHILDREN
|
||||
CASEBOOK,CASE BOOK
|
||||
HUNGOVER,HUNG OVER
|
||||
HANDCLAPS,HAND CLAPS
|
||||
HANDCLAP,HAND CLAP
|
||||
HEATWAVE,HEAT WAVE
|
||||
ADDON,ADD ON
|
||||
ONTO,ON TO
|
||||
INTO,IN TO
|
||||
GOTO,GO TO
|
||||
GUNSHOT,GUN SHOT
|
||||
MOTHERFUCKER,MOTHER FUCKER
|
||||
OFTENTIMES,OFTEN TIMES
|
||||
SARTRE'S,SARTRE IS
|
||||
NONSTARTER,NON STARTER
|
||||
NONSTARTERS,NON STARTERS
|
||||
LONGTIME,LONG TIME
|
||||
POLICYMAKERS,POLICY MAKERS
|
||||
ANYMORE,ANY MORE
|
||||
CANADA'S,CANADA IS
|
||||
CELLPHONE,CELL PHONE
|
||||
WORKPLACE,WORK PLACE
|
||||
UNDERESTIMATING,UNDER ESTIMATING
|
||||
CYBERSECURITY,CYBER SECURITY
|
||||
NORTHEAST,NORTH EAST
|
||||
ANYTIME,ANY TIME
|
||||
LIVESTREAM,LIVE STREAM
|
||||
LIVESTREAMS,LIVE STREAMS
|
||||
WEBCAM,WEB CAM
|
||||
EMAIL,E MAIL
|
||||
ECAM,E CAM
|
||||
VMIX,V MIX
|
||||
SETUP,SET UP
|
||||
SMARTPHONE,SMART PHONE
|
||||
MULTICASTING,MULTI CASTING
|
||||
CHITCHAT,CHIT CHAT
|
||||
SEMIFINAL,SEMI FINAL
|
||||
SEMIFINALS,SEMI FINALS
|
||||
BBQ,BARBECUE
|
||||
STORYLINE,STORY LINE
|
||||
STORYLINES,STORY LINES
|
||||
BRO,BROTHER
|
||||
BROS,BROTHERS
|
||||
OVERPROTECTIVE,OVER PROTECTIVE
|
||||
TIMEOUT,TIME OUT
|
||||
ADVISOR,ADVISER
|
||||
TIMBERWOLVES,TIMBER WOLVES
|
||||
WEBPAGE,WEB PAGE
|
||||
NEWCOMER,NEW COMER
|
||||
DELMAR,DEL MAR
|
||||
NETPLAY,NET PLAY
|
||||
STREETSIDE,STREET SIDE
|
||||
COLOURED,COLORED
|
||||
COLOURFUL,COLORFUL
|
||||
O,ZERO
|
||||
ETCETERA,ET CETERA
|
||||
FUNDRAISING,FUND RAISING
|
||||
RAINFOREST,RAIN FOREST
|
||||
BREATHTAKING,BREATH TAKING
|
||||
WIKIPAGE,WIKI PAGE
|
||||
OVERTIME,OVER TIME
|
||||
TRAIN'S,TRAIN IS
|
||||
ANYONE,ANY ONE
|
||||
PHYSIOTHERAPY,PHYSIO THERAPY
|
||||
ANYBODY,ANY BODY
|
||||
BOTTLECAPS,BOTTLE CAPS
|
||||
BOTTLECAP,BOTTLE CAP
|
||||
STEPFATHER'S,STEP FATHER'S
|
||||
STEPFATHER,STEP FATHER
|
||||
WARTIME,WAR TIME
|
||||
SCREENSHOT,SCREEN SHOT
|
||||
TIMELINE,TIME LINE
|
||||
CITY'S,CITY IS
|
||||
NONPROFIT,NON PROFIT
|
||||
KPOP,K POP
|
||||
HOMEBASE,HOME BASE
|
||||
LIFELONG,LIFE LONG
|
||||
LAWSUITS,LAW SUITS
|
||||
MULTIBILLION,MULTI BILLION
|
||||
ROADMAP,ROAD MAP
|
||||
GUY'S,GUY IS
|
||||
CHECKOUT,CHECK OUT
|
||||
SQUARESPACE,SQUARE SPACE
|
||||
REDLINING,RED LINING
|
||||
BASE'S,BASE IS
|
||||
TAKEAWAY,TAKE AWAY
|
||||
CANDYLAND,CANDY LAND
|
||||
ANTISOCIAL,ANTI SOCIAL
|
||||
CASEWORK,CASE WORK
|
||||
RIGOR,RIGOUR
|
||||
ORGANIZATIONS,ORGANISATIONS
|
||||
ORGANIZATION,ORGANISATION
|
||||
SIGNPOST,SIGN POST
|
||||
WWII,WORLD WAR TWO
|
||||
WINDOWPANE,WINDOW PANE
|
||||
SUREFIRE,SURE FIRE
|
||||
MOUNTAINTOP,MOUNTAIN TOP
|
||||
SALESPERSON,SALES PERSON
|
||||
NETWORK,NET WORK
|
||||
MINISERIES,MINI SERIES
|
||||
EDWARDS'S,EDWARDS IS
|
||||
INTERSUBJECTIVITY,INTER SUBJECTIVITY
|
||||
LIBERALISM'S,LIBERALISM IS
|
||||
TAGLINE,TAG LINE
|
||||
SHINETHEORY,SHINE THEORY
|
||||
CALLYOURGIRLFRIEND,CALL YOUR GIRLFRIEND
|
||||
STARTUP,START UP
|
||||
BREAKUP,BREAK UP
|
||||
RADIOTOPIA,RADIO TOPIA
|
||||
HEARTBREAKING,HEART BREAKING
|
||||
AUTOIMMUNE,AUTO IMMUNE
|
||||
SINISE'S,SINISE IS
|
||||
KICKBACK,KICK BACK
|
||||
FOGHORN,FOG HORN
|
||||
BADASS,BAD ASS
|
||||
POWERAMERICAFORWARD,POWER AMERICA FORWARD
|
||||
GOOGLE'S,GOOGLE IS
|
||||
ROLEPLAY,ROLE PLAY
|
||||
PRICE'S,PRICE IS
|
||||
STANDOFF,STAND OFF
|
||||
FOREVER,FOR EVER
|
||||
GENERAL'S,GENERAL IS
|
||||
DOG'S,DOG IS
|
||||
AUDIOBOOK,AUDIO BOOK
|
||||
ANYWAY,ANY WAY
|
||||
PIGEONHOLE,PIGEON HOLE
|
||||
EGGSHELLS,EGG SHELLS
|
||||
VACCINE'S,VACCINE IS
|
||||
WORKOUT,WORK OUT
|
||||
ADMINISTRATOR'S,ADMINISTRATOR IS
|
||||
FUCKUP,FUCK UP
|
||||
RUNOFFS,RUN OFFS
|
||||
COLORWAY,COLOR WAY
|
||||
WAITLIST,WAIT LIST
|
||||
HEALTHCARE,HEALTH CARE
|
||||
TEXTBOOK,TEXT BOOK
|
||||
CALLBACK,CALL BACK
|
||||
PARTYGOERS,PARTY GOERS
|
||||
SOMEDAY,SOME DAY
|
||||
NIGHTGOWN,NIGHT GOWN
|
||||
STANDALONG,STAND ALONG
|
||||
BUSINESSWOMAN,BUSINESS WOMAN
|
||||
STORYTELLING,STORY TELLING
|
||||
MARKETPLACE,MARKET PLACE
|
||||
CRATEJOY,CRATE JOY
|
||||
OUTPERFORMED,OUT PERFORMED
|
||||
TRUEBOTANICALS,TRUE BOTANICALS
|
||||
NONFICTION,NON FICTION
|
||||
SPINOFF,SPIN OFF
|
||||
MOTHERFUCKING,MOTHER FUCKING
|
||||
TRACKLIST,TRACK LIST
|
||||
GODDAMN,GOD DAMN
|
||||
PORNHUB,PORN HUB
|
||||
UNDERAGE,UNDER AGE
|
||||
GOODBYE,GOOD BYE
|
||||
HARDCORE,HARD CORE
|
||||
TRUCK'S,TRUCK IS
|
||||
COUNTERSTEERING,COUNTER STEERING
|
||||
BUZZWORD,BUZZ WORD
|
||||
SUBCOMPONENTS,SUB COMPONENTS
|
||||
MOREOVER,MORE OVER
|
||||
PICKUP,PICK UP
|
||||
NEWSLETTER,NEWS LETTER
|
||||
KEYWORD,KEY WORD
|
||||
LOGIN,LOG IN
|
||||
TOOLBOX,TOOL BOX
|
||||
LINK'S,LINK IS
|
||||
PRIMIALVIDEO,PRIMAL VIDEO
|
||||
DOTNET,DOT NET
|
||||
AIRSTRIKE,AIR STRIKE
|
||||
HAIRSTYLE,HAIR STYLE
|
||||
TOWNSFOLK,TOWNS FOLK
|
||||
GOLDFISH,GOLD FISH
|
||||
TOM'S,TOM IS
|
||||
HOMETOWN,HOME TOWN
|
||||
CORONAVIRUS,CORONA VIRUS
|
||||
PLAYSTATION,PLAY STATION
|
||||
TOMORROW,TO MORROW
|
||||
TIMECONSUMING,TIME CONSUMING
|
||||
POSTWAR,POST WAR
|
||||
HANDSON,HANDS ON
|
||||
SHAKEUP,SHAKE UP
|
||||
ECOMERS,E COMERS
|
||||
COFOUNDER,CO FOUNDER
|
||||
HIGHEND,HIGH END
|
||||
INPERSON,IN PERSON
|
||||
GROWNUP,GROWN UP
|
||||
SELFREGULATION,SELF REGULATION
|
||||
INDEPTH,IN DEPTH
|
||||
ALLTIME,ALL TIME
|
||||
LONGTERM,LONG TERM
|
||||
SOCALLED,SO CALLED
|
||||
SELFCONFIDENCE,SELF CONFIDENCE
|
||||
STANDUP,STAND UP
|
||||
MINDBOGGLING,MIND BOGGLING
|
||||
BEINGFOROTHERS,BEING FOR OTHERS
|
||||
COWROTE,CO WROTE
|
||||
COSTARRED,CO STARRED
|
||||
EDITORINCHIEF,EDITOR IN CHIEF
|
||||
HIGHSPEED,HIGH SPEED
|
||||
DECISIONMAKING,DECISION MAKING
|
||||
WELLBEING,WELL BEING
|
||||
NONTRIVIAL,NON TRIVIAL
|
||||
PREEXISTING,PRE EXISTING
|
||||
STATEOWNED,STATE OWNED
|
||||
PLUGIN,PLUG IN
|
||||
PROVERSION,PRO VERSION
|
||||
OPTIN,OPT IN
|
||||
FOLLOWUP,FOLLOW UP
|
||||
FOLLOWUPS,FOLLOW UPS
|
||||
WIFI,WI FI
|
||||
THIRDPARTY,THIRD PARTY
|
||||
PROFESSIONALLOOKING,PROFESSIONAL LOOKING
|
||||
FULLSCREEN,FULL SCREEN
|
||||
BUILTIN,BUILT IN
|
||||
MULTISTREAM,MULTI STREAM
|
||||
LOWCOST,LOW COST
|
||||
RESTREAM,RE STREAM
|
||||
GAMECHANGER,GAME CHANGER
|
||||
WELLDEVELOPED,WELL DEVELOPED
|
||||
QUARTERINCH,QUARTER INCH
|
||||
FASTFASHION,FAST FASHION
|
||||
ECOMMERCE,E COMMERCE
|
||||
PRIZEWINNING,PRIZE WINNING
|
||||
NEVERENDING,NEVER ENDING
|
||||
MINDBLOWING,MIND BLOWING
|
||||
REALLIFE,REAL LIFE
|
||||
REOPEN,RE OPEN
|
||||
ONDEMAND,ON DEMAND
|
||||
PROBLEMSOLVING,PROBLEM SOLVING
|
||||
HEAVYHANDED,HEAVY HANDED
|
||||
OPENENDED,OPEN ENDED
|
||||
SELFCONTROL,SELF CONTROL
|
||||
WELLMEANING,WELL MEANING
|
||||
COHOST,CO HOST
|
||||
RIGHTSBASED,RIGHTS BASED
|
||||
HALFBROTHER,HALF BROTHER
|
||||
FATHERINLAW,FATHER IN LAW
|
||||
COAUTHOR,CO AUTHOR
|
||||
REELECTION,RE ELECTION
|
||||
SELFHELP,SELF HELP
|
||||
PROLIFE,PRO LIFE
|
||||
ANTIDUKE,ANTI DUKE
|
||||
POSTSTRUCTURALIST,POST STRUCTURALIST
|
||||
COFOUNDED,CO FOUNDED
|
||||
XRAY,X RAY
|
||||
ALLAROUND,ALL AROUND
|
||||
HIGHTECH,HIGH TECH
|
||||
TMOBILE,T MOBILE
|
||||
INHOUSE,IN HOUSE
|
||||
POSTMORTEM,POST MORTEM
|
||||
LITTLEKNOWN,LITTLE KNOWN
|
||||
FALSEPOSITIVE,FALSE POSITIVE
|
||||
ANTIVAXXER,ANTI VAXXER
|
||||
EMAILS,E MAILS
|
||||
DRIVETHROUGH,DRIVE THROUGH
|
||||
DAYTODAY,DAY TO DAY
|
||||
COSTAR,CO STAR
|
||||
EBAY,E BAY
|
||||
KOOLAID,KOOL AID
|
||||
ANTIDEMOCRATIC,ANTI DEMOCRATIC
|
||||
MIDDLEAGED,MIDDLE AGED
|
||||
SHORTLIVED,SHORT LIVED
|
||||
BESTSELLING,BEST SELLING
|
||||
TICTACS,TIC TACS
|
||||
UHHUH,UH HUH
|
||||
MULTITANK,MULTI TANK
|
||||
JAWDROPPING,JAW DROPPING
|
||||
LIVESTREAMING,LIVE STREAMING
|
||||
HARDWORKING,HARD WORKING
|
||||
BOTTOMDWELLING,BOTTOM DWELLING
|
||||
PRESHOW,PRE SHOW
|
||||
HANDSFREE,HANDS FREE
|
||||
TRICKORTREATING,TRICK OR TREATING
|
||||
PRERECORDED,PRE RECORDED
|
||||
DOGOODERS,DO GOODERS
|
||||
WIDERANGING,WIDE RANGING
|
||||
LIFESAVING,LIFE SAVING
|
||||
SKIREPORT,SKI REPORT
|
||||
SNOWBASE,SNOW BASE
|
||||
JAYZ,JAY Z
|
||||
SPIDERMAN,SPIDER MAN
|
||||
FREEKICK,FREE KICK
|
||||
EDWARDSHELAIRE,EDWARDS HELAIRE
|
||||
SHORTTERM,SHORT TERM
|
||||
HAVENOTS,HAVE NOTS
|
||||
SELFINTEREST,SELF INTEREST
|
||||
SELFINTERESTED,SELF INTERESTED
|
||||
SELFCOMPASSION,SELF COMPASSION
|
||||
MACHINELEARNING,MACHINE LEARNING
|
||||
COAUTHORED,CO AUTHORED
|
||||
NONGOVERNMENT,NON GOVERNMENT
|
||||
SUBSAHARAN,SUB SAHARAN
|
||||
COCHAIR,CO CHAIR
|
||||
LARGESCALE,LARGE SCALE
|
||||
VIDEOONDEMAND,VIDEO ON DEMAND
|
||||
FIRSTCLASS,FIRST CLASS
|
||||
COFOUNDERS,CO FOUNDERS
|
||||
COOP,CO OP
|
||||
PREORDERS,PRE ORDERS
|
||||
DOUBLEENTRY,DOUBLE ENTRY
|
||||
SELFCONFIDENT,SELF CONFIDENT
|
||||
SELFPORTRAIT,SELF PORTRAIT
|
||||
NONWHITE,NON WHITE
|
||||
ONBOARD,ON BOARD
|
||||
HALFLIFE,HALF LIFE
|
||||
ONCOURT,ON COURT
|
||||
SCIFI,SCI FI
|
||||
XMEN,X MEN
|
||||
DAYLEWIS,DAY LEWIS
|
||||
LALALAND,LA LA LAND
|
||||
AWARDWINNING,AWARD WINNING
|
||||
BOXOFFICE,BOX OFFICE
|
||||
TRIDACTYLS,TRI DACTYLS
|
||||
TRIDACTYL,TRI DACTYL
|
||||
MEDIUMSIZED,MEDIUM SIZED
|
||||
POSTSECONDARY,POST SECONDARY
|
||||
FULLTIME,FULL TIME
|
||||
GOKART,GO KART
|
||||
OPENAIR,OPEN AIR
|
||||
WELLKNOWN,WELL KNOWN
|
||||
ICECREAM,ICE CREAM
|
||||
EARTHMOON,EARTH MOON
|
||||
STATEOFTHEART,STATE OF THE ART
|
||||
BSIDE,B SIDE
|
||||
EASTWEST,EAST WEST
|
||||
ALLSTAR,ALL STAR
|
||||
RUNNERUP,RUNNER UP
|
||||
HORSEDRAWN,HORSE DRAWN
|
||||
OPENSOURCE,OPEN SOURCE
|
||||
PURPOSEBUILT,PURPOSE BUILT
|
||||
SQUAREFREE,SQUARE FREE
|
||||
PRESENTDAY,PRESENT DAY
|
||||
CANADAUNITED,CANADA UNITED
|
||||
HOTCHPOTCH,HOTCH POTCH
|
||||
LOWLYING,LOW LYING
|
||||
RIGHTHANDED,RIGHT HANDED
|
||||
PEARSHAPED,PEAR SHAPED
|
||||
BESTKNOWN,BEST KNOWN
|
||||
FULLLENGTH,FULL LENGTH
|
||||
YEARROUND,YEAR ROUND
|
||||
PREELECTION,PRE ELECTION
|
||||
RERECORD,RE RECORD
|
||||
MINIALBUM,MINI ALBUM
|
||||
LONGESTRUNNING,LONGEST RUNNING
|
||||
ALLIRELAND,ALL IRELAND
|
||||
NORTHWESTERN,NORTH WESTERN
|
||||
PARTTIME,PART TIME
|
||||
NONGOVERNMENTAL,NON GOVERNMENTAL
|
||||
ONLINE,ON LINE
|
||||
ONAIR,ON AIR
|
||||
NORTHSOUTH,NORTH SOUTH
|
||||
RERELEASED,RE RELEASED
|
||||
LEFTHANDED,LEFT HANDED
|
||||
BSIDES,B SIDES
|
||||
ANGLOSAXON,ANGLO SAXON
|
||||
SOUTHSOUTHEAST,SOUTH SOUTHEAST
|
||||
CROSSCOUNTRY,CROSS COUNTRY
|
||||
REBUILT,RE BUILT
|
||||
FREEFORM,FREE FORM
|
||||
SCOOBYDOO,SCOOBY DOO
|
||||
ATLARGE,AT LARGE
|
||||
COUNCILMANAGER,COUNCIL MANAGER
|
||||
LONGRUNNING,LONG RUNNING
|
||||
PREWAR,PRE WAR
|
||||
REELECTED,RE ELECTED
|
||||
HIGHSCHOOL,HIGH SCHOOL
|
||||
RUNNERSUP,RUNNERS UP
|
||||
NORTHWEST,NORTH WEST
|
||||
WEBBASED,WEB BASED
|
||||
HIGHQUALITY,HIGH QUALITY
|
||||
RIGHTWING,RIGHT WING
|
||||
LANEFOX,LANE FOX
|
||||
PAYPERVIEW,PAY PER VIEW
|
||||
COPRODUCTION,CO PRODUCTION
|
||||
NONPARTISAN,NON PARTISAN
|
||||
FIRSTPERSON,FIRST PERSON
|
||||
WORLDRENOWNED,WORLD RENOWNED
|
||||
VICEPRESIDENT,VICE PRESIDENT
|
||||
PROROMAN,PRO ROMAN
|
||||
COPRODUCED,CO PRODUCED
|
||||
LOWPOWER,LOW POWER
|
||||
SELFESTEEM,SELF ESTEEM
|
||||
SEMITRANSPARENT,SEMI TRANSPARENT
|
||||
SECONDINCOMMAND,SECOND IN COMMAND
|
||||
HIGHRISE,HIGH RISE
|
||||
COHOSTED,CO HOSTED
|
||||
AFRICANAMERICAN,AFRICAN AMERICAN
|
||||
SOUTHWEST,SOUTH WEST
|
||||
WELLPRESERVED,WELL PRESERVED
|
||||
FEATURELENGTH,FEATURE LENGTH
|
||||
HIPHOP,HIP HOP
|
||||
ALLBIG,ALL BIG
|
||||
SOUTHEAST,SOUTH EAST
|
||||
COUNTERATTACK,COUNTER ATTACK
|
||||
QUARTERFINALS,QUARTER FINALS
|
||||
STABLEDOOR,STABLE DOOR
|
||||
DARKEYED,DARK EYED
|
||||
ALLAMERICAN,ALL AMERICAN
|
||||
THIRDPERSON,THIRD PERSON
|
||||
LOWLEVEL,LOW LEVEL
|
||||
NTERMINAL,N TERMINAL
|
||||
DRIEDUP,DRIED UP
|
||||
AFRICANAMERICANS,AFRICAN AMERICANS
|
||||
ANTIAPARTHEID,ANTI APARTHEID
|
||||
STOKEONTRENT,STOKE ON TRENT
|
||||
NORTHNORTHEAST,NORTH NORTHEAST
|
||||
BRANDNEW,BRAND NEW
|
||||
RIGHTANGLED,RIGHT ANGLED
|
||||
GOVERNMENTOWNED,GOVERNMENT OWNED
|
||||
SONINLAW,SON IN LAW
|
||||
SUBJECTOBJECTVERB,SUBJECT OBJECT VERB
|
||||
LEFTARM,LEFT ARM
|
||||
LONGLIVED,LONG LIVED
|
||||
REDEYE,RED EYE
|
||||
TPOSE,T POSE
|
||||
NIGHTVISION,NIGHT VISION
|
||||
SOUTHEASTERN,SOUTH EASTERN
|
||||
WELLRECEIVED,WELL RECEIVED
|
||||
ALFAYOUM,AL FAYOUM
|
||||
TIMEBASED,TIME BASED
|
||||
KETTLEDRUMS,KETTLE DRUMS
|
||||
BRIGHTEYED,BRIGHT EYED
|
||||
REDBROWN,RED BROWN
|
||||
SAMESEX,SAME SEX
|
||||
PORTDEPAIX,PORT DE PAIX
|
||||
CLEANUP,CLEAN UP
|
||||
PERCENT,PERCENT SIGN
|
||||
TAKEOUT,TAKE OUT
|
||||
KNOWHOW,KNOW HOW
|
||||
FISHBONE,FISH BONE
|
||||
FISHSTICKS,FISH STICKS
|
||||
PAPERWORK,PAPER WORK
|
||||
NICKNACKS,NICK NACKS
|
||||
STREETTALKING,STREET TALKING
|
||||
NONACADEMIC,NON ACADEMIC
|
||||
SHELLY,SHELLEY
|
||||
SHELLY'S,SHELLEY'S
|
||||
JIMMY,JIMMIE
|
||||
JIMMY'S,JIMMIE'S
|
||||
DRUGSTORE,DRUG STORE
|
||||
THRU,THROUGH
|
||||
PLAYDATE,PLAY DATE
|
||||
MICROLIFE,MICRO LIFE
|
||||
SKILLSET,SKILL SET
|
||||
SKILLSETS,SKILL SETS
|
||||
TRADEOFF,TRADE OFF
|
||||
TRADEOFFS,TRADE OFFS
|
||||
ONSCREEN,ON SCREEN
|
||||
PLAYBACK,PLAY BACK
|
||||
ARTWORK,ART WORK
|
||||
COWORKER,CO WORKER
|
||||
COWORKERS,CO WORKERS
|
||||
SOMETIME,SOME TIME
|
||||
SOMETIMES,SOME TIMES
|
||||
CROWDFUNDING,CROWD FUNDING
|
||||
AM,A.M.,A M
|
||||
PM,P.M.,P M
|
||||
TV,T V
|
||||
MBA,M B A
|
||||
USA,U S A
|
||||
US,U S
|
||||
UK,U K
|
||||
CEO,C E O
|
||||
CFO,C F O
|
||||
COO,C O O
|
||||
CIO,C I O
|
||||
FM,F M
|
||||
GMC,G M C
|
||||
FSC,F S C
|
||||
NPD,N P D
|
||||
APM,A P M
|
||||
NGO,N G O
|
||||
TD,T D
|
||||
LOL,L O L
|
||||
IPO,I P O
|
||||
CNBC,C N B C
|
||||
IPOS,I P O S
|
||||
CNBC'S,C N B C'S
|
||||
JT,J T
|
||||
NPR,N P R
|
||||
NPR'S,N P R'S
|
||||
MP,M P
|
||||
IOI,I O I
|
||||
DW,D W
|
||||
CNN,C N N
|
||||
WSM,W S M
|
||||
ET,E T
|
||||
IT,I T
|
||||
RJ,R J
|
||||
DVD,D V D
|
||||
DVD'S,D V D'S
|
||||
HBO,H B O
|
||||
LA,L A
|
||||
XC,X C
|
||||
SUV,S U V
|
||||
NBA,N B A
|
||||
NBA'S,N B A'S
|
||||
ESPN,E S P N
|
||||
ESPN'S,E S P N'S
|
||||
ADT,A D T
|
||||
HD,H D
|
||||
VIP,V I P
|
||||
TMZ,T M Z
|
||||
CBC,C B C
|
||||
NPO,N P O
|
||||
BBC,B B C
|
||||
LA'S,L A'S
|
||||
TMZ'S,T M Z'S
|
||||
HIV,H I V
|
||||
FTC,F T C
|
||||
EU,E U
|
||||
PHD,P H D
|
||||
AI,A I
|
||||
FHI,F H I
|
||||
ICML,I C M L
|
||||
ICLR,I C L R
|
||||
BMW,B M W
|
||||
EV,E V
|
||||
CR,C R
|
||||
API,A P I
|
||||
ICO,I C O
|
||||
LTE,L T E
|
||||
OBS,O B S
|
||||
PC,P C
|
||||
IO,I O
|
||||
CRM,C R M
|
||||
RTMP,R T M P
|
||||
ASMR,A S M R
|
||||
GG,G G
|
||||
WWW,W W W
|
||||
PEI,P E I
|
||||
JJ,J J
|
||||
PT,P T
|
||||
DJ,D J
|
||||
SD,S D
|
||||
POW,P.O.W.,P O W
|
||||
FYI,F Y I
|
||||
DC,D C,D.C
|
||||
ABC,A B C
|
||||
TJ,T J
|
||||
WMDT,W M D T
|
||||
WDTN,W D T N
|
||||
TY,T Y
|
||||
EJ,E J
|
||||
CJ,C J
|
||||
ACL,A C L
|
||||
UK'S,U K'S
|
||||
GTV,G T V
|
||||
MDMA,M D M A
|
||||
DFW,D F W
|
||||
WTF,W T F
|
||||
AJ,A J
|
||||
MD,M D
|
||||
PH,P H
|
||||
ID,I D
|
||||
SEO,S E O
|
||||
UTM'S,U T M'S
|
||||
EC,E C
|
||||
UFC,U F C
|
||||
RV,R V
|
||||
UTM,U T M
|
||||
CSV,C S V
|
||||
SMS,S M S
|
||||
GRB,G R B
|
||||
GT,G T
|
||||
LEM,L E M
|
||||
XR,X R
|
||||
EDU,E D U
|
||||
NBC,N B C
|
||||
EMS,E M S
|
||||
CDC,C D C
|
||||
MLK,M L K
|
||||
IE,I E
|
||||
OC,O C
|
||||
HR,H R
|
||||
MA,M A
|
||||
DEE,D E E
|
||||
AP,A P
|
||||
UFO,U F O
|
||||
DE,D E
|
||||
LGBTQ,L G B T Q
|
||||
PTA,P T A
|
||||
NHS,N H S
|
||||
CMA,C M A
|
||||
MGM,M G M
|
||||
AKA,A K A
|
||||
HW,H W
|
||||
GOP,G O P
|
||||
GOP'S,G O P'S
|
||||
FBI,F B I
|
||||
PRX,P R X
|
||||
CTO,C T O
|
||||
URL,U R L
|
||||
EIN,E I N
|
||||
MLS,M L S
|
||||
CSI,C S I
|
||||
AOC,A O C
|
||||
CND,C N D
|
||||
CP,C P
|
||||
PP,P P
|
||||
CLI,C L I
|
||||
PB,P B
|
||||
FDA,F D A
|
||||
MRNA,M R N A
|
||||
PR,P R
|
||||
VP,V P
|
||||
DNC,D N C
|
||||
MSNBC,M S N B C
|
||||
GQ,G Q
|
||||
UT,U T
|
||||
XXI,X X I
|
||||
HRV,H R V
|
||||
WHO,W H O
|
||||
CRO,C R O
|
||||
DPA,D P A
|
||||
PPE,P P E
|
||||
EVA,E V A
|
||||
BP,B P
|
||||
GPS,G P S
|
||||
AR,A R
|
||||
PJ,P J
|
||||
MLM,M L M
|
||||
OLED,O L E D
|
||||
BO,B O
|
||||
VE,V E
|
||||
UN,U N
|
||||
SLS,S L S
|
||||
DM,D M
|
||||
DM'S,D M'S
|
||||
ASAP,A S A P
|
||||
ETA,E T A
|
||||
DOB,D O B
|
||||
BMW,B M W
|
||||
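The evaluation script consumes this table through `LoadGLM` (defined earlier in that file and not shown in this diff). Assuming the format visible above — one rule per row, each comma-separated column an equivalent surface form — a minimal loader sketch could look like this; the `<RULE_NNN>` key format follows the `startswith('<RULE_')` check in the evaluation loop, everything else is an assumption:

```python
import csv

def load_glm_sketch(path):
    """Illustrative GLM loader: one rule per CSV row, e.g. "I'M,I AM"."""
    glm = {}
    with open(path, encoding='utf8') as f:
        for k, row in enumerate(csv.reader(f), start=1):
            phrases = [col.strip() for col in row if col.strip()]
            if phrases:
                glm[f'<RULE_{k:03d}>'] = phrases  # tag format assumed
    return glm
```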
20
utils/speechio/interjections_en.csv
Normal file
@@ -0,0 +1,20 @@
ach
ah
eee
eh
er
ew
ha
hee
hm
hmm
hmmm
huh
mm
mmm
oof
uh
uhh
um
oh
hum
1
utils/speechio/nemo_text_processing/README.md
Normal file
@@ -0,0 +1 @@
Vendored from NeMo commit eae1684f7f33c2a18de9ecfa42ec7db93d39e631.
13
utils/speechio/nemo_text_processing/__init__.py
Normal file
@@ -0,0 +1,13 @@
# Copyright (c) 2021, NVIDIA CORPORATION. All rights reserved.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
#     http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
@@ -0,0 +1,10 @@
# Text Normalization

Text Normalization is part of NeMo's `nemo_text_processing` - a Python package that is installed with the `nemo_toolkit`.
It converts text from written form into its verbalized form, e.g. "123" -> "one hundred twenty three".

See [NeMo documentation](https://docs.nvidia.com/deeplearning/nemo/user-guide/docs/en/stable/nlp/text_normalization/wfst/wfst_text_normalization.html) for details.

Tutorial with an overview of the package capabilities: [Text_(Inverse)_Normalization.ipynb](https://colab.research.google.com/github/NVIDIA/NeMo/blob/stable/tutorials/text_processing/Text_(Inverse)_Normalization.ipynb)

Tutorial on how to customize the underlying grammars: [WFST_Tutorial.ipynb](https://colab.research.google.com/github/NVIDIA/NeMo/blob/stable/tutorials/text_processing/WFST_Tutorial.ipynb)
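For reference, a minimal usage sketch of this package's main entry point (API as of the vendored NeMo version; import paths and arguments may differ in other releases):

```python
from nemo_text_processing.text_normalization.normalize import Normalizer

# build the English WFST normalizer once, then reuse it per sentence
normalizer = Normalizer(input_case='cased', lang='en')
print(normalizer.normalize('123'))  # -> 'one hundred twenty three'
```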
@@ -0,0 +1,13 @@
# Copyright (c) 2021, NVIDIA CORPORATION. All rights reserved.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
#     http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
@@ -0,0 +1,350 @@
|
||||
# Copyright (c) 2021, NVIDIA CORPORATION. All rights reserved.
|
||||
#
|
||||
# Licensed under the Apache License, Version 2.0 (the "License");
|
||||
# you may not use this file except in compliance with the License.
|
||||
# You may obtain a copy of the License at
|
||||
#
|
||||
# http://www.apache.org/licenses/LICENSE-2.0
|
||||
#
|
||||
# Unless required by applicable law or agreed to in writing, software
|
||||
# distributed under the License is distributed on an "AS IS" BASIS,
|
||||
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
|
||||
# See the License for the specific language governing permissions and
|
||||
# limitations under the License.
|
||||
|
||||
|
||||
import json
|
||||
import re
|
||||
import string
import sys  # needed by post_process_punct below (sys.maxunicode)
|
||||
from collections import defaultdict, namedtuple
|
||||
from typing import Dict, List, Optional, Set, Tuple
|
||||
from unicodedata import category
|
||||
|
||||
|
||||
|
||||
EOS_TYPE = "EOS"
|
||||
PUNCT_TYPE = "PUNCT"
|
||||
PLAIN_TYPE = "PLAIN"
|
||||
Instance = namedtuple('Instance', 'token_type un_normalized normalized')
|
||||
known_types = [
|
||||
"PLAIN",
|
||||
"DATE",
|
||||
"CARDINAL",
|
||||
"LETTERS",
|
||||
"VERBATIM",
|
||||
"MEASURE",
|
||||
"DECIMAL",
|
||||
"ORDINAL",
|
||||
"DIGIT",
|
||||
"MONEY",
|
||||
"TELEPHONE",
|
||||
"ELECTRONIC",
|
||||
"FRACTION",
|
||||
"TIME",
|
||||
"ADDRESS",
|
||||
]
|
||||
|
||||
|
||||
def _load_kaggle_text_norm_file(file_path: str) -> List[Instance]:
|
||||
"""
|
||||
https://www.kaggle.com/richardwilliamsproat/text-normalization-for-english-russian-and-polish
|
||||
Loads text file in the Kaggle Google text normalization file format: <semiotic class>\t<unnormalized text>\t<`self` if trivial class or normalized text>
|
||||
E.g.
|
||||
PLAIN Brillantaisia <self>
|
||||
PLAIN is <self>
|
||||
PLAIN a <self>
|
||||
PLAIN genus <self>
|
||||
PLAIN of <self>
|
||||
PLAIN plant <self>
|
||||
PLAIN in <self>
|
||||
PLAIN family <self>
|
||||
PLAIN Acanthaceae <self>
|
||||
PUNCT . sil
|
||||
<eos> <eos>
|
||||
|
||||
Args:
|
||||
file_path: file path to text file
|
||||
|
||||
Returns: flat list of instances
|
||||
"""
|
||||
res = []
|
||||
with open(file_path, 'r') as fp:
|
||||
for line in fp:
|
||||
parts = line.strip().split("\t")
|
||||
if parts[0] == "<eos>":
|
||||
res.append(Instance(token_type=EOS_TYPE, un_normalized="", normalized=""))
|
||||
else:
|
||||
l_type, l_token, l_normalized = parts
|
||||
l_token = l_token.lower()
|
||||
l_normalized = l_normalized.lower()
|
||||
|
||||
if l_type == PLAIN_TYPE:
|
||||
res.append(Instance(token_type=l_type, un_normalized=l_token, normalized=l_token))
|
||||
elif l_type != PUNCT_TYPE:
|
||||
res.append(Instance(token_type=l_type, un_normalized=l_token, normalized=l_normalized))
|
||||
return res
|
||||
|
||||
|
||||
def load_files(file_paths: List[str], load_func=_load_kaggle_text_norm_file) -> List[Instance]:
|
||||
"""
|
||||
Load given list of text files using the `load_func` function.
|
||||
|
||||
Args:
|
||||
file_paths: list of file paths
|
||||
load_func: loading function
|
||||
|
||||
Returns: flat list of instances
|
||||
"""
|
||||
res = []
|
||||
for file_path in file_paths:
|
||||
res.extend(load_func(file_path=file_path))
|
||||
return res
|
||||
|
||||
|
||||
def clean_generic(text: str) -> str:
|
||||
"""
|
||||
Cleans text without affecting semiotic classes.
|
||||
|
||||
Args:
|
||||
text: string
|
||||
|
||||
Returns: cleaned string
|
||||
"""
|
||||
text = text.strip()
|
||||
text = text.lower()
|
||||
return text
|
||||
|
||||
|
||||
def evaluate(preds: List[str], labels: List[str], input: Optional[List[str]] = None, verbose: bool = True) -> float:
|
||||
"""
|
||||
Evaluates accuracy given predictions and labels.
|
||||
|
||||
Args:
|
||||
preds: predictions
|
||||
labels: labels
|
||||
input: optional, only needed for verbosity
|
||||
verbose: if true prints [input], golden labels and predictions
|
||||
|
||||
Returns accuracy
|
||||
"""
|
||||
acc = 0
|
||||
nums = len(preds)
|
||||
for i in range(nums):
|
||||
pred_norm = clean_generic(preds[i])
|
||||
label_norm = clean_generic(labels[i])
|
||||
if pred_norm == label_norm:
|
||||
acc = acc + 1
|
||||
else:
|
||||
if input:
|
||||
print(f"input: {json.dumps(input[i])}")
|
||||
print(f"gold: {json.dumps(label_norm)}")
|
||||
print(f"pred: {json.dumps(pred_norm)}")
|
||||
return acc / nums
|
||||
|
||||
|
||||
def training_data_to_tokens(
|
||||
data: List[Instance], category: Optional[str] = None
|
||||
) -> Dict[str, Tuple[List[str], List[str]]]:
|
||||
"""
|
||||
Filters the instance list by category if provided and converts it into a map from token type to list of un_normalized and normalized strings
|
||||
|
||||
Args:
|
||||
data: list of instances
|
||||
category: optional semiotic class category name
|
||||
|
||||
Returns Dict: token type -> (list of un_normalized strings, list of normalized strings)
|
||||
"""
|
||||
result = defaultdict(lambda: ([], []))
|
||||
for instance in data:
|
||||
if instance.token_type != EOS_TYPE:
|
||||
if category is None or instance.token_type == category:
|
||||
result[instance.token_type][0].append(instance.un_normalized)
|
||||
result[instance.token_type][1].append(instance.normalized)
|
||||
return result
|
||||
|
||||
|
||||
def training_data_to_sentences(data: List[Instance]) -> Tuple[List[str], List[str], List[Set[str]]]:
|
||||
"""
|
||||
Takes instance list, creates list of sentences split by EOS_Token
|
||||
Args:
|
||||
data: list of instances
|
||||
Returns (list of unnormalized sentences, list of normalized sentences, list of sets of categories in a sentence)
|
||||
"""
|
||||
# split data at EOS boundaries
|
||||
sentences = []
|
||||
sentence = []
|
||||
categories = []
|
||||
sentence_categories = set()
|
||||
|
||||
for instance in data:
|
||||
if instance.token_type == EOS_TYPE:
|
||||
sentences.append(sentence)
|
||||
sentence = []
|
||||
categories.append(sentence_categories)
|
||||
sentence_categories = set()
|
||||
else:
|
||||
sentence.append(instance)
|
||||
sentence_categories.update([instance.token_type])
|
||||
un_normalized = [" ".join([instance.un_normalized for instance in sentence]) for sentence in sentences]
|
||||
normalized = [" ".join([instance.normalized for instance in sentence]) for sentence in sentences]
|
||||
return un_normalized, normalized, categories
|
||||
|
||||
|
||||
def post_process_punctuation(text: str) -> str:
|
||||
"""
|
||||
Normalized quotes and spaces
|
||||
|
||||
Args:
|
||||
text: text
|
||||
|
||||
Returns: text with normalized spaces and quotes
|
||||
"""
|
||||
text = (
|
||||
text.replace('( ', '(')
|
||||
.replace(' )', ')')
|
||||
.replace('{ ', '{')
|
||||
.replace(' }', '}')
|
||||
.replace('[ ', '[')
|
||||
.replace(' ]', ']')
|
||||
.replace('  ', ' ')
|
||||
.replace('”', '"')
|
||||
.replace("’", "'")
|
||||
.replace("»", '"')
|
||||
.replace("«", '"')
|
||||
.replace("\\", "")
|
||||
.replace("„", '"')
|
||||
.replace("´", "'")
|
||||
.replace("’", "'")
|
||||
.replace('“', '"')
|
||||
.replace("‘", "'")
|
||||
.replace('`', "'")
|
||||
.replace('- -', "--")
|
||||
)
|
||||
|
||||
for punct in "!,.:;?":
|
||||
text = text.replace(f' {punct}', punct)
|
||||
return text.strip()
|
||||
|
||||
|
||||
def pre_process(text: str) -> str:
|
||||
"""
|
||||
Optional text preprocessing before normalization (part of TTS TN pipeline)
|
||||
|
||||
Args:
|
||||
text: string that may include semiotic classes
|
||||
|
||||
Returns: text with spaces around punctuation marks
|
||||
"""
|
||||
space_both = '[]'
|
||||
for punct in space_both:
|
||||
text = text.replace(punct, ' ' + punct + ' ')
|
||||
|
||||
# remove extra space
|
||||
text = re.sub(r' +', ' ', text)
|
||||
return text
|
||||
|
||||
|
||||
def load_file(file_path: str) -> List[str]:
|
||||
"""
|
||||
Loads given text file with separate lines into list of string.
|
||||
|
||||
Args:
|
||||
file_path: file path
|
||||
|
||||
Returns: flat list of string
|
||||
"""
|
||||
res = []
|
||||
with open(file_path, 'r') as fp:
|
||||
for line in fp:
|
||||
res.append(line)
|
||||
return res
|
||||
|
||||
|
||||
def write_file(file_path: str, data: List[str]):
|
||||
"""
|
||||
Writes out list of string to file.
|
||||
|
||||
Args:
|
||||
file_path: file path
|
||||
data: list of string
|
||||
|
||||
"""
|
||||
with open(file_path, 'w') as fp:
|
||||
for line in data:
|
||||
fp.write(line + '\n')
|
||||
|
||||
|
||||
def post_process_punct(input: str, normalized_text: str, add_unicode_punct: bool = False):
|
||||
"""
|
||||
Post-processing of the normalized output to match input in terms of spaces around punctuation marks.
|
||||
After NN normalization, Moses detokenization puts a space after
|
||||
punctuation marks, and attaches an opening quote "'" to the word to the right.
|
||||
E.g., input to the TN NN model is "12 test' example",
|
||||
after normalization and detokenization -> "twelve test 'example" (the quote is considered to be an opening quote,
|
||||
but it doesn't match the input and can cause issues during TTS voice generation.)
|
||||
The current function will match the punctuation and spaces of the normalized text with the input sequence.
|
||||
"12 test' example" -> "twelve test 'example" -> "twelve test' example" (the quote was shifted to match the input).
|
||||
|
||||
Args:
|
||||
input: input text (original input to the NN, before normalization or tokenization)
|
||||
normalized_text: output text (output of the TN NN model)
|
||||
add_unicode_punct: set to True to handle unicode punctuation marks as well as default string.punctuation (increases post processing time)
|
||||
"""
|
||||
# in the post-processing WFST graph "``" are replaced with '"' quotes (otherwise single quotes "`" won't be handled correctly)
|
||||
# this function fixes spaces around them based on input sequence, so here we're making the same double quote replacement
|
||||
# to make sure these new double quotes work with this function
|
||||
if "``" in input and "``" not in normalized_text:
|
||||
input = input.replace("``", '"')
|
||||
input = [x for x in input]
|
||||
normalized_text = [x for x in normalized_text]
|
||||
punct_marks = [x for x in string.punctuation if x in input]
|
||||
|
||||
if add_unicode_punct:
|
||||
punct_unicode = [
|
||||
chr(i)
|
||||
for i in range(sys.maxunicode)
|
||||
if category(chr(i)).startswith("P") and chr(i) not in punct_marks and chr(i) in input  # default marks live in punct_marks; 'punct_default' was undefined
|
||||
]
|
||||
punct_marks.extend(punct_unicode)  # extend() mutates in place and returns None, so do not reassign
|
||||
|
||||
for punct in punct_marks:
|
||||
try:
|
||||
equal = True
|
||||
if input.count(punct) != normalized_text.count(punct):
|
||||
equal = False
|
||||
idx_in, idx_out = 0, 0
|
||||
while punct in input[idx_in:]:
|
||||
idx_out = normalized_text.index(punct, idx_out)
|
||||
idx_in = input.index(punct, idx_in)
|
||||
|
||||
def _is_valid(idx_out, idx_in, normalized_text, input):
|
||||
"""Check if previous or next word match (for cases when punctuation marks are part of
|
||||
semiotic token, i.e. some punctuation can be missing in the normalized text)"""
|
||||
return (idx_out > 0 and idx_in > 0 and normalized_text[idx_out - 1] == input[idx_in - 1]) or (
|
||||
idx_out < len(normalized_text) - 1
|
||||
and idx_in < len(input) - 1
|
||||
and normalized_text[idx_out + 1] == input[idx_in + 1]
|
||||
)
|
||||
|
||||
if not equal and not _is_valid(idx_out, idx_in, normalized_text, input):
|
||||
idx_in += 1
|
||||
continue
|
||||
if idx_in > 0 and idx_out > 0:
|
||||
if normalized_text[idx_out - 1] == " " and input[idx_in - 1] != " ":
|
||||
normalized_text[idx_out - 1] = ""
|
||||
|
||||
elif normalized_text[idx_out - 1] != " " and input[idx_in - 1] == " ":
|
||||
normalized_text[idx_out - 1] += " "
|
||||
|
||||
if idx_in < len(input) - 1 and idx_out < len(normalized_text) - 1:
|
||||
if normalized_text[idx_out + 1] == " " and input[idx_in + 1] != " ":
|
||||
normalized_text[idx_out + 1] = ""
|
||||
elif normalized_text[idx_out + 1] != " " and input[idx_in + 1] == " ":
|
||||
normalized_text[idx_out] = normalized_text[idx_out] + " "
|
||||
idx_out += 1
|
||||
idx_in += 1
|
||||
except:
|
||||
pass
|
||||
|
||||
normalized_text = "".join(normalized_text)
|
||||
return re.sub(r' +', ' ', normalized_text)
|
||||
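A usage sketch of post_process_punct, replaying the example from its own docstring:

```python
raw = "12 test' example"        # original NN input
detok = "twelve test 'example"  # typical Moses-detokenized NN output
print(post_process_punct(input=raw, normalized_text=detok))
# -> "twelve test' example" (the quote is shifted back to match the input)
```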
@@ -0,0 +1,17 @@
# Copyright (c) 2021, NVIDIA CORPORATION. All rights reserved.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
#     http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.

from nemo_text_processing.text_normalization.en.taggers.tokenize_and_classify import ClassifyFst
from nemo_text_processing.text_normalization.en.verbalizers.verbalize import VerbalizeFst
from nemo_text_processing.text_normalization.en.verbalizers.verbalize_final import VerbalizeFinalFst
@@ -0,0 +1,342 @@
|
||||
# Copyright (c) 2021, NVIDIA CORPORATION. All rights reserved.
|
||||
#
|
||||
# Licensed under the Apache License, Version 2.0 (the "License");
|
||||
# you may not use this file except in compliance with the License.
|
||||
# You may obtain a copy of the License at
|
||||
#
|
||||
# http://www.apache.org/licenses/LICENSE-2.0
|
||||
#
|
||||
# Unless required by applicable law or agreed to in writing, software
|
||||
# distributed under the License is distributed on an "AS IS" BASIS,
|
||||
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
|
||||
# See the License for the specific language governing permissions and
|
||||
# limitations under the License.
|
||||
|
||||
from argparse import ArgumentParser
|
||||
from typing import List
|
||||
|
||||
import regex as re
|
||||
from nemo_text_processing.text_normalization.data_loader_utils import (
|
||||
EOS_TYPE,
|
||||
Instance,
|
||||
load_files,
|
||||
training_data_to_sentences,
|
||||
)
|
||||
|
||||
|
||||
"""
|
||||
This file is for evaluation purposes.
|
||||
filter_loaded_data() cleans data (list of instances) for text normalization. Filters and cleaners can be specified for each semiotic class individually.
|
||||
For example, normalized text should only include characters and whitespace characters but no punctuation.
|
||||
Cardinal unnormalized instances should contain at least one integer and all other characters are removed.
|
||||
"""
|
||||
|
||||
|
||||
class Filter:
|
||||
"""
|
||||
Filter class
|
||||
|
||||
Args:
|
||||
class_type: semiotic class used in dataset
|
||||
process_func: function to transform text
|
||||
filter_func: function to filter text
|
||||
|
||||
"""
|
||||
|
||||
def __init__(self, class_type: str, process_func: object, filter_func: object):
|
||||
self.class_type = class_type
|
||||
self.process_func = process_func
|
||||
self.filter_func = filter_func
|
||||
|
||||
def filter(self, instance: Instance) -> bool:
|
||||
"""
|
||||
filter function
|
||||
|
||||
Args:
|
||||
filters given instance with filter function
|
||||
|
||||
Returns: True if given instance fulfills criteria or does not belong to class type
|
||||
"""
|
||||
if instance.token_type != self.class_type:
|
||||
return True
|
||||
return self.filter_func(instance)
|
||||
|
||||
def process(self, instance: Instance) -> Instance:
|
||||
"""
|
||||
process function
|
||||
|
||||
Args:
|
||||
processes given instance with process function
|
||||
|
||||
Returns: processed instance if instance belongs to expected class type or original instance
|
||||
"""
|
||||
if instance.token_type != self.class_type:
|
||||
return instance
|
||||
return self.process_func(instance)
|
||||
|
||||
|
||||
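A Filter pairs one semiotic class with its cleaning and filtering callbacks; for example, wiring up the cardinal helpers defined just below (`data` here is an assumed list of Instance objects):

```python
cardinal = Filter(class_type='CARDINAL',
                  process_func=process_cardinal_1,
                  filter_func=filter_cardinal_1)
# keep instances that pass the class filter, then clean each survivor
cleaned = [cardinal.process(inst) for inst in data if cardinal.filter(inst)]
```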
def filter_cardinal_1(instance: Instance) -> bool:
|
||||
ok = re.search(r"[0-9]", instance.un_normalized)
|
||||
return ok
|
||||
|
||||
|
||||
def process_cardinal_1(instance: Instance) -> Instance:
|
||||
un_normalized = instance.un_normalized
|
||||
normalized = instance.normalized
|
||||
un_normalized = re.sub(r"[^0-9]", "", un_normalized)
|
||||
    normalized = re.sub(r"[^a-z ]", "", normalized)
    return Instance(token_type=instance.token_type, un_normalized=un_normalized, normalized=normalized)


def filter_ordinal_1(instance: Instance) -> bool:
    ok = re.search(r"(st|nd|rd|th)\s*$", instance.un_normalized)
    return ok is not None


def process_ordinal_1(instance: Instance) -> Instance:
    un_normalized = instance.un_normalized
    normalized = instance.normalized
    un_normalized = re.sub(r"[,\s]", "", un_normalized)
    normalized = re.sub(r"[^a-z ]", "", normalized)
    return Instance(token_type=instance.token_type, un_normalized=un_normalized, normalized=normalized)


def filter_decimal_1(instance: Instance) -> bool:
    ok = re.search(r"[0-9]", instance.un_normalized)
    return ok is not None


def process_decimal_1(instance: Instance) -> Instance:
    un_normalized = instance.un_normalized
    un_normalized = re.sub(r",", "", un_normalized)
    normalized = instance.normalized
    normalized = re.sub(r"[^a-z ]", "", normalized)
    return Instance(token_type=instance.token_type, un_normalized=un_normalized, normalized=normalized)


def filter_measure_1(instance: Instance) -> bool:
    return True


def process_measure_1(instance: Instance) -> Instance:
    un_normalized = instance.un_normalized
    normalized = instance.normalized
    un_normalized = re.sub(r",", "", un_normalized)
    un_normalized = re.sub(r"m2", "m²", un_normalized)
    un_normalized = re.sub(r"(\d)([^\d.\s])", r"\1 \2", un_normalized)  # split digit from unit, e.g. "5kg" -> "5 kg"
    normalized = re.sub(r"[^a-z\s]", "", normalized)
    normalized = re.sub(r"per ([a-z\s]*)s$", r"per \1", normalized)  # singularize after "per", e.g. "per seconds" -> "per second"
    normalized = re.sub(r"[^a-z ]", "", normalized)
    return Instance(token_type=instance.token_type, un_normalized=un_normalized, normalized=normalized)


def filter_money_1(instance: Instance) -> bool:
    ok = re.search(r"[0-9]", instance.un_normalized)
    return ok is not None


def process_money_1(instance: Instance) -> Instance:
    un_normalized = instance.un_normalized
    normalized = instance.normalized
    un_normalized = re.sub(r",", "", un_normalized)
    un_normalized = re.sub(r"a\$", r"$", un_normalized)
    un_normalized = re.sub(r"us\$", r"$", un_normalized)
    un_normalized = re.sub(r"(\d)m\s*$", r"\1 million", un_normalized)
    un_normalized = re.sub(r"(\d)bn?\s*$", r"\1 billion", un_normalized)
    normalized = re.sub(r"[^a-z ]", "", normalized)
    return Instance(token_type=instance.token_type, un_normalized=un_normalized, normalized=normalized)


def filter_time_1(instance: Instance) -> bool:
    ok = re.search(r"[0-9]", instance.un_normalized)
    return ok is not None


def process_time_1(instance: Instance) -> Instance:
    un_normalized = instance.un_normalized
    un_normalized = re.sub(r": ", ":", un_normalized)
    un_normalized = re.sub(r"(\d)\s?a\s?m\s?", r"\1 a.m.", un_normalized)
    un_normalized = re.sub(r"(\d)\s?p\s?m\s?", r"\1 p.m.", un_normalized)
    normalized = instance.normalized
    normalized = re.sub(r"[^a-z ]", "", normalized)
    return Instance(token_type=instance.token_type, un_normalized=un_normalized, normalized=normalized)


def filter_plain_1(instance: Instance) -> bool:
    return True


def process_plain_1(instance: Instance) -> Instance:
    un_normalized = instance.un_normalized
    normalized = instance.normalized
    return Instance(token_type=instance.token_type, un_normalized=un_normalized, normalized=normalized)


def filter_punct_1(instance: Instance) -> bool:
    return True


def process_punct_1(instance: Instance) -> Instance:
    un_normalized = instance.un_normalized
    normalized = instance.normalized
    return Instance(token_type=instance.token_type, un_normalized=un_normalized, normalized=normalized)


def filter_date_1(instance: Instance) -> bool:
    return True


def process_date_1(instance: Instance) -> Instance:
    un_normalized = instance.un_normalized
    un_normalized = re.sub(r",", "", un_normalized)
    normalized = instance.normalized
    normalized = re.sub(r"[^a-z ]", "", normalized)
    return Instance(token_type=instance.token_type, un_normalized=un_normalized, normalized=normalized)


def filter_letters_1(instance: Instance) -> bool:
    return True


def process_letters_1(instance: Instance) -> Instance:
    un_normalized = instance.un_normalized
    normalized = instance.normalized
    normalized = re.sub(r"[^a-z ]", "", normalized)
    return Instance(token_type=instance.token_type, un_normalized=un_normalized, normalized=normalized)


def filter_verbatim_1(instance: Instance) -> bool:
    return True


def process_verbatim_1(instance: Instance) -> Instance:
    un_normalized = instance.un_normalized
    normalized = instance.normalized
    return Instance(token_type=instance.token_type, un_normalized=un_normalized, normalized=normalized)


def filter_digit_1(instance: Instance) -> bool:
    ok = re.search(r"[0-9]", instance.un_normalized)
    return ok is not None


def process_digit_1(instance: Instance) -> Instance:
    un_normalized = instance.un_normalized
    normalized = instance.normalized
    normalized = re.sub(r"[^a-z ]", "", normalized)
    return Instance(token_type=instance.token_type, un_normalized=un_normalized, normalized=normalized)


def filter_telephone_1(instance: Instance) -> bool:
    ok = re.search(r"[0-9]", instance.un_normalized)
    return ok is not None


def process_telephone_1(instance: Instance) -> Instance:
    un_normalized = instance.un_normalized
    normalized = instance.normalized
    normalized = re.sub(r"[^a-z ]", "", normalized)
    return Instance(token_type=instance.token_type, un_normalized=un_normalized, normalized=normalized)


def filter_electronic_1(instance: Instance) -> bool:
    ok = re.search(r"[0-9]", instance.un_normalized)
    return ok is not None


def process_electronic_1(instance: Instance) -> Instance:
    un_normalized = instance.un_normalized
    normalized = instance.normalized
    normalized = re.sub(r"[^a-z ]", "", normalized)
    return Instance(token_type=instance.token_type, un_normalized=un_normalized, normalized=normalized)


def filter_fraction_1(instance: Instance) -> bool:
    ok = re.search(r"[0-9]", instance.un_normalized)
    return ok is not None


def process_fraction_1(instance: Instance) -> Instance:
    un_normalized = instance.un_normalized
    normalized = instance.normalized
    normalized = re.sub(r"[^a-z ]", "", normalized)
    return Instance(token_type=instance.token_type, un_normalized=un_normalized, normalized=normalized)


def filter_address_1(instance: Instance) -> bool:
    return True


def process_address_1(instance: Instance) -> Instance:
    un_normalized = instance.un_normalized
    normalized = instance.normalized
    normalized = re.sub(r"[^a-z ]", "", normalized)
    return Instance(token_type=instance.token_type, un_normalized=un_normalized, normalized=normalized)


# Register one (filter_func, process_func) pair per semantic-class token type.
filters = []
filters.append(Filter(class_type="CARDINAL", process_func=process_cardinal_1, filter_func=filter_cardinal_1))
filters.append(Filter(class_type="ORDINAL", process_func=process_ordinal_1, filter_func=filter_ordinal_1))
filters.append(Filter(class_type="DECIMAL", process_func=process_decimal_1, filter_func=filter_decimal_1))
filters.append(Filter(class_type="MEASURE", process_func=process_measure_1, filter_func=filter_measure_1))
filters.append(Filter(class_type="MONEY", process_func=process_money_1, filter_func=filter_money_1))
filters.append(Filter(class_type="TIME", process_func=process_time_1, filter_func=filter_time_1))

filters.append(Filter(class_type="DATE", process_func=process_date_1, filter_func=filter_date_1))
filters.append(Filter(class_type="PLAIN", process_func=process_plain_1, filter_func=filter_plain_1))
filters.append(Filter(class_type="PUNCT", process_func=process_punct_1, filter_func=filter_punct_1))
filters.append(Filter(class_type="LETTERS", process_func=process_letters_1, filter_func=filter_letters_1))
filters.append(Filter(class_type="VERBATIM", process_func=process_verbatim_1, filter_func=filter_verbatim_1))
filters.append(Filter(class_type="DIGIT", process_func=process_digit_1, filter_func=filter_digit_1))
filters.append(Filter(class_type="TELEPHONE", process_func=process_telephone_1, filter_func=filter_telephone_1))
filters.append(Filter(class_type="ELECTRONIC", process_func=process_electronic_1, filter_func=filter_electronic_1))
filters.append(Filter(class_type="FRACTION", process_func=process_fraction_1, filter_func=filter_fraction_1))
filters.append(Filter(class_type="ADDRESS", process_func=process_address_1, filter_func=filter_address_1))
filters.append(Filter(class_type=EOS_TYPE, process_func=lambda x: x, filter_func=lambda x: True))


def filter_loaded_data(data: List[Instance], verbose: bool = False) -> List[Instance]:
    """
    Filters list of instances

    Args:
        data: list of instances
        verbose: if True, print every instance that was filtered and transformed

    Returns: filtered and transformed list of instances
    """
    updated_instances = []
    for instance in data:
        updated_instance = False
        for fil in filters:
            if fil.class_type == instance.token_type and fil.filter(instance):
                instance = fil.process(instance)
                updated_instance = True
        if updated_instance:
            if verbose:
                print(instance)
            updated_instances.append(instance)
    return updated_instances


def parse_args():
    parser = ArgumentParser()
    parser.add_argument("--input", help="input file path", type=str, default='./en_with_types/output-00001-of-00100')
    parser.add_argument("--verbose", help="print filtered instances", action='store_true')
    return parser.parse_args()


if __name__ == "__main__":
    args = parse_args()
    file_path = args.input

    print("Loading training data: " + file_path)
    instance_list = load_files([file_path])  # List of instances
    filtered_instance_list = filter_loaded_data(instance_list, args.verbose)
    training_data_to_sentences(filtered_instance_list)
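For orientation, here is a minimal sketch (not part of the diff) of how one filter/process pair above transforms a single instance; `Instance` is recreated here as a stand-in namedtuple, since the real dataclass is defined earlier in this file.

```python
# Minimal sketch of the filter/process flow above; Instance is a stand-in
# namedtuple for the dataclass defined earlier in the file.
import re
from collections import namedtuple

Instance = namedtuple("Instance", ["token_type", "un_normalized", "normalized"])

raw = Instance("MONEY", "us$3m", "three million dollars")

assert re.search(r"[0-9]", raw.un_normalized)        # filter_money_1 accepts it

un_normalized = re.sub(r",", "", raw.un_normalized)  # strip thousands separators
un_normalized = re.sub(r"us\$", "$", un_normalized)  # canonicalize the currency sign
un_normalized = re.sub(r"(\d)m\s*$", r"\1 million", un_normalized)  # expand the "m" suffix

print(un_normalized)  # $3 million
```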
@@ -0,0 +1,13 @@
# Copyright (c) 2021, NVIDIA CORPORATION. All rights reserved.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
@@ -0,0 +1,13 @@
# Copyright (c) 2021, NVIDIA CORPORATION & AFFILIATES. All rights reserved.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
@@ -0,0 +1,14 @@
st Street
street Street
expy Expressway
fwy Freeway
hwy Highway
dr Drive
ct Court
ave Avenue
av Avenue
cir Circle
blvd Boulevard
alley Alley
way Way
jct Junction
@@ -0,0 +1,52 @@
Alabama AL
Alaska AK
Arizona AZ
Arkansas AR
California CA
Colorado CO
Connecticut CT
Delaware DE
Florida FL
Georgia GA
Hawaii HI
Idaho ID
Illinois IL
Indiana IN
Indiana IND
Iowa IA
Kansas KS
Kentucky KY
Louisiana LA
Maine ME
Maryland MD
Massachusetts MA
Michigan MI
Minnesota MN
Mississippi MS
Missouri MO
Montana MT
Nebraska NE
Nevada NV
New Hampshire NH
New Jersey NJ
New Mexico NM
New York NY
North Carolina NC
North Dakota ND
Ohio OH
Oklahoma OK
Oregon OR
Pennsylvania PA
Rhode Island RI
South Carolina SC
South Dakota SD
Tennessee TN
Tennessee TENN
Texas TX
Utah UT
Vermont VT
Virginia VA
Washington WA
West Virginia WV
Wisconsin WI
Wyoming WY
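As a sketch of how these two lookup tables might be applied when expanding an ADDRESS token (the dict literals below inline a few rows for illustration; note the state table is stored as Name/Abbreviation and is reversed for lookup):

```python
# Illustrative only: expand a street abbreviation and a state abbreviation
# using two-column tables like the ones above.
street = {"st": "Street", "ave": "Avenue", "blvd": "Boulevard"}  # from the street table
state = {"CA": "California", "NY": "New York"}                   # reversed state table

def expand_address(tokens: list[str]) -> list[str]:
    return [state.get(t, street.get(t.lower(), t)) for t in tokens]

print(expand_address(["123", "Main", "st", "CA"]))
# ['123', 'Main', 'Street', 'California']
```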
@@ -0,0 +1,13 @@
# Copyright (c) 2021, NVIDIA CORPORATION. All rights reserved.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
@@ -0,0 +1,31 @@
one
two
three
four
five
six
seven
eight
nine
ten
eleven
twelve
thirteen
fourteen
fifteen
sixteen
seventeen
eighteen
nineteen
twenty
twenty one
twenty two
twenty three
twenty four
twenty five
twenty six
twenty seven
twenty eight
twenty nine
thirty
thirty one
@@ -0,0 +1,12 @@
jan january
feb february
mar march
apr april
jun june
jul july
aug august
sep september
sept september
oct october
nov november
dec december
@@ -0,0 +1,12 @@
january
february
march
april
may
june
july
august
september
october
november
december
@@ -0,0 +1,24 @@
1 january
2 february
3 march
4 april
5 may
6 june
7 july
8 august
9 september
10 october
11 november
12 december
01 january
02 february
03 march
04 april
05 may
06 june
07 july
08 august
09 september
10 october
11 november
12 december
@@ -0,0 +1,16 @@
A. D AD
A.D AD
a. d AD
a.d AD
a. d. AD
a.d. AD
B. C BC
B.C BC
b. c BC
b.c BC
A. D. AD
A.D. AD
B. C. BC
B.C. BC
b. c. BC
b.c. BC
@@ -0,0 +1,13 @@
# Copyright (c) 2022, NVIDIA CORPORATION & AFFILIATES. All rights reserved.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
@@ -0,0 +1,12 @@
.com dot com
.org dot org
.gov dot gov
.uk dot UK
.fr dot FR
.net dot net
.br dot BR
.in dot IN
.ru dot RU
.de dot DE
.it dot IT
.jpg dot jpeg
@@ -0,0 +1,21 @@
. dot
- dash
_ underscore
! exclamation mark
# number sign
$ dollar sign
% percent sign
& ampersand
' quote
* asterisk
+ plus
/ slash
= equal sign
? question mark
^ circumflex
` right single quote
{ left brace
| vertical bar
} right brace
~ tilde
, comma
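A hedged sketch of how the domain and symbol tables above could be combined to read out an ELECTRONIC token character by character (the helper below is hypothetical, and only a few table rows are inlined):

```python
# Hypothetical helper: spell out an electronic token using the symbol and
# domain tables above (only a few rows are inlined here).
domains = {".com": "dot com", ".org": "dot org"}
symbols = {".": "dot", "-": "dash", "_": "underscore", "/": "slash"}

def verbalize_electronic(token: str) -> str:
    for suffix, spoken in domains.items():
        if token.endswith(suffix):
            token, tail = token[: -len(suffix)], spoken
            break
    else:
        tail = ""
    spelled = " ".join(symbols.get(ch, ch) for ch in token)
    return (spelled + " " + tail).strip()

print(verbalize_electronic("a_b.org"))  # a underscore b dot org
```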
@@ -0,0 +1,13 @@
# Copyright (c) 2021, NVIDIA CORPORATION. All rights reserved.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
@@ -0,0 +1,8 @@
+ plus
- minus
/ divided
÷ divided
: divided
× times
* times
· times
@@ -0,0 +1,127 @@
amu atomic mass unit
bar bar
° degree
º degree
°c degree Celsius
°C degree Celsius
ºc degree Celsius
ºC degree Celsius
℃ degree Celsius
cm2 square centimeter
cm² square centimeter
cm3 cubic centimeter
cm³ cubic centimeter
cm centimeter
cwt hundredweight
db decibel
dm3 cubic decimeter
dm³ cubic decimeter
dm decimeter
ds decisecond
°f degree Fahrenheit
°F degree Fahrenheit
℉ degree Fahrenheit
ft foot
ghz gigahertz
gw gigawatt
gwh gigawatt hour
hz hertz
" inch
kbps kilobit per second
kcal kilo calory
kgf kilogram force
kg kilogram
khz kilohertz
km2 square kilometer
km² square kilometer
km3 cubic kilometer
km³ cubic kilometer
km kilometer
kpa kilopascal
kwh kilowatt hour
kw kilowatt
kW kilowatt
lb pound
lbs pound
m2 square meter
m² square meter
m3 cubic meter
m³ cubic meter
mbps megabit per second
mg milligram
mhz megahertz
mi2 square mile
mi² square mile
mi3 cubic mile
mi³ cubic mile
cu mi cubic mile
mi mile
min minute
ml milliliter
mm2 square millimeter
mm² square millimeter
mol mole
mpa megapascal
mph mile per hour
ng nanogram
nm nanometer
ns nanosecond
oz ounce
pa pascal
% percent
rad radian
rpm revolution per minute
sq ft square foot
sq mi square mile
sv sievert
tb terabyte
tj terajoule
tl teraliter
v volt
yd yard
μg microgram
μm micrometer
μs microsecond
ω ohm
atm ATM
au AU
bq BQ
cc CC
cd CD
da DA
eb EB
ev EV
f F
gb GB
g G
gl GL
gpa GPA
gy GY
ha HA
h H
hl HL
hp HP
hs HS
kb KB
kl KL
kn KN
kt KT
kv KV
lm LM
ma MA
mA MA
mb MB
mc MC
mf MF
m M
mm MM
ms MS
mv MV
mw MW
pb PB
pg PG
ps PS
s S
tb TB
yb YB
zb ZB
@@ -0,0 +1,43 @@
atm atmosphere
bq becquerel
cd candela
da dalton
eb exabyte
f degree Fahrenheit
gb gigabyte
g gram
gl gigaliter
ha hectare
h hour
hl hectoliter
hp horsepower
hp horsepower
kb kilobit
kb kilobyte
ma megaampere
mA megaampere
ma milliampere
mA milliampere
mb megabyte
mc megacoulomb
mf megafarad
m meter
m minute
mm millimeter
mm millimeter
mm millimeter
ms megasecond
ms mega siemens
ms millisecond
mv millivolt
mV millivolt
mw megawatt
mW megawatt
pb petabyte
pg petagram
ps petasecond
s second
tb terabyte
tb terabyte
yb yottabyte
zb zettabyte
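Since several abbreviations above legitimately map to more than one expansion (`ma`, `kb`, `ms`, ...), a loader needs a one-to-many structure. A sketch, assuming a two-column tab-separated file at a hypothetical path:

```python
# Sketch: load a two-column measurement table into a one-to-many mapping,
# since keys like "ma" and "kb" legitimately repeat. The path is hypothetical,
# and the file is assumed to be tab-separated.
from collections import defaultdict

expansions: dict[str, list[str]] = defaultdict(list)
with open("measurements.tsv", encoding="utf-8") as f:
    for line in f:
        key, value = line.rstrip("\n").split("\t", 1)
        if value not in expansions[key]:  # also drops exact duplicate rows
            expansions[key].append(value)

# expansions["ma"] -> ["megaampere", "milliampere"]
```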
@@ -0,0 +1,13 @@
# Copyright (c) 2021, NVIDIA CORPORATION. All rights reserved.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
@@ -0,0 +1,39 @@
$ dollar
$ us dollar
US$ us dollar
฿ Thai Baht
£ pound
€ euro
₩ won
nzd new zealand dollar
rs rupee
chf swiss franc
dkk danish kroner
fim finnish markka
aed arab emirates dirham
¥ yen
czk czech koruna
mro mauritanian ouguiya
pkr pakistani rupee
crc costa rican colon
hk$ hong kong dollar
npr nepalese rupee
awg aruban florin
nok norwegian kroner
tzs tanzanian shilling
sek swedish kronor
cyp cypriot pound
r real
sar saudi riyal
cve cape verde escudo
rsd serbian dinar
dm german mark
shp saint helena pounds
php philippine peso
cad canadian dollar
ssp south sudanese pound
scr seychelles rupee
mvr maldivian rufiyaa
DH dirham
Dh dirham
Dhs. dirham
@@ -0,0 +1,4 @@
$ cents
US$ cents
€ cents
£ pence
@@ -0,0 +1,3 @@
$ cent
€ cent
£ penny
@@ -0,0 +1,2 @@
/ea each
/dozen
@@ -0,0 +1,13 @@
# Copyright (c) 2021, NVIDIA CORPORATION. All rights reserved.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
Binary file not shown.
@@ -0,0 +1,9 @@
one 1
two 2
three 3
four 4
five 5
six 6
seven 7
eight 8
nine 9
@@ -0,0 +1,18 @@
¼ 1/4
½ 1/2
¾ 3/4
⅐ 1/7
⅑ 1/9
⅒ 1/10
⅓ 1/3
⅔ 2/3
⅕ 1/5
⅖ 2/5
⅗ 3/5
⅘ 4/5
⅙ 1/6
⅚ 5/6
⅛ 1/8
⅜ 3/8
⅝ 5/8
⅞ 7/8
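A small sketch of how the table above can be applied to rewrite Unicode vulgar fractions into ASCII form before further FRACTION processing (only a few rows are inlined):

```python
# Sketch: rewrite Unicode vulgar fractions using (a few rows of) the table above.
VULGAR = {"¼": "1/4", "½": "1/2", "¾": "3/4", "⅔": "2/3"}

def ascii_fractions(text: str) -> str:
    return "".join(VULGAR.get(ch, ch) for ch in text)

print(ascii_fractions("½ cup"))  # 1/2 cup
```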
@@ -0,0 +1 @@
hundred
@@ -0,0 +1,10 @@
M million
MLN million
m million
mln million
B billion
b billion
BN billion
bn billion
K thousand
k thousand
@@ -0,0 +1,10 @@
ten 10
eleven 11
twelve 12
thirteen 13
fourteen 14
fifteen 15
sixteen 16
seventeen 17
eighteen 18
nineteen 19
@@ -0,0 +1,22 @@
thousand
million
billion
trillion
quadrillion
quintillion
sextillion
septillion
octillion
nonillion
decillion
undecillion
duodecillion
tredecillion
quattuordecillion
quindecillion
sexdecillion
septendecillion
octodecillion
novemdecillion
vigintillion
centillion
@@ -0,0 +1,8 @@
twenty 2
thirty 3
forty 4
fifty 5
sixty 6
seventy 7
eighty 8
ninety 9
@@ -0,0 +1 @@
zero 0
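Taken together, the ones/teens/tens/zero tables support a small spoken-to-digit composer for two-digit numbers; a sketch with a few rows inlined:

```python
# Sketch: compose "twenty three" -> "23" from the tens and ones tables above.
ones = {"one": "1", "two": "2", "three": "3", "nine": "9"}
teens = {"ten": "10", "nineteen": "19"}
tens = {"twenty": "2", "ninety": "9"}

def words_to_digits(phrase: str) -> str:
    parts = phrase.split()
    if len(parts) == 2 and parts[0] in tens and parts[1] in ones:
        return tens[parts[0]] + ones[parts[1]]
    if phrase in teens:
        return teens[phrase]
    if phrase in tens:
        return tens[phrase] + "0"
    return ones.get(phrase, phrase)

print(words_to_digits("twenty three"))  # 23
```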
@@ -0,0 +1,13 @@
# Copyright (c) 2021, NVIDIA CORPORATION. All rights reserved.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
@@ -0,0 +1,9 @@
first one
second two
third three
fourth four
fifth five
sixth six
seventh seven
eighth eight
ninth nine
@@ -0,0 +1 @@
twelfth twelve
@@ -0,0 +1,20 @@
`female.tsv` - List of common female names. Copyright (c) January 1991 by Mark Kantrowitz, 4987 names, Version 1.3 (29-MAR-94)
Source: [https://www.cs.cmu.edu/Groups/AI/areas/nlp/corpora/names/female.txt](https://www.cs.cmu.edu/Groups/AI/areas/nlp/corpora/names/female.txt)

`male.tsv` - List of common male names. Copyright (c) January 1991 by Mark Kantrowitz, 2940 names, Version 1.3 (29-MAR-94)
Source: [https://www.cs.cmu.edu/Groups/AI/areas/nlp/corpora/names/male.txt](https://www.cs.cmu.edu/Groups/AI/areas/nlp/corpora/names/male.txt)

[Corpora Readme.txt](https://www.cs.cmu.edu/Groups/AI/areas/nlp/corpora/names/readme.txt):

You may use the lists of names for any purpose, so long as credit is given
in any published work. You may also redistribute the list if you
provide the recipients with a copy of this README file. The lists are
not in the public domain (I retain the copyright on the lists) but are
freely redistributable.

If you have any additions to the lists of names, I would appreciate
receiving them.

My email address is mkant+@cs.cmu.edu.

Mark Kantrowitz
@@ -0,0 +1,13 @@
# Copyright (c) 2021, NVIDIA CORPORATION. All rights reserved.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
File diff suppressed because it is too large
@@ -0,0 +1,6 @@
chapter
class
part
article
section
paragraph
File diff suppressed because it is too large
File diff suppressed because it is too large
@@ -0,0 +1,83 @@
deer
fish
sheep
foot feet
goose geese
man men
mouse mice
tooth teeth
woman women
won
child children
ox oxen
wife wives
wolf wolves
analysis analyses
criterion criteria
lbs
focus foci
percent
hertz
kroner krone
inch inches
calory calories
yen
megahertz
gigahertz
kilohertz
hertz
CC
c c
horsepower
hundredweight
kilogram force kilograms force
mega siemens
revolution per minute revolutions per minute
mile per hour miles per hour
megabit per second megabits per second
square foot square feet
kilobit per second kilobits per second
degree Celsius degrees Celsius
degree Fahrenheit degrees Fahrenheit
ATM
AU
BQ
CC
CD
DA
EB
EV
F
GB
G
GL
GPA
GY
HA
H
HL
HP
HS
KB
KL
KN
KT
KV
LM
MA
MA
MB
MC
MF
M
MM
MS
MV
MW
PB
PG
PS
S
TB
YB
ZB
@@ -0,0 +1,13 @@
# Copyright (c) 2022, NVIDIA CORPORATION & AFFILIATES. All rights reserved.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
@@ -0,0 +1,2 @@
IP address is
IP is
@@ -0,0 +1,4 @@
ssn is SSN is
ssn is SSN is
SSN is
SSN
@@ -0,0 +1,5 @@
call me at
reach at
reached at
my number is
hit me up at
@@ -0,0 +1,13 @@
# Copyright (c) 2021, NVIDIA CORPORATION. All rights reserved.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
@@ -0,0 +1,12 @@
p.m. PM
p.m PM
pm PM
P.M. PM
P.M PM
PM PM
a.m. AM
a.m AM
am AM
A.M. AM
A.M AM
AM AM
@@ -0,0 +1,14 @@
cst CST
c.s.t CST
cet CET
c.e.t CET
pst PST
p.s.t PST
est EST
e.s.t EST
pt PT
p.t PT
et ET
e.t ET
gmt GMT
g.m.t GMT
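A sketch of how the meridiem and timezone tables above might be combined to canonicalize a TIME token after the `process_time_1` regexes from earlier in the diff have run (only a few rows are inlined; lowercasing each word is an assumption that collapses the mixed-case variants):

```python
# Sketch: canonicalize meridiem and timezone markers using the two tables above.
MERIDIEM = {"p.m.": "PM", "p.m": "PM", "pm": "PM", "a.m.": "AM", "a.m": "AM", "am": "AM"}
TIMEZONE = {"est": "EST", "e.s.t": "EST", "gmt": "GMT", "g.m.t": "GMT"}

def canonicalize_time(token: str) -> str:
    words = token.split()
    return " ".join(MERIDIEM.get(w.lower(), TIMEZONE.get(w.lower(), w)) for w in words)

print(canonicalize_time("3:30 p.m. est"))  # 3:30 PM EST
```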
File diff suppressed because it is too large
@@ -0,0 +1,13 @@
# Copyright (c) 2021, NVIDIA CORPORATION. All rights reserved.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
@@ -0,0 +1,45 @@
Hon. Honorable
Mr. Mister
Mrs. Misses
Ms. Miss
Mr Mister
Mrs Misses
Ms Miss
AC air conditioning
AC air conditioner
AC air conditioners
AC alternating current
&Co. and Co.
&Co. and Company
Mon Monday
Tu Tuesday
Wed Wednesday
Th Thursday
Thur Thursday
Thurs Thursday
Fri Friday
Sat Saturday
Sun Sunday
Mon Mon
Tu Tu
Wed Wed
Th Th
Thur Thur
Thurs Thurs
Fri Fri
Sat Sat
Sun Sun
= equals
# number
No. number
No number
NO number
NO. number
NO nitrogen monoxide
NO NO
NO. NO.
No. No.
No No
VOL Volume
VOL. Volume
TV Television
@@ -0,0 +1,14 @@
st street
st saint
dr doctor
dr drive
mt mount
sr senior
prof professor
mt mountain
sr senior
jr junior
vol volume
rd road
ave avenue
approx approximately
File diff suppressed because it is too large
@@ -0,0 +1,521 @@
a
aoj
aəj
aː
aːʲ
aː͡j
aː͡ɨ̯
aˤ
aˤː
a̠
a̠ː
a̰
a͡e
a͡i
a͡iː
a͡i̯
a͡j
a͡o
a͡u
a͡uː
a͡u̯
a͡w
a͡ə
a͡ɨ̯
a͡ɪ
a͡ʊ
b
bʱ
bʲ
bː
b̥
c
cʰ
cː
ç
d
dʲ
dː
d̥
d̪
d̪ʱ
d͡z
d͡zʷ
d͡zː
d͡ʑ
d͡ʒ
d͡ʒʱ
d͡ʒʲ
d͡ʒː
e
eː
eːʲ
eː͡j
ẽː
ẽ͡j̃
e̞
e̞ː
e̯
e͡i
e͡iː
e͡ɨ̯
f
fʲ
fː
h
hː
i
iəj
iəw
iʲ
iː
iːʲ
ĩː
i̥
i̯
i͡u
i͡ə
i͡ɛ
j
jː
j̃
k
kʰ
kʰː
kʲ
kʲʼ
kʷ
kʷʼ
kʼ
kː
k̚
k̚ʲ
k̟̚
k͈
k͡p̚
l
lʲ
lː
l̥
l̩
m
mʲ
mʲː
mː
m̥
m̩
n
nʲ
nː
n̥
n̩
o
oʲ
oː
oːʲ
ò
õ͡j̃
õ͡w̃
o̝
o̞
o̞ː
o̯
o̰
o͡u
o͡uː
p
pʰ
pʰː
pʲ
pʷʼ
pʼ
pː
p̚
p̚ʲ
p͈
p͜f
p͡f
q
qʷ
qʼ
r
rʲ
rː
r̂
r̂ː
r̥
r̩
s
sʰ
sʲ
sʼ
sː
s͈
t
tʰ
tʰː
tʲ
tʷʼ
tʼ
tː
t̚
t̪
t̪ʰ
t͈
t͜s
t͡s
t͡sʰ
t͡sʰː
t͡sʲ
t͡sʷ
t͡sʼ
t͡sː
t͡ɕ
t͡ɕʰ
t͡ɕ͈
t͡ʂ
t͡ʂʼ
t͡ʃ
t͡ʃʰ
t͡ʃʰː
t͡ʃʲ
t͡ʃʷ
t͡ʃʼ
t͡ʃː
u
uəj
uʲ
uː
uːʲ
ũː
ũ͡j̃
u̯
u͡e
u͡i
u͡j
u͡ɔ
u͡ə
v
vʲ
vː
w
w̃
x
xʷ
xː
y
yː
yːʲ
y̯
z
zʲ
zː
z̥
à
àː
á
áː
â
âː
ã
ã̠
æ
æː
æ̀
æ̀ː
æ̂
æ̂ː
æ͡ɪ
æ͡ʉ
ç
è
èː
é
éː
ê
êː
ì
ìː
í
íː
î
îː
ï
ð
ò
òː
ó
óː
ô
ôː
õ
õː
õ̞
ø
øː
øːʲ
ø̯
ù
ùː
ú
úː
û
ûː
ā
āː
ē
ēː
ĕ
ĕ͡ə
ě
ěː
ħ
ĩ
ĩː
ī
īː
ŋ
ŋʲ
ŋ̊
ŋ̍
ŋ̟
ŋ̩
ŋ͡m
ō
ŏ
ŏ͡ə
œ
œː
œ̃
œ͡i
œ͡iː
œ͡ʏ
ř
řː
ũ
ũː
ū
ūː
ŭ
ŭ͡ə
ǎ
ǎː
ǐ
ǐː
ǒ
ǒː
ǔ
ǔː
ǣ
ǣː
ɐ
ɐː
ɐ̃
ɐ̃͡j̃
ɐ̃͡w̃
ɐ̯
ɐ̯̯
ɑ
ɑː
ɑ̃
ɑ̃ː
ɒ
ɒʲ
ɒː
ɓ
ɔ
ɔː
ɔˤː
ɔ̀
ɔ̀ː
ɔ́
ɔ́ː
ɔ̃
ɔ̃ː
ɔ̰
ɔ͡i̯
ɔ͡ə
ɔ͡ɨ̯
ɔ͡ɪ
ɔ͡ʊ
ɕ
ɕʰ
ɕː
ɕ͈
ɖ
ɖʱ
ɗ
ɘ
ɘː
ə
əː
əˤ
ə̀
ə́
ə̃
ə̯
ə͡u̯
ə͡w
ə͡ɨ
ə͡ɨ̯
ɚ
ɛ
ɛʲ
ɛː
ɛˤː
ɛ̀
ɛ̀ː
ɛ́
ɛ́ː
ɛ̂
ɛ̂ː
ɛ̃
ɛ̃ː
ɛ̄
ɛ̄ː
ɛ̰
ɛ͡i
ɛ͡i̯
ɛ͡u
ɛ͡u̯
ɛ͡ɪ
ɛ͡ʊ
ɜ
ɜː
ɝ
ɝː
ɟ
ɟː
ɟ͡ʝ
ɡ
ɡʱ
ɡʲ
ɡʷ
ɡː
ɡ̊
ɣ
ɤ
ɥ
ɦ
ɨ
ɨəj
ɨː
ɨ̃ᵝ
ɨ̞
ɨ̥ᵝ
ɨ̯
ɨ͡u̯
ɨ͡w
ɨ͡ə
ɨᵝ
ɨᵝː
ɪ
ɪː
ɪ̀
ɪ́
ɪ̃
ɪ̯
ɪ̰
ɪ͡u̯
ɪ͡ʊ
ɫ
ɫː
ɬ
ɬʼ
ɭ
ɮ
ɯ
ɯː
ɯ̟̃ᵝ
ɯ̟̊ᵝ
ɯ̟ᵝ
ɯ̟ᵝː
ɰ
ɰ̃
ɰᵝ
ɱ
ɱ̩
ɲ
ɲː
ɲ̊
ɲ̟
ɳ
ɴ
ɸ
ɸʷ
ɹ
ɻ
ɽ
ɽʱ
ɾ
ɾʲ
ɾː
ɾ̝̊
ʀ
ʁ
ʁʷ
ʁː
ʂ
ʂʷ
ʃ
ʃʰ
ʃʲ
ʃʷ
ʃʷʼ
ʃʼ
ʃː
ʈ
ʈʰ
ʉ
ʉː
ʊ
ʊ̀
ʊ́
ʊ̃
ʊ̯
ʊ̯͡i
ʊ̯͡ɨ
ʊ̰
ʋ
ʌ
ʌ̹
ʍ
ʎ
ʏ
ʏː
ʏ̯
ʐ
ʐʷ
ʑ
ʒ
ʒʲ
ʒʷ
ʒː
ʔ
ʔʲ
ʔʷ
ʝ
˦ˀ˥
˦˥
˦˧˥
˦˩
˧ˀ˨
˧˦
˧˧
˧˨
˧˩
˨˩
˨˩˦
˨˩˨
β
θ
χ
χʷ
χː
ḛ
ḭ
ṵ
ẽ
ẽː
ẽ̞
‿
@@ -0,0 +1,21 @@
Mr. mister
Mrs. misses
Dr. doctor
Drs. doctors
Co. company
Lt. lieutenant
Sgt. sergeant
St. saint
Jr. junior
Maj. major
Hon. honorable
Gov. governor
Capt. captain
Esq. esquire
Gen. general
Ltd. limited
Rev. reverend
Col. colonel
Mt. mount
Ft. fort
etc. et cetera
Some files were not shown because too many files have changed in this diff