# GPT-J

This document describes how to build and run the [GPT-J](https://huggingface.co/EleutherAI/gpt-j-6b) model on a single XPU with Kunlunxin XTRT-LLM.

## Overview

The XTRT-LLM GPT-J example code lives in [`examples/gptj`](./). The main files in this folder are:

* [`build.py`](./build.py) builds the XTRT engines needed to run the GPT-J model
* [`run.py`](./run.py) runs inference on an input prompt

## Support Matrix

* FP16

## Usage

### 1. Download weights from HuggingFace (HF) Transformers

```bash
# 1. Weights & config
git clone https://huggingface.co/EleutherAI/gpt-j-6b ./downloads/gptj-6b
pushd ./downloads/gptj-6b && \
  rm -f pytorch_model.bin && \
  wget https://huggingface.co/EleutherAI/gpt-j-6b/resolve/main/pytorch_model.bin && \
popd

# 2. Vocab and merge table
wget https://huggingface.co/EleutherAI/gpt-j-6b/resolve/main/vocab.json
wget https://huggingface.co/EleutherAI/gpt-j-6b/resolve/main/merges.txt
```
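A bare `git clone` of an LFS-backed repository can leave a ~130-byte pointer file in place of `pytorch_model.bin`, which is why the steps above re-download it with `wget`. As an optional sanity check (a sketch, not part of the official steps; `check_weights` is a hypothetical helper), a size test distinguishes a pointer file from the real multi-gigabyte checkpoint:

```shell
# Hypothetical sanity check: fail if the weights file is missing or is still
# a tiny Git LFS pointer instead of the real checkpoint.
check_weights() {
    local f="$1"
    if [ ! -f "$f" ]; then
        echo "missing: $f"
        return 1
    fi
    local size
    size=$(wc -c < "$f")
    if [ "$size" -lt 1000000 ]; then
        echo "suspicious: $f is only $size bytes (likely an LFS pointer)"
        return 1
    fi
    echo "ok: $f ($size bytes)"
}

check_weights ./downloads/gptj-6b/pytorch_model.bin || true
```

If the check reports a suspicious size, re-run the `wget` step above before building.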

### 2. Build XTRT engines

XTRT-LLM builds XTRT engines from the HF checkpoint. If no checkpoint directory is specified, XTRT-LLM builds the engines with dummy weights.

Example build invocations:

```bash
# Build a float16 engine using HF weights.
# Enable several XTRT-LLM plugins to increase runtime performance. It also helps with build time.
python3 build.py --dtype=float16 \
                 --log_level=verbose \
                 --enable_context_fmha \
                 --use_gpt_attention_plugin float16 \
                 --use_gemm_plugin float16 \
                 --max_batch_size=32 \
                 --max_input_len=1919 \
                 --max_output_len=128 \
                 --output_dir=./downloads/gptj-6b/trt_engines/fp16/1-XPU/ \
                 --model_dir=./downloads/gptj-6b 2>&1 | tee build.log

# Build a float16 engine using dummy weights, useful for performance tests.
# Enable several XTRT-LLM plugins to increase runtime performance. It also helps with build time.
python3 build.py --dtype=float16 \
                 --log_level=verbose \
                 --enable_context_fmha \
                 --use_gpt_attention_plugin float16 \
                 --use_gemm_plugin float16 \
                 --max_batch_size=32 \
                 --max_input_len=1919 \
                 --max_output_len=128 \
                 --output_dir=./downloads/gptj-6b/trt_engines/gptj_engine_dummy_weights 2>&1 | tee build.log
```
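The sequence-length flags above are presumably chosen to fit GPT-J's context window: the model supports 2048 positions (`n_positions` in the HF config), and `--max_input_len` (1919) plus `--max_output_len` (128) stays within that budget. A quick arithmetic check of that assumption:

```shell
# Check that the build flags fit GPT-J's 2048-token context window.
# 1919 and 128 are the values used in the build commands above.
max_input_len=1919
max_output_len=128
n_positions=2048  # GPT-J context length from the HF config
total=$((max_input_len + max_output_len))
if [ "$total" -le "$n_positions" ]; then
    echo "ok: $total tokens fit in the $n_positions-token context"
else
    echo "error: $total tokens exceed the $n_positions-token context"
fi
```

If you raise one of the two limits, lower the other so their sum stays within the context window.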

### 3. Run

To run the XTRT-LLM GPT-J model:

```bash
python3 run.py --max_output_len=50 \
               --engine_dir=./downloads/gptj-6b/trt_engines/fp16/1-XPU/ \
               --hf_model_location=./downloads/gptj-6b
```
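`run.py` deserializes the engine produced in step 2, so it is worth confirming the build actually wrote engine files before running. A minimal pre-flight check (a sketch: `check_engine_dir` is a hypothetical helper, and the `*.engine` suffix is an assumption about how the build names its output; adjust the pattern if your build differs):

```shell
# Hypothetical pre-flight check: verify the build step left serialized
# engine files in the directory that run.py will be pointed at.
check_engine_dir() {
    local dir="$1"
    if ls "$dir"/*.engine >/dev/null 2>&1; then
        echo "found engines in $dir"
    else
        echo "no engines in $dir -- re-run build.py first"
    fi
}

check_engine_dir ./downloads/gptj-6b/trt_engines/fp16/1-XPU/
```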