# GPT-J

This document describes how to build and run the [GPT-J](https://huggingface.co/EleutherAI/gpt-j-6b) model on a single XPU with Kunlunxin XTRT-LLM.

## Overview

The XTRT-LLM GPT-J example code lives in [`examples/gptj`](./). The main files in this folder are:

* [`build.py`](./build.py) builds the XTRT engines needed to run the GPT-J model
* [`run.py`](./run.py) runs inference on an input prompt

## Support Matrix

* FP16

## Usage

### 1. Download weights from HuggingFace (HF) Transformers

```bash
# 1. Weights & config
git clone https://huggingface.co/EleutherAI/gpt-j-6b ./downloads/gptj-6b
pushd ./downloads/gptj-6b && \
  rm -f pytorch_model.bin && \
  wget https://huggingface.co/EleutherAI/gpt-j-6b/resolve/main/pytorch_model.bin && \
popd

# 2. Vocab and merge table
wget https://huggingface.co/EleutherAI/gpt-j-6b/resolve/main/vocab.json
wget https://huggingface.co/EleutherAI/gpt-j-6b/resolve/main/merges.txt
```
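A bare `git clone` of an LFS-backed repository can leave a ~130-byte pointer file in place of `pytorch_model.bin`, which is why the steps above re-download it with `wget`. As an optional sanity check (a sketch, not part of the official steps; `check_weights` is a hypothetical helper), a size test distinguishes a pointer file from the real multi-gigabyte checkpoint:

```shell
# Hypothetical sanity check: fail if the weights file is missing or is still
# a tiny Git LFS pointer instead of the real checkpoint.
check_weights() {
    local f="$1"
    if [ ! -f "$f" ]; then
        echo "missing: $f"
        return 1
    fi
    local size
    size=$(wc -c < "$f")
    if [ "$size" -lt 1000000 ]; then
        echo "suspicious: $f is only $size bytes (likely an LFS pointer)"
        return 1
    fi
    echo "ok: $f ($size bytes)"
}

check_weights ./downloads/gptj-6b/pytorch_model.bin || true
```

If the check reports a suspicious size, re-run the `wget` step above before building.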

### 2. Build XTRT engines

XTRT-LLM builds XTRT engines from the HF checkpoint. If no checkpoint directory is specified, XTRT-LLM builds the engines with dummy weights.

Example build invocations:

```bash
# Build a float16 engine using HF weights.
# Enable several XTRT-LLM plugins to increase runtime performance. It also helps with build time.
python3 build.py --dtype=float16 \
                 --log_level=verbose \
                 --enable_context_fmha \
                 --use_gpt_attention_plugin float16 \
                 --use_gemm_plugin float16 \
                 --max_batch_size=32 \
                 --max_input_len=1919 \
                 --max_output_len=128 \
                 --output_dir=./downloads/gptj-6b/trt_engines/fp16/1-XPU/ \
                 --model_dir=./downloads/gptj-6b 2>&1 | tee build.log

# Build a float16 engine using dummy weights, useful for performance tests.
# Enable several XTRT-LLM plugins to increase runtime performance. It also helps with build time.
python3 build.py --dtype=float16 \
                 --log_level=verbose \
                 --enable_context_fmha \
                 --use_gpt_attention_plugin float16 \
                 --use_gemm_plugin float16 \
                 --max_batch_size=32 \
                 --max_input_len=1919 \
                 --max_output_len=128 \
                 --output_dir=./downloads/gptj-6b/trt_engines/gptj_engine_dummy_weights 2>&1 | tee build.log
```
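The sequence-length flags above are presumably chosen to fit GPT-J's context window: the model supports 2048 positions (`n_positions` in the HF config), and `--max_input_len` (1919) plus `--max_output_len` (128) stays within that budget. A quick arithmetic check of that assumption:

```shell
# Check that the build flags fit GPT-J's 2048-token context window.
# 1919 and 128 are the values used in the build commands above.
max_input_len=1919
max_output_len=128
n_positions=2048  # GPT-J context length from the HF config
total=$((max_input_len + max_output_len))
if [ "$total" -le "$n_positions" ]; then
    echo "ok: $total tokens fit in the $n_positions-token context"
else
    echo "error: $total tokens exceed the $n_positions-token context"
fi
```

If you raise one of the two limits, lower the other so their sum stays within the context window.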

### 3. Run

To run the XTRT-LLM GPT-J model:

```bash
python3 run.py --max_output_len=50 \
               --engine_dir=./downloads/gptj-6b/trt_engines/fp16/1-XPU/ \
               --hf_model_location=./downloads/gptj-6b
```
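`run.py` deserializes the engine produced in step 2, so it is worth confirming the build actually wrote engine files before running. A minimal pre-flight check (a sketch: `check_engine_dir` is a hypothetical helper, and the `*.engine` suffix is an assumption about how the build names its output; adjust the pattern if your build differs):

```shell
# Hypothetical pre-flight check: verify the build step left serialized
# engine files in the directory that run.py will be pointed at.
check_engine_dir() {
    local dir="$1"
    if ls "$dir"/*.engine >/dev/null 2>&1; then
        echo "found engines in $dir"
    else
        echo "no engines in $dir -- re-run build.py first"
    fi
}

check_engine_dir ./downloads/gptj-6b/trt_engines/fp16/1-XPU/
```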