Files
engnex-r_series-llm/README.md
2025-09-19 14:54:18 +08:00

46 lines
1.1 KiB
Markdown
Raw Permalink Blame History

This file contains invisible Unicode characters

This file contains invisible Unicode characters that are indistinguishable to humans but may be processed differently by a computer. If you think that this is intentional, you can safely ignore this warning. Use the Escape button to reveal them.

This file contains Unicode characters that might be confused with other characters. If you think that this is intentional, you can safely ignore this warning. Use the Escape button to reveal them.

# kunlunxin
适配 kunlunxin 昆仑芯R200-8F加速卡的大模型推理服务镜像
## 启动
### 使用docker方式启动
```bash
docker run -it --rm \
--net=host \
-v /mnt/disk0/models/model-qwen1-5-72b-chat/:/model \
-e MODEL_NAME=qwen1.5-72b \
-e NUM_GPUs=4 \
-e WEIGHT_ONLY_PRECISION=int8 \
--device /dev/xpuctrl \
--device /dev/xpu0 \
--device /dev/xpu1 \
--device /dev/xpu2 \
--device /dev/xpu3 \
    slx-infer-kunlunxin:release-0.1-pipe-1-commit-cd30b38d
```
### 参数说明
#### 环境变量
- MODEL_PATH: 模型在容器中的路径,默认为 `/model`
- MODEL_NAME: 模型名字用于api接口中
- PORT端口默认`80`
- BUILD_SCRIPT_ROOT编译脚本目录一般不需要修改
- WEIGHT_ONLY_PRECISION量化权重的精度`int8``int4`
- ENGINE_DIR编译后的模型存储路径默认`./xtrt_engine`
- BUILD_EXTRA编译用到的额外参数
#### 参数
基本与vllm相同可以使用--help查看。
由于后端的engine使用的是xtrt的engine所以相关的参数无效或造成未知的结果所以不建议修改相关参数。