# kunlunxin
Large-model inference service image for the Kunlunxin (昆仑芯) R200-8F accelerator card.
## Startup
### Starting with Docker
```bash
docker run -it --rm \
    --net=host \
    -v /mnt/disk0/models/model-qwen1-5-72b-chat/:/model \
    -e MODEL_NAME=qwen1.5-72b \
    -e NUM_GPUs=4 \
    -e WEIGHT_ONLY_PRECISION=int8 \
    --device /dev/xpuctrl \
    --device /dev/xpu0 \
    --device /dev/xpu1 \
    --device /dev/xpu2 \
    --device /dev/xpu3 \
    slx-infer-kunlunxin:release-0.1-pipe-1-commit-cd30b38d
```
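Each exposed XPU needs its own `--device` flag in addition to `/dev/xpuctrl`. When running with a different number of cards, a small shell sketch can generate the flags, assuming the devices are named `/dev/xpu0` through `/dev/xpuN-1` as in the example above:

```shell
# Build the --device flags for the first N XPUs, mirroring the example above.
num_xpus=4
device_flags="--device /dev/xpuctrl"
for i in $(seq 0 $((num_xpus - 1))); do
  device_flags="$device_flags --device /dev/xpu$i"
done
echo "$device_flags"
# → --device /dev/xpuctrl --device /dev/xpu0 --device /dev/xpu1 --device /dev/xpu2 --device /dev/xpu3
```

The generated string can then be spliced into the `docker run` command in place of the hand-written `--device` lines.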
### Parameter description
#### Environment variables
- MODEL_PATH: path to the model inside the container; defaults to `/model`
- MODEL_NAME: model name, used in the API interface
- PORT: service port; defaults to `80`
- BUILD_SCRIPT_ROOT: directory of the build scripts; usually does not need to be changed
- WEIGHT_ONLY_PRECISION: precision of the quantized weights, `int8` or `int4`
- ENGINE_DIR: storage path for the compiled model engine; defaults to `./xtrt_engine`
- BUILD_EXTRA: extra arguments used during compilation
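The defaults above can be overridden with `-e` flags at launch. As one hedged example, the following sketch serves on port 8000 and keeps the compiled engine in a host directory so it survives container restarts; the host cache path and the in-container `ENGINE_DIR` value are illustrative assumptions, not paths mandated by the image:

```shell
# Illustrative launch overriding defaults: serve on port 8000 and persist
# the compiled xtrt engine on the host. The /engine_cache paths are
# hypothetical; adjust them to your environment.
docker run -it --rm \
    --net=host \
    -v /mnt/disk0/models/model-qwen1-5-72b-chat/:/model \
    -v /mnt/disk0/xtrt_engine_cache:/engine_cache \
    -e MODEL_NAME=qwen1.5-72b \
    -e PORT=8000 \
    -e ENGINE_DIR=/engine_cache \
    -e WEIGHT_ONLY_PRECISION=int8 \
    --device /dev/xpuctrl \
    --device /dev/xpu0 \
    slx-infer-kunlunxin:release-0.1-pipe-1-commit-cd30b38d
```

Persisting `ENGINE_DIR` avoids recompiling the engine on every container start.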
#### Arguments
The arguments are largely the same as vLLM's; use `--help` to list them.

However, because the backend uses an xtrt engine, some of these arguments have no effect or may produce undefined results, so modifying engine-related arguments is not recommended.
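Since the arguments follow vLLM, the service presumably exposes a vLLM-style OpenAI-compatible HTTP API. The sketch below is a smoke test under that assumption; the endpoint path, payload shape, and default port `80` are assumptions, not confirmed by this image's documentation:

```shell
# Hypothetical smoke test: assumes a vLLM-style OpenAI-compatible API on the
# default PORT=80. Endpoint path and payload shape are assumptions.
curl http://localhost:80/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{
        "model": "qwen1.5-72b",
        "messages": [{"role": "user", "content": "你好"}],
        "max_tokens": 64
      }'
```

The `model` field must match the `MODEL_NAME` environment variable passed to the container.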