Upgrade to vllm 0.17.0 corex v4.1 overlay
README.md
@@ -1,62 +1,37 @@
 # bi_150-vllm
 
-A build repository for `vLLM 0.16.1rc0`, based on
-`registry.iluvatar.com.cn:10443/customer/sz/vllm0.11.2-4.4.0-x86:v8`, used to
-produce a runnable image in the BI-V150 virtual machine environment.
+This repository contains the extracted `vLLM 0.17.0+corex.20260420090923`
+Python package used to overlay the vendor image
+`registry.iluvatar.com.cn:10443/customer/sz/vllm0.17.0-4.4.0-x86:v4.1`.
 
-## Changes
-
-This repository keeps only the minimal content needed to build the image:
+## Included files
 
 - `vllm/`
-  The current runtime code.
-- `vllm-0.16.1rc0+corex.4.4.0.dist-info/`
-  The corresponding package metadata.
+  The Python package code copied from the image payload.
+- `vllm-0.17.0+corex.20260420090923.dist-info/`
+  The package metadata extracted from the image.
 - `Dockerfile`
-  Builds the final image.
+  Builds a new image by replacing the installed `vllm` package in the vendor base image.
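
For reference, the destination the `Dockerfile` overlay has to overwrite can be read from the interpreter inside the vendor image. A minimal sketch, assuming it is run with the image's `python3` (the snippet is illustrative and not part of this repository):

```python
# Illustrative helper: print where the vendor image installs Python
# packages, i.e. the directory the overlay must overwrite, and which
# vllm package the interpreter actually resolves.
import sysconfig

import vllm

print(sysconfig.get_paths()["purelib"])  # site-packages root
print(vllm.__file__)                     # resolved vllm location
```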
 
-Compared with the base image, the key code changes kept in this repository
-are as follows:
+## Build
 
-- Fixed the CUDA platform detection logic in `vllm/platforms/__init__.py`:
-  - When NVML is unavailable and an error such as
-    `NVML Shared Library Not Found` is raised, the platform is no longer
-    immediately classified as non-CUDA.
-  - Detection instead falls back to `torch.cuda.is_available()` and
-    `torch.cuda.device_count()` to decide whether CUDA is usable.
-- Adjusted the CLI initialization logic so that missing optional benchmark
-  dependencies no longer block `vllm serve ...` startup (both fixes are
-  sketched after the error message below).
 
-This platform fix resolves the following startup failure:
-
-```text
-RuntimeError: Failed to infer device type
-```
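
A minimal sketch of the two fixes described above. Every name below is an illustrative assumption; the real code in `vllm/platforms/__init__.py` and the vLLM CLI is structured differently:

```python
# Sketch only, not the actual vllm code.
import torch


def cuda_platform_detected() -> bool:
    """Treat an NVML failure as inconclusive instead of 'not CUDA'."""
    try:
        import pynvml  # NVML bindings; unusable in some BI-V150 setups
        pynvml.nvmlInit()
        count = pynvml.nvmlDeviceGetCount()
        pynvml.nvmlShutdown()
        return count > 0
    except Exception:
        # e.g. "NVML Shared Library Not Found": fall back to the torch
        # runtime rather than concluding the platform is not CUDA.
        return torch.cuda.is_available() and torch.cuda.device_count() > 0


def build_cli_commands() -> dict:
    """Register subcommands; a missing benchmark extra must not block serve."""
    commands = {"serve": lambda args: None}  # placeholder serve handler
    try:
        # Hypothetical import standing in for the optional benchmark
        # dependencies; only the bench subcommand disappears if it fails.
        from vllm.benchmarks import main as bench_main
    except ImportError:
        pass
    else:
        commands["bench"] = bench_main
    return commands
```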
 
-## Build the image
-
-Run from the repository root:
+Run the following command from the repository root (`--pull=false` builds
+against the locally loaded vendor base image rather than pulling it):
 
 ```bash
-docker build -t bi_150_vllm:0.16.1 .
+docker build --pull=false \
+  --build-arg BASE_IMAGE=registry.iluvatar.com.cn:10443/customer/sz/vllm0.17.0-4.4.0-x86:v4.1 \
+  -t bi_150_vllm:0.17.0 \
+  .
 ```
 
-## Run the image
+## Verify
 
 ```bash
-docker run -dit \
-  --name iluvatar_test \
-  -p 38047:8000 \
-  --privileged \
-  -v /lib/modules:/lib/modules \
-  -v /dev:/dev \
-  -v /usr/src:/usr/src \
-  -v /mnt/gpfs/leaderboard/modelHubXC/Amu/t1-1.5B:/model \
-  -e CUDA_VISIBLE_DEVICES=0 \
-  --entrypoint vllm \
-  bi_150_vllm:0.16.1 \
-  serve /model \
-  --port 8000 \
-  --served-model-name llm \
-  --max-model-len 2048 \
-  --enforce-eager \
-  --trust-remote-code \
-  -tp 1
+docker run --rm -it bi_150_vllm:0.17.0 \
+  python3 -c "import vllm; print(vllm.__file__); print(vllm.__version__)"
 ```
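
As an additional, optional check (an assumption, not something the vendor documentation prescribes), the torch runtime can confirm the device is visible inside the container:

```python
# Assumption: executed with python3 inside a bi_150_vllm:0.17.0 container
# that has the device nodes mapped in. Expect "True" and a nonzero count.
import torch

print(torch.cuda.is_available(), torch.cuda.device_count())
```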
 
+## Notes
+
+- This is an overlay-style repository, not the original upstream git source tree.
+- The Docker image keeps the vendor runtime stack and only replaces the Python package files.