update qwen2.5vl readme (#4938)
### What this PR does / why we need it?
Fix the Qwen2.5-VL README: remove the "Generate Ranktable" section and add "Run with Docker" and "Install Mooncake" instructions.
- vLLM version: v0.12.0
- vLLM main: ad32e3e19c
Signed-off-by: liziyu <liziyu16@huawei.com>
@@ -25,35 +25,94 @@ for i in {0..7}; do hccn_tool -i $i -net_health -g ; done
|
|||||||
for i in {0..7}; do hccn_tool -i $i -netdetect -g ; done
|
for i in {0..7}; do hccn_tool -i $i -netdetect -g ; done
|
||||||
# View gateway configuration
|
# View gateway configuration
|
||||||
for i in {0..7}; do hccn_tool -i $i -gateway -g ; done
|
for i in {0..7}; do hccn_tool -i $i -gateway -g ; done
|
||||||
# View NPU network configuration
|
```
|
||||||
|
|
||||||
|
2. Check NPU network configuration:
|
||||||
|
|
||||||
|
Ensure that the hccn.conf file exists in the environment. If using Docker, mount it into the container.
|
||||||
|
|
||||||
|
```bash
|
||||||
cat /etc/hccn.conf
|
cat /etc/hccn.conf
|
||||||
```
|
```
|
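The requirement that `hccn.conf` exist before it is mounted can be checked with a small pre-flight test. The following is a minimal sketch, not part of the README: the function name `check_hccn_conf` is hypothetical, and treating an empty file as an error is an assumption.

```shell
# Hypothetical helper (not from the README): verify that hccn.conf is
# present and non-empty before mounting it into the container.
check_hccn_conf() {
  conf="${1:-/etc/hccn.conf}"   # default path taken from the README
  if [ -s "$conf" ]; then
    echo "found: $conf"
  else
    echo "missing or empty: $conf" >&2
    return 1
  fi
}

# Check the default location and print a hint instead of aborting.
check_hccn_conf || echo "create or mount /etc/hccn.conf before continuing"
```

Running the check on the host before `docker run` avoids starting a container with a bind mount that points at a nonexistent file.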
||||||
|
|
||||||
2. Get NPU IP Addresses
|
3. Get NPU IP Addresses
|
||||||
|
|
||||||
```bash
|
```bash
|
||||||
for i in {0..7}; do hccn_tool -i $i -ip -g;done
|
for i in {0..7}; do hccn_tool -i $i -ip -g;done
|
||||||
```
|
```
|
||||||
|
|
||||||
## Generate Ranktable
|
## Run with Docker
|
||||||
|
Start a Docker container.
|
||||||
|
|
||||||
The rank table is a JSON file that specifies the mapping of Ascend NPU ranks to nodes. For more details, please refer to the [vllm-ascend examples](https://github.com/vllm-project/vllm-ascend/blob/main/examples/disaggregated_prefill_v1/README.md). Execute the following commands for reference.
|
```{code-block} bash
|
||||||
|
:substitutions:
|
||||||
|
# Update the vllm-ascend image
|
||||||
|
export IMAGE=m.daocloud.io/quay.io/ascend/vllm-ascend:|vllm_ascend_version|
|
||||||
|
export NAME=vllm-ascend
|
||||||
|
|
||||||
```shell
|
# Run the container using the defined variables
|
||||||
cd vllm-ascend/examples/disaggregate_prefill_v1/
|
docker run --rm \
|
||||||
bash gen_ranktable.sh --ips 192.0.0.1 \
|
--name $NAME \
|
||||||
--npus-per-node 2 --network-card-name eth0 --prefill-device-cnt 1 --decode-device-cnt 1
|
--net=host \
|
||||||
|
--shm-size=1g \
|
||||||
|
--device /dev/davinci0 \
|
||||||
|
--device /dev/davinci1 \
|
||||||
|
--device /dev/davinci2 \
|
||||||
|
--device /dev/davinci3 \
|
||||||
|
--device /dev/davinci4 \
|
||||||
|
--device /dev/davinci5 \
|
||||||
|
--device /dev/davinci6 \
|
||||||
|
--device /dev/davinci7 \
|
||||||
|
--device /dev/davinci_manager \
|
||||||
|
--device /dev/devmm_svm \
|
||||||
|
--device /dev/hisi_hdc \
|
||||||
|
-v /usr/local/dcmi:/usr/local/dcmi \
|
||||||
|
-v /usr/local/Ascend/driver/tools/hccn_tool:/usr/local/Ascend/driver/tools/hccn_tool \
|
||||||
|
-v /usr/local/bin/npu-smi:/usr/local/bin/npu-smi \
|
||||||
|
-v /usr/local/Ascend/driver/lib64/:/usr/local/Ascend/driver/lib64/ \
|
||||||
|
-v /usr/local/Ascend/driver/version.info:/usr/local/Ascend/driver/version.info \
|
||||||
|
-v /etc/ascend_install.info:/etc/ascend_install.info \
|
||||||
|
-v /etc/hccn.conf:/etc/hccn.conf \
|
||||||
|
-v /mnt/sfs_turbo/.cache:/root/.cache \
|
||||||
|
-it $IMAGE bash
|
||||||
```
|
```
|
||||||
|
|
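The eight `--device /dev/davinciN` flags in the `docker run` command above follow a regular pattern, so they could also be generated in a loop. The following is a minimal sketch under that assumption: `davinci_device_flags` is a hypothetical helper, not something the README or vllm-ascend provides.

```shell
# Hypothetical helper (not from the README): print one "--device" flag
# per NPU, numbered from 0, separated by single spaces.
davinci_device_flags() {
  count="${1:-8}"   # number of NPUs to expose; 8 matches the README
  flags=""
  i=0
  while [ "$i" -lt "$count" ]; do
    # Prepend a space only when flags already holds an earlier entry.
    flags="${flags}${flags:+ }--device /dev/davinci${i}"
    i=$((i + 1))
  done
  printf '%s\n' "$flags"
}

# Show the flags that would be passed to docker run for an 8-NPU node.
davinci_device_flags 8
```

The output can be spliced into the command with an unquoted `$(davinci_device_flags 8)`, so that the shell splits it into separate arguments. The remaining flags (`/dev/davinci_manager`, `/dev/devmm_svm`, `/dev/hisi_hdc`, and the `-v` mounts) still need to be listed as in the README.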
||||||
If you want to run "2P1D", please set npus-per-node to 3 and prefill-device-cnt to 2. The rank table will be generated at /vllm-workspace/vllm-ascend/examples/disaggregate_prefill_v1/ranktable.json
|
## Install Mooncake
|
||||||
|
|
||||||
|Parameter | Meaning |
|
Mooncake is the serving platform for Kimi, a leading LLM service provided by Moonshot AI. First, we need to obtain the Mooncake project. Refer to the following command:
|
||||||
| --- | --- |
|
|
||||||
| --ips | Each node's local IP address (prefiller nodes should be in front of decoder nodes) |
|
```shell
|
||||||
| --npus-per-node | Each node's NPU clips |
|
git clone -b v0.3.7.post2 --depth 1 https://github.com/kvcache-ai/Mooncake.git
|
||||||
| --network-card-name | The physical machines' NIC |
|
```
|
||||||
|--prefill-device-cnt | NPU clips used for prefill |
|
|
||||||
|--decode-device-cnt |NPU clips used for decode |
|
(Optional) Replace go install url if the network is poor
|
||||||
|
|
||||||
|
```shell
|
||||||
|
cd Mooncake
|
||||||
|
sed -i 's|https://go.dev/dl/|https://golang.google.cn/dl/|g' dependencies.sh
|
||||||
|
```
|
||||||
|
|
||||||
|
Install mpi
|
||||||
|
|
||||||
|
```shell
|
||||||
|
apt-get install mpich libmpich-dev -y
|
||||||
|
```
|
||||||
|
|
||||||
|
Install the relevant dependencies. The installation of Go is not required.
|
||||||
|
|
||||||
|
```shell
|
||||||
|
bash dependencies.sh -y
|
||||||
|
```
|
||||||
|
|
||||||
|
Compile and install
|
||||||
|
|
||||||
|
```shell
|
||||||
|
mkdir build
|
||||||
|
cd build
|
||||||
|
cmake .. -DUSE_ASCEND_DIRECT=ON
|
||||||
|
make -j
|
||||||
|
make install
|
||||||
|
```
|
||||||
|
|
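The Mooncake steps above call `git`, `cmake`, and `make` directly, so it may help to confirm they are on the `PATH` before starting the build. The following is a minimal sketch, not part of the README: `missing_tools` is a hypothetical helper, and the list of tools checked is an assumption based only on the commands shown above.

```shell
# Hypothetical helper (not from the README): print the names of the
# given commands that cannot be found on PATH, separated by spaces.
missing_tools() {
  missing=""
  for tool in "$@"; do
    if ! command -v "$tool" >/dev/null 2>&1; then
      missing="${missing}${missing:+ }${tool}"
    fi
  done
  printf '%s\n' "$missing"
}

# git, cmake and make are invoked by the clone and build steps.
result="$(missing_tools git cmake make)"
if [ -n "$result" ]; then
  echo "install these first: $result"
fi
```

This only checks that the commands exist. It does not verify versions or the libraries that `dependencies.sh` installs.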
||||||
## Prefiller/Decoder Deployment
|
## Prefiller/Decoder Deployment
|
||||||
|
|
||||||
@@ -75,8 +134,8 @@ export OMP_NUM_THREADS=10
|
|||||||
vllm serve /model/Qwen2.5-VL-7B-Instruct \
|
vllm serve /model/Qwen2.5-VL-7B-Instruct \
|
||||||
--host 0.0.0.0 \
|
--host 0.0.0.0 \
|
||||||
--port 13700 \
|
--port 13700 \
|
||||||
--tensor-parallel-size 1 \
|
|
||||||
--no-enable-prefix-caching \
|
--no-enable-prefix-caching \
|
||||||
|
--tensor-parallel-size 1 \
|
||||||
--seed 1024 \
|
--seed 1024 \
|
||||||
--served-model-name qwen25vl \
|
--served-model-name qwen25vl \
|
||||||
--max-model-len 40000 \
|
--max-model-len 40000 \
|
||||||
|
|||||||