update qwen2.5vl readme (#4938)

### What this PR does / why we need it?
Fix the Qwen2.5-VL README: delete the ranktable generation section and add Mooncake installation instructions.


- vLLM version: v0.12.0
- vLLM main: ad32e3e19c

Signed-off-by: liziyu <liziyu16@huawei.com>
Commit 716c4dacfe (parent 4ae7588c52), authored by liziyu on 2025-12-12 15:40:07 +08:00 and committed by GitHub.

```bash
# Check the network health of each NPU
for i in {0..7}; do hccn_tool -i $i -net_health -g ; done
# Check the network detection status of each NPU
for i in {0..7}; do hccn_tool -i $i -netdetect -g ; done
# View gateway configuration
for i in {0..7}; do hccn_tool -i $i -gateway -g ; done
```
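On a healthy node, each query should report success for all eight NPUs; the health check, for example, typically prints a line like `net health status: Success` (exact wording may vary across driver versions).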
2. Check the NPU network configuration:
Ensure that the `hccn.conf` file exists on the host. If using Docker, mount it into the container.
```bash
cat /etc/hccn.conf
```
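For reference, `hccn.conf` maps each NPU index to its network settings. A minimal file looks roughly like the following (illustrative addresses; the exact keys present depend on your driver configuration):
```bash
# Illustrative /etc/hccn.conf entries -- addresses are examples only
address_0=192.0.0.1
netmask_0=255.255.255.0
address_1=192.0.0.2
netmask_1=255.255.255.0
```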
3. Get NPU IP Addresses
```bash
for i in {0..7}; do hccn_tool -i $i -ip -g; done
```
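If you want the addresses in one place for later steps, you can collect them into a shell variable. This is a small convenience sketch, not part of the original README; it assumes the usual `ipaddr:x.x.x.x` output format of `hccn_tool`:
```bash
# Collect the NPU IPs reported by hccn_tool into one space-separated list
NPU_IPS=$(for i in {0..7}; do hccn_tool -i $i -ip -g | awk -F: '/ipaddr/{print $2}'; done | xargs)
echo "$NPU_IPS"
```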
## Run with Docker
Start a Docker container.
```{code-block} bash
:substitutions:
# Update the vllm-ascend image
export IMAGE=m.daocloud.io/quay.io/ascend/vllm-ascend:|vllm_ascend_version|
export NAME=vllm-ascend

# Run the container using the defined variables
docker run --rm \
--name $NAME \
--net=host \
--shm-size=1g \
--device /dev/davinci0 \
--device /dev/davinci1 \
--device /dev/davinci2 \
--device /dev/davinci3 \
--device /dev/davinci4 \
--device /dev/davinci5 \
--device /dev/davinci6 \
--device /dev/davinci7 \
--device /dev/davinci_manager \
--device /dev/devmm_svm \
--device /dev/hisi_hdc \
-v /usr/local/dcmi:/usr/local/dcmi \
-v /usr/local/Ascend/driver/tools/hccn_tool:/usr/local/Ascend/driver/tools/hccn_tool \
-v /usr/local/bin/npu-smi:/usr/local/bin/npu-smi \
-v /usr/local/Ascend/driver/lib64/:/usr/local/Ascend/driver/lib64/ \
-v /usr/local/Ascend/driver/version.info:/usr/local/Ascend/driver/version.info \
-v /etc/ascend_install.info:/etc/ascend_install.info \
-v /etc/hccn.conf:/etc/hccn.conf \
-v /mnt/sfs_turbo/.cache:/root/.cache \
-it $IMAGE bash
```
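Once inside the container, it is worth sanity-checking that the NPUs and the HCCN configuration are visible (a quick check, assuming the device mappings and mounts above succeeded):
```bash
# Verify the NPUs are visible inside the container
npu-smi info
# Confirm the HCCN configuration was mounted correctly
cat /etc/hccn.conf
```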
## Install Mooncake
Mooncake is the serving platform for Kimi, a leading LLM service provided by Moonshot AI. First, obtain the Mooncake project by cloning it:
```shell
git clone -b v0.3.7.post2 --depth 1 https://github.com/kvcache-ai/Mooncake.git
```
(Optional) Replace the Go download URL if your network access to go.dev is poor:
```shell
cd Mooncake
sed -i 's|https://go.dev/dl/|https://golang.google.cn/dl/|g' dependencies.sh
```
Install MPI:
```shell
apt-get install mpich libmpich-dev -y
```
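You can quickly confirm the MPI toolchain is available before continuing:
```shell
mpirun --version
```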
Install the relevant dependencies (installing Go is not required):
```shell
bash dependencies.sh -y
```
Compile and install Mooncake:
```shell
mkdir build
cd build
cmake .. -DUSE_ASCEND_DIRECT=ON
make -j
make install
```
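After `make install`, a quick smoke test can confirm the build. Whether the Python binding is importable depends on the Mooncake version and build options, so treat this as an optional, hedged check:
```shell
# Optional: check whether the Mooncake Python binding is importable.
# Availability depends on Mooncake version and build options.
python3 -c "import mooncake" && echo "Mooncake Python binding available"
```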
## Prefiller/Decoder Deployment
We can run the following scripts to launch a server on the prefiller/decoder NPU.
```shell
export ASCEND_RT_VISIBLE_DEVICES=0
export HCCL_IF_IP=192.0.0.1 # node ip
export GLOO_SOCKET_IFNAME="eth0" # network card name
export TP_SOCKET_IFNAME="eth0"
export HCCL_SOCKET_IFNAME="eth0"
export OMP_NUM_THREADS=10
vllm serve /model/Qwen2.5-VL-7B-Instruct \
--host 0.0.0.0 \
--port 13700 \
--no-enable-prefix-caching \
--tensor-parallel-size 1 \
--seed 1024 \
--served-model-name qwen25vl \
--max-model-len 40000 \