# Multi-Node-DP (Qwen3-VL-235B-A22B)

:::{note}
Qwen3-VL relies on the newest version of `transformers` (>4.56.2). Please install it from source until that version is released.
:::

## Verify Multi-Node Communication Environment

Refer to the verification process in [multi_node.md](https://vllm-ascend.readthedocs.io/en/latest/tutorials/multi_node.html#verification-process).

## Run with docker

Assume you have two Atlas 800 A3 (64G*16) nodes (or 2 * A2) and want to deploy the `Qwen3-VL-235B-A22B-Instruct` model across them.

```{code-block} bash
   :substitutions:

# Update the vllm-ascend image
export IMAGE=quay.io/ascend/vllm-ascend:|vllm_ascend_version|
docker run --rm \
  --name vllm-ascend \
  --net=host \
  --device /dev/davinci0 \
  --device /dev/davinci1 \
  --device /dev/davinci2 \
  --device /dev/davinci3 \
  --device /dev/davinci4 \
  --device /dev/davinci5 \
  --device /dev/davinci6 \
  --device /dev/davinci7 \
  --device /dev/davinci8 \
  --device /dev/davinci9 \
  --device /dev/davinci10 \
  --device /dev/davinci11 \
  --device /dev/davinci12 \
  --device /dev/davinci13 \
  --device /dev/davinci14 \
  --device /dev/davinci15 \
  --device /dev/davinci_manager \
  --device /dev/devmm_svm \
  --device /dev/hisi_hdc \
  -v /usr/local/dcmi:/usr/local/dcmi \
  -v /usr/local/Ascend/driver/tools/hccn_tool:/usr/local/Ascend/driver/tools/hccn_tool \
  -v /usr/local/bin/npu-smi:/usr/local/bin/npu-smi \
  -v /usr/local/Ascend/driver/lib64/:/usr/local/Ascend/driver/lib64/ \
  -v /usr/local/Ascend/driver/version.info:/usr/local/Ascend/driver/version.info \
  -v /etc/ascend_install.info:/etc/ascend_install.info \
  -v /root/.cache:/root/.cache \
  -p 8000:8000 \
  -it $IMAGE bash
```

Run the following scripts on the two nodes respectively.

:::{note}
Before launching the inference server, ensure the following environment variables are set for multi-node communication.
:::

node0

```shell
#!/bin/sh

# nic_name is the network interface name corresponding to local_ip of the
# current node; both can be obtained through ifconfig.
nic_name="xxxx"
local_ip="xxxx"

export HCCL_IF_IP=$local_ip
export GLOO_SOCKET_IFNAME=$nic_name
export TP_SOCKET_IFNAME=$nic_name
export HCCL_SOCKET_IFNAME=$nic_name
export OMP_PROC_BIND=false
export OMP_NUM_THREADS=100
export VLLM_USE_V1=1
export HCCL_BUFFSIZE=1024

vllm serve Qwen/Qwen3-VL-235B-A22B-Instruct \
  --host 0.0.0.0 \
  --port 8000 \
  --data-parallel-size 2 \
  --api-server-count 2 \
  --data-parallel-size-local 1 \
  --data-parallel-address $local_ip \
  --data-parallel-rpc-port 13389 \
  --seed 1024 \
  --served-model-name qwen3vl \
  --tensor-parallel-size 8 \
  --enable-expert-parallel \
  --max-num-seqs 16 \
  --max-model-len 32768 \
  --max-num-batched-tokens 4096 \
  --trust-remote-code \
  --no-enable-prefix-caching \
  --gpu-memory-utilization 0.8
```
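On each node, replace the `xxxx` placeholders with that node's own NIC name and IP address. As a minimal sketch (assuming the standard `ip` utility is available in the container; `ifconfig` works equally well), you can list the candidates like this:

```shell
# List every interface together with its IPv4 address; pick the one whose
# address is reachable from the other node, then fill in the variables.
ip -o -4 addr show | awk '{print $2, $4}'

# Illustrative output:
#   eth0 192.168.1.10/24
# which would correspond to:
#   nic_name="eth0"
#   local_ip="192.168.1.10"
```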
node1

```shell
#!/bin/sh

# nic_name is the network interface name corresponding to local_ip of the
# current node; both can be obtained through ifconfig.
nic_name="xxxx"
local_ip="xxxx"
# The value of node0_ip must be consistent with the local_ip set on node0
# (the master node).
node0_ip="xxxx"

export HCCL_IF_IP=$local_ip
export GLOO_SOCKET_IFNAME=$nic_name
export TP_SOCKET_IFNAME=$nic_name
export HCCL_SOCKET_IFNAME=$nic_name
export OMP_PROC_BIND=false
export OMP_NUM_THREADS=100
export VLLM_USE_V1=1
export HCCL_BUFFSIZE=1024

vllm serve Qwen/Qwen3-VL-235B-A22B-Instruct \
  --host 0.0.0.0 \
  --port 8000 \
  --headless \
  --data-parallel-size 2 \
  --data-parallel-size-local 1 \
  --data-parallel-start-rank 1 \
  --data-parallel-address $node0_ip \
  --data-parallel-rpc-port 13389 \
  --seed 1024 \
  --tensor-parallel-size 8 \
  --served-model-name qwen3vl \
  --max-num-seqs 16 \
  --max-model-len 32768 \
  --max-num-batched-tokens 4096 \
  --enable-expert-parallel \
  --trust-remote-code \
  --no-enable-prefix-caching \
  --gpu-memory-utilization 0.8
```

If the service starts successfully, the following information will be displayed on node0:

```shell
INFO: Started server process [44610]
INFO: Waiting for application startup.
INFO: Application startup complete.
INFO: Started server process [44611]
INFO: Waiting for application startup.
INFO: Application startup complete.
```

Once your server is started, you can query the model with input prompts:

```shell
curl http://localhost:8000/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{
    "model": "qwen3vl",
    "messages": [
      {"role": "system", "content": "You are a helpful assistant."},
      {"role": "user", "content": [
        {"type": "image_url", "image_url": {"url": "https://modelscope.oss-cn-beijing.aliyuncs.com/resource/qwen.png"}},
        {"type": "text", "text": "What is the text in the illustration?"}
      ]}
    ]
  }'
```
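When scripting against the endpoint, it can help to wait until the API server is ready before sending requests. The sketch below polls the standard OpenAI-compatible `/v1/models` endpoint exposed by vLLM; the `qwen3vl` name comes from `--served-model-name` above:

```shell
# Poll until the OpenAI-compatible server answers, then list the served
# models; the response should include "qwen3vl".
until curl -sf http://localhost:8000/v1/models > /dev/null; do
  echo "waiting for the API server ..."
  sleep 5
done
curl -s http://localhost:8000/v1/models
```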