[Lint]Style: reformat markdown files via markdownlint (#5884)

### What this PR does / why we need it?
Reformat Markdown files via markdownlint.

- vLLM version: v0.13.0
- vLLM main: bde38c11df

---------

Signed-off-by: root <root@LAPTOP-VQKDDVMG.localdomain>
Signed-off-by: MrZ20 <2609716663@qq.com>
Co-authored-by: root <root@LAPTOP-VQKDDVMG.localdomain>
SILONG ZENG
2026-01-15 09:06:01 +08:00
committed by GitHub
parent 96edd4673f
commit 4811ba62e0
75 changed files with 711 additions and 308 deletions

@@ -8,12 +8,12 @@ Multi-node inference is suitable for scenarios where the model cannot be deploye
## Verify Multi-Node Communication Environment
-### Physical Layer Requirements:
+### Physical Layer Requirements
* The physical machines must be located on the same LAN, with network connectivity.
* All NPUs are connected with optical modules, and the connection status must be normal.
-### Verification Process:
+### Verification Process
Execute the following commands on each node in sequence. The results must all be `success` and the status must be `UP`:
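The pass criterion stated here (every result `success`, every link `UP`) can be checked mechanically once the command output has been captured. The following is a minimal Python sketch under stated assumptions: the helper name `failing_checks`, the labels, and the sample strings are hypothetical, and the text matching is deliberately naive because the real tool output may be formatted differently.

```python
def failing_checks(outputs):
    """Return the labels whose captured output does not look healthy.

    `outputs` maps a label (for example "node1/npu0", a made-up naming
    scheme) to the raw text captured from one verification command.
    Assumption: a healthy result contains "success" (any case) or "UP",
    and never contains "down" or "fail".
    """
    failures = []
    for label, text in outputs.items():
        lowered = text.lower()
        passed = "success" in lowered or "UP" in text
        failed = "down" in lowered or "fail" in lowered
        if failed or not passed:
            failures.append(label)
    return failures


if __name__ == "__main__":
    # Illustrative sample strings only, not real tool output.
    sample = {
        "node1/npu0": "link status: UP",
        "node1/npu1": "link status: DOWN",
    }
    print(failing_checks(sample))
```

An empty list means every captured result met the criterion; any label in the list points at the node or device to inspect by hand.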
@@ -32,7 +32,8 @@ Execute the following commands on each node in sequence. The results must all be
cat /etc/hccn.conf
```
-### NPU Interconnect Verification:
+### NPU Interconnect Verification
#### 1. Get NPU IP Addresses
```bash
@@ -47,7 +48,9 @@ hccn_tool -i 0 -ping -g address 10.20.0.20
```
## Set Up and Start the Ray Cluster
### Setting Up the Basic Container
To ensure a consistent execution environment across all nodes, including the model path and Python environment, it is advised to use Docker images.
For setting up a multi-node inference cluster with Ray, **containerized deployment** is the preferred approach. Containers should be started on both the primary and secondary nodes, with the `--net=host` option to enable proper network connectivity.
@@ -88,6 +91,7 @@ docker run --rm \
```
### Start Ray Cluster
After setting up the containers and installing vllm-ascend on each node, follow the steps below to start the Ray cluster and execute inference tasks.
Choose one machine as the primary node and the others as secondary nodes. Before proceeding, use `ip addr` to check your `nic_name` (network interface name).
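If a scripted alternative to reading `ip addr` output is convenient, the interface names can also be listed from Python's standard library. This is only a sketch: `list_nic_names` is a hypothetical helper, `socket.if_nameindex()` is assumed to be available on the host (it is on Linux), and the output does not show IP addresses, so `ip addr` is still needed to confirm which interface carries the node's address.

```python
import socket


def list_nic_names():
    """Return the names of the network interfaces known to the OS."""
    # socket.if_nameindex() yields (index, name) pairs; keep only the names.
    return [name for _index, name in socket.if_nameindex()]


if __name__ == "__main__":
    for name in list_nic_names():
        print(name)
```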
@@ -133,9 +137,10 @@ Once the cluster is started on multiple nodes, execute `ray status` and `ray lis
After Ray is successfully started, the following content will appear:\
A local Ray instance has started successfully.\
-Dashboard URL: The access address for the Ray Dashboard (default: http://localhost:8265); Node status (CPU/memory resources, number of healthy nodes); Cluster connection address (used for adding multiple nodes).
+Dashboard URL: The access address for the Ray Dashboard (default: <http://localhost:8265>); Node status (CPU/memory resources, number of healthy nodes); Cluster connection address (used for adding multiple nodes).
## Start the Online Inference Service on Multi-node scenario
In the container, you can use vLLM as if all NPUs were on a single node. vLLM will utilize NPU resources across all nodes in the Ray cluster.
**You only need to run the vllm command on one node.**
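As a rough illustration of that single launch, the sketch below assembles a `vllm serve` command line that uses Ray as the distributed executor backend. Everything specific here is an assumption rather than something stated in this section: the helper name `build_serve_command`, the `<model-path>` placeholder, and the parallel sizes 8 and 2 are examples only, and the real values depend on the cluster.

```python
import shlex


def build_serve_command(model, tensor_parallel_size, pipeline_parallel_size):
    """Build the argv list for one `vllm serve` launch across the Ray cluster.

    All arguments are caller-supplied; nothing here is taken from the
    documented deployment.
    """
    return [
        "vllm", "serve", model,
        "--distributed-executor-backend", "ray",
        "--tensor-parallel-size", str(tensor_parallel_size),
        "--pipeline-parallel-size", str(pipeline_parallel_size),
    ]


if __name__ == "__main__":
    # Placeholder model path and example sizes (assumptions).
    print(shlex.join(build_serve_command("<model-path>", 8, 2)))
```

Building the command as a list keeps each flag and its value separate, which avoids quoting mistakes if the list is later passed to `subprocess.run` on the chosen node.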