[Doc][Misc] Comprehensive documentation cleanup and grammatical fixes (#8073)
What this PR does / why we need it?

This pull request performs a comprehensive cleanup of the vLLM Ascend documentation. It fixes numerous typos, grammatical errors, and phrasing issues across community guidelines, developer documents, hardware tutorials, and feature guides. Key improvements include correcting hardware names (e.g., Atlas 300I), fixing broken links, cleaning up code examples (removing duplicate flags and trailing commas), and improving the clarity of technical explanations. These changes are necessary to ensure the documentation is professional, accurate, and easy for users to follow.

Does this PR introduce any user-facing change?

No, this PR contains documentation-only updates.

How was this patch tested?

The changes were manually reviewed for accuracy and grammatical correctness. No functional code changes were introduced.

Signed-off-by: herizhen <1270637059@qq.com>
Signed-off-by: herizhen <59841270+herizhen@users.noreply.github.com>
@@ -8,7 +8,7 @@ Context parallel feature currently is only supported on Atlas A3 device, and wil

vLLM-Ascend now supports long sequences with context parallel options. This guide walks step by step through verifying these features with constrained resources.

Taking the DeepSeek-V3.1-w8a8 model as an example, use 3 Atlas 800T A3 servers to deploy the “1P1D” architecture. Node p is deployed across multiple machines, while node d is deployed on a single machine. Assume the IPs of the prefiller servers are 192.0.0.1 (prefill 1) and 192.0.0.2 (prefill 2), and the decoder server is 192.0.0.3 (decoder 1). On each server, use 8 NPUs (16 chips) to deploy one service instance. In this example, we enable the context parallel feature on node p to improve TTFT. Although enabling the DCP feature on node d can reduce memory usage, it would introduce additional communication and small-operator overhead, so we do not enable DCP on node d.

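The interleaved KV-cache placement implied by `--prefill-context-parallel-size 2` and `--cp-kv-cache-interleave-size 128` in the commands below can be pictured with a toy sketch. This is an illustration only, not vLLM's implementation: it shows the round-robin assignment of fixed-size token chunks to context-parallel ranks.

```python
# Toy sketch (not vLLM's implementation): distribute a long prompt's KV-cache
# chunks across prefill context-parallel (CP) ranks in an interleaved fashion,
# mirroring --prefill-context-parallel-size 2 and
# --cp-kv-cache-interleave-size 128 from the commands below.

def assign_cp_ranks(num_tokens: int, cp_size: int, interleave_size: int) -> list[int]:
    """Return the CP rank owning each contiguous chunk of `interleave_size` tokens."""
    num_chunks = (num_tokens + interleave_size - 1) // interleave_size
    # Chunks are dealt out round-robin, so each rank holds ~1/cp_size of the cache.
    return [chunk_idx % cp_size for chunk_idx in range(num_chunks)]

# A 512-token prompt split into 128-token chunks over 2 CP ranks:
print(assign_cp_ranks(512, cp_size=2, interleave_size=128))  # [0, 1, 0, 1]
```

With this layout, each prefill rank attends over only its own chunks, which is why the guide enables CP on node p to cut TTFT for long prompts.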
## Environment Preparation
@@ -69,223 +69,225 @@ We can run the following scripts to launch a server on the prefiller/decoder nod

1. Run the following scripts on the three nodes respectively to launch servers for online 128k inference.

:::::{tab-set}
:sync-group: nodes

::::{tab-item} Prefiller node 1
:sync: prefill node1

```shell
nic_name="eth0" # network card name
local_ip="192.0.0.1"
master_addr="192.0.0.1"
export HCCL_IF_IP=$local_ip
export GLOO_SOCKET_IFNAME=$nic_name
export TP_SOCKET_IFNAME=$nic_name
export HCCL_SOCKET_IFNAME=$nic_name
export HCCL_BUFFSIZE=768
export OMP_PROC_BIND=false
export VLLM_ASCEND_ENABLE_FLASHCOMM1=1
export PYTORCH_NPU_ALLOC_CONF="expandable_segments:True"
export OMP_NUM_THREADS=1
export HCCL_OP_EXPANSION_MODE="AIV"
export VLLM_USE_V1=1
export TASK_QUEUE_ENABLE=1
export VLLM_ASCEND_ENABLE_CONTEXT_PARALLEL=1

vllm serve /path_to_weight/DeepSeek-V3.1_w8a8mix_mtp \
--host 0.0.0.0 \
--port 8004 \
--decode-context-parallel-size 8 \
--prefill-context-parallel-size 2 \
--cp-kv-cache-interleave-size 128 \
--tensor-parallel-size 16 \
--enable-expert-parallel \
--quantization ascend \
--enforce-eager \
--served-model-name deepseek_v3 \
--seed 1024 \
--no-enable-chunked-prefill \
--no-enable-prefix-caching \
--max-num-seqs 1 \
--max-model-len 136000 \
--max-num-batched-tokens 136000 \
--block-size 128 \
--trust-remote-code \
--gpu-memory-utilization 0.8 \
--nnodes 2 \
--node-rank 0 \
--master-addr $master_addr \
--master-port 7001 \
--speculative-config '{"num_speculative_tokens": 3, "method":"deepseek_mtp"}' \
--kv-transfer-config \
'{"kv_connector": "MooncakeConnectorV1",
"kv_role": "kv_producer",
"kv_port": "30000",
"engine_id": "0",
"kv_connector_extra_config": {
"use_ascend_direct": true,
"prefill": {
"dp_size": 1,
"tp_size": 16
},
"decode": {
"dp_size": 1,
"tp_size": 16
}
}
}'
```

::::

::::{tab-item} Prefiller node 2
:sync: prefill node2

```shell
nic_name="eth0" # network card name
local_ip="192.0.0.2"
master_addr="192.0.0.1"
export HCCL_IF_IP=$local_ip
export GLOO_SOCKET_IFNAME=$nic_name
export TP_SOCKET_IFNAME=$nic_name
export HCCL_SOCKET_IFNAME=$nic_name
export HCCL_BUFFSIZE=768
export OMP_PROC_BIND=false
export VLLM_ASCEND_ENABLE_FLASHCOMM1=1
export PYTORCH_NPU_ALLOC_CONF="expandable_segments:True"
export OMP_NUM_THREADS=1
export HCCL_OP_EXPANSION_MODE="AIV"
export VLLM_USE_V1=1
export TASK_QUEUE_ENABLE=1
export VLLM_ASCEND_ENABLE_CONTEXT_PARALLEL=1

vllm serve /path_to_weight/DeepSeek-V3.1_w8a8mix_mtp \
--host 0.0.0.0 \
--port 8004 \
--decode-context-parallel-size 8 \
--prefill-context-parallel-size 2 \
--cp-kv-cache-interleave-size 128 \
--tensor-parallel-size 16 \
--enable-expert-parallel \
--quantization ascend \
--enforce-eager \
--served-model-name deepseek_v3 \
--seed 1024 \
--no-enable-chunked-prefill \
--no-enable-prefix-caching \
--max-num-seqs 1 \
--max-model-len 136000 \
--max-num-batched-tokens 136000 \
--block-size 128 \
--trust-remote-code \
--gpu-memory-utilization 0.8 \
--nnodes 2 \
--node-rank 1 \
--headless \
--master-addr $master_addr \
--master-port 7001 \
--speculative-config '{"num_speculative_tokens": 3, "method":"deepseek_mtp"}' \
--kv-transfer-config \
'{"kv_connector": "MooncakeConnectorV1",
"kv_role": "kv_producer",
"kv_port": "30000",
"engine_id": "1",
"kv_connector_extra_config": {
"use_ascend_direct": true,
"prefill": {
"dp_size": 1,
"tp_size": 16
},
"decode": {
"dp_size": 1,
"tp_size": 16
}
}
}'
```

::::

::::{tab-item} Decoder node 1
:sync: decoder node1

```shell
nic_name="eth0" # network card name
local_ip="192.0.0.3"
export HCCL_IF_IP=$local_ip
export GLOO_SOCKET_IFNAME=$nic_name
export TP_SOCKET_IFNAME=$nic_name
export HCCL_SOCKET_IFNAME=$nic_name
export HCCL_BUFFSIZE=768
export OMP_PROC_BIND=false
export PYTORCH_NPU_ALLOC_CONF="expandable_segments:True"
export OMP_NUM_THREADS=1
export HCCL_OP_EXPANSION_MODE="AIV"
export VLLM_USE_V1=1
export TASK_QUEUE_ENABLE=1
export VLLM_ASCEND_ENABLE_CONTEXT_PARALLEL=1

vllm serve /path_to_weight/DeepSeek-V3.1_w8a8mix_mtp \
--host 0.0.0.0 \
--port 8004 \
--api-server-count 1 \
--data-parallel-size 1 \
--data-parallel-size-local 1 \
--data-parallel-start-rank 0 \
--data-parallel-address $local_ip \
--data-parallel-rpc-port 5980 \
--decode-context-parallel-size 1 \
--tensor-parallel-size 16 \
--enable-expert-parallel \
--quantization ascend \
--no-enable-prefix-caching \
--distributed-executor-backend mp \
--served-model-name deepseek_v3 \
--seed 1024 \
--max-model-len 136000 \
--max-num-batched-tokens 128 \
--enable-chunked-prefill \
--max-num-seqs 4 \
--trust-remote-code \
--gpu-memory-utilization 0.96 \
--speculative-config '{"num_speculative_tokens": 3, "method":"deepseek_mtp"}' \
--compilation_config '{"cudagraph_mode": "FULL_DECODE_ONLY", "cudagraph_capture_sizes":[1,2,4]}' \
--kv-transfer-config \
'{"kv_connector": "MooncakeConnectorV1",
"kv_role": "kv_consumer",
"kv_port": "30200",
"engine_id": "3",
"kv_connector_extra_config": {
"prefill": {
"dp_size": 1,
"tp_size": 16
},
"decode": {
"dp_size": 1,
"tp_size": 16
}
}
}'
```

::::

:::::

2. Create the `proxy.sh` script on the prefill master node:

```shell
python load_balance_proxy_server_example.py \
--port 8005 \
--host 192.0.0.1 \
--prefiller-hosts \
192.0.0.1 \
--prefiller-ports \
8004 \
--decoder-hosts \
192.0.0.3 \
--decoder-ports \
8004
```

3. Run the proxy

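The proxy above spreads incoming requests across the configured prefiller and decoder endpoints. As a hypothetical sketch of the basic balancing idea (this is not the actual logic of `load_balance_proxy_server_example.py`), a round-robin selector over host/port pairs looks like this:

```python
# Hypothetical sketch: round-robin selection over prefiller/decoder endpoints,
# the simplest balancing strategy such a proxy can use.
from itertools import cycle

class RoundRobinBalancer:
    def __init__(self, hosts, ports):
        # Pair each host with its port and cycle through them endlessly.
        self._endpoints = cycle(list(zip(hosts, ports)))

    def next_endpoint(self) -> str:
        host, port = next(self._endpoints)
        return f"http://{host}:{port}"

prefillers = RoundRobinBalancer(["192.0.0.1"], [8004])
decoders = RoundRobinBalancer(["192.0.0.3"], [8004])
print(prefillers.next_endpoint())  # http://192.0.0.1:8004
print(decoders.next_endpoint())    # http://192.0.0.3:8004
```

With a single prefiller and a single decoder, as in this 1P1D example, every request is routed to the same pair; the same selector generalizes to multiple endpoints.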
@@ -347,7 +349,7 @@ Refer to [Using AISBench for performance evaluation](../../developer_guide/evalu

Run performance evaluation of `DeepSeek-V3.1-w8a8` as an example.

Refer to [vllm benchmark](https://docs.vllm.ai/en/latest/benchmarking/) for more details.

There are three `vllm bench` subcommands:

@@ -157,7 +157,7 @@ Refer to [Using AISBench for performance evaluation](../../developer_guide/evalu

Run performance evaluation of `Qwen3-235B-A22B-w8a8` as an example.

Refer to [vllm benchmark](https://docs.vllm.ai/en/latest/benchmarking/) for more details.

There are three `vllm bench` subcommands:

@@ -22,55 +22,55 @@ Execute the following commands on each node in sequence. The results must all be

1. Single Node Verification:

Execute the following commands on each node in sequence. The results must all be `success` and the status must be `UP`:

```bash
# Check the remote switch ports
for i in {0..15}; do hccn_tool -i $i -lldp -g | grep Ifname; done
# Get the link status of the Ethernet ports (UP or DOWN)
for i in {0..15}; do hccn_tool -i $i -link -g ; done
# Check the network health status
for i in {0..15}; do hccn_tool -i $i -net_health -g ; done
# View the network detected IP configuration
for i in {0..15}; do hccn_tool -i $i -netdetect -g ; done
# View gateway configuration
for i in {0..15}; do hccn_tool -i $i -gateway -g ; done
```

2. Check NPU HCCN Configuration:

Ensure that the `hccn.conf` file exists in the environment. If using Docker, mount it into the container.

```bash
cat /etc/hccn.conf
```

3. Get NPU IP Addresses

```bash
# Get the virtual NPU IPs.
for i in {0..15}; do hccn_tool -i $i -vnic -g; done
```

4. Get superpod ID and SDID

```bash
for i in {0..15}; do npu-smi info -t spod-info -i $i -c 0; npu-smi info -t spod-info -i $i -c 1; done
```

5. Cross-Node PING Test

```bash
# Execute on the target node (replace 'x.x.x.x' with the virtual NPU IP address).
for i in {0..15}; do hccn_tool -i $i -hccs_ping -g address x.x.x.x; done
```

6. Check NPU TLS Configuration

```bash
# The TLS settings should be consistent across all nodes
for i in {0..15}; do hccn_tool -i $i -tls -g ; done | grep switch
```

::::

@@ -78,48 +78,48 @@ for i in {0..15}; do hccn_tool -i $i -tls -g ; done | grep switch

1. Single Node Verification:

Execute the following commands on each node in sequence. The results must all be `success` and the status must be `UP`:

```bash
# Check the remote switch ports
for i in {0..7}; do hccn_tool -i $i -lldp -g | grep Ifname; done
# Get the link status of the Ethernet ports (UP or DOWN)
for i in {0..7}; do hccn_tool -i $i -link -g ; done
# Check the network health status
for i in {0..7}; do hccn_tool -i $i -net_health -g ; done
# View the network detected IP configuration
for i in {0..7}; do hccn_tool -i $i -netdetect -g ; done
# View gateway configuration
for i in {0..7}; do hccn_tool -i $i -gateway -g ; done
```

2. Check NPU HCCN Configuration:

Ensure that the `hccn.conf` file exists in the environment. If using Docker, mount it into the container.

```bash
cat /etc/hccn.conf
```

3. Get NPU IP Addresses

```bash
for i in {0..7}; do hccn_tool -i $i -ip -g; done
```

4. Cross-Node PING Test

```bash
# Execute on the target node (replace 'x.x.x.x' with the actual NPU IP address).
for i in {0..7}; do hccn_tool -i $i -ping -g address x.x.x.x; done
```

5. Check NPU TLS Configuration

```bash
# The TLS settings should be consistent across all nodes
for i in {0..7}; do hccn_tool -i $i -tls -g ; done | grep switch
```

::::

@@ -12,48 +12,48 @@ Using the Qwen2.5-VL-7B-Instruct model as an example, use vllm-ascend v0.11.0rc1

1. Single Node Verification:

Execute the following commands in sequence. The results must all be `success` and the status must be `UP`:

```bash
# Check the remote switch ports
for i in {0..7}; do hccn_tool -i $i -lldp -g | grep Ifname; done
# Get the link status of the Ethernet ports (UP or DOWN)
for i in {0..7}; do hccn_tool -i $i -link -g ; done
# Check the network health status
for i in {0..7}; do hccn_tool -i $i -net_health -g ; done
# View the network detected IP configuration
for i in {0..7}; do hccn_tool -i $i -netdetect -g ; done
# View gateway configuration
for i in {0..7}; do hccn_tool -i $i -gateway -g ; done
```

2. Check NPU HCCN Configuration:

Ensure that the `hccn.conf` file exists in the environment. If using Docker, mount it into the container.

```bash
cat /etc/hccn.conf
```

3. Get NPU IP Addresses

```bash
for i in {0..7}; do hccn_tool -i $i -ip -g; done
```

4. Cross-Node PING Test

```bash
# Execute on the target node (replace 'x.x.x.x' with the actual NPU IP address).
for i in {0..7}; do hccn_tool -i $i -ping -g address x.x.x.x; done
```

5. Check NPU TLS Configuration

```bash
# The TLS settings should be consistent across all nodes
for i in {0..7}; do hccn_tool -i $i -tls -g ; done | grep switch
```

## Run with Docker
@@ -86,7 +86,8 @@ docker run --rm \

-v /usr/local/Ascend/driver/lib64/:/usr/local/Ascend/driver/lib64/ \
-v /usr/local/Ascend/driver/version.info:/usr/local/Ascend/driver/version.info \
-v /etc/ascend_install.info:/etc/ascend_install.info \
# IMPORTANT: This must be a shared directory accessible by all nodes
-v /path/to/shared/cache:/root/.cache \
-it $IMAGE bash
```

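Before launching the container, it helps to confirm the shared cache path exists and is writable on each host. A small sketch (the directory name here is a stand-in for the real `/path/to/shared/cache`; on an actual cluster you would repeat this on every node to verify the mount is truly shared):

```shell
# Sketch with a hypothetical stand-in path for /path/to/shared/cache.
cache_dir="${TMPDIR:-/tmp}/shared_cache_demo"
mkdir -p "$cache_dir"
# A writable directory is the minimum requirement for the -v mount above.
if [ -d "$cache_dir" ] && [ -w "$cache_dir" ]; then
  echo "cache dir OK: writable"
fi
```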
@@ -12,8 +12,8 @@ This document provides step-by-step guidance on how to deploy and benchmark the

| Common Sense Reasoning | ARC |
| Mathematical Reasoning | gsm8k |
| Natural Language Understanding | SuperGLUE_BoolQ |
| Comprehensive Examination | AGIEval |
| Multi-turn Dialogue | ShareGPT |

The benchmarking tool used in this tutorial is AISBench, which supports performance testing for all the datasets listed above. The final section of this tutorial presents a performance comparison between enabling and disabling Suffix Decoding, under the condition of satisfying an SLO of TPOT < 50 ms, across different datasets and concurrency levels. Validations demonstrate that the Qwen3-32B model achieves a throughput improvement of approximately 20% to 80% on various real-world datasets when Suffix Decoding is enabled.

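The core idea behind Suffix Decoding can be shown with a toy sketch. This is illustrative only, not vLLM's or AISBench's implementation: find the longest suffix of the tokens generated so far inside a reference token sequence (e.g. earlier output), and propose the tokens that follow that match as speculative drafts.

```python
# Toy sketch of the suffix-decoding idea (illustrative, not the real
# implementation): match the longest suffix of `generated` inside `reference`
# and propose the tokens that follow the match as a draft.
def propose_draft(generated, reference, max_draft=4):
    # Try the longest suffix first, then progressively shorter ones.
    for start in range(len(generated)):
        suffix = generated[start:]
        for i in range(len(reference) - len(suffix) + 1):
            if reference[i:i + len(suffix)] == suffix:
                follow = reference[i + len(suffix): i + len(suffix) + max_draft]
                if follow:
                    return follow
    return []  # no usable match: nothing to speculate

reference = [5, 6, 7, 8, 9, 10]
generated = [1, 2, 6, 7]
# The suffix [6, 7] matches inside `reference`, so [8, 9, 10] is proposed.
print(propose_draft(generated, reference))  # [8, 9, 10]
```

When drafts like these are frequently accepted, the model verifies several tokens per step instead of generating one at a time, which is the source of the throughput gains reported below.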
@@ -171,7 +171,7 @@ Below is the raw detailed test results:

| 1 | 207 | 314 | 100 | 54.1 | 18.4 | 36.1 | 26.8 | 33.4% | 49.8% | 45.6% |
| 16 | 207 | 314 | 100 | 60.0 | 229.7 | 43.5 | 303.9 | 33.4% | 38.0% | 32.3% |
| 32 | 207 | 314 | 100 | 62.7 | 396.4 | 47.8 | 507.5 | 33.4% | 31.3% | 28.0% |
| **AGIEval** | | | | | | | | | | |
| 1 | 735 | 1880 | 100 | 53.1 | 18.7 | 31.8 | 34.1 | 50.3% | 66.8% | 81.9% |
| 24 | 735 | 1880 | 100 | 64.0 | 381.2 | 43.3 | 629.0 | 50.3% | 47.8% | 65.0% |
| 34 | 735 | 1880 | 100 | 70.0 | 494.6 | 50.2 | 768.4 | 50.3% | 39.4% | 55.3% |
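The TPOT figures in the table relate to request latency by simple arithmetic: time per output token is the decode time (end-to-end latency minus TTFT) divided by the number of tokens generated after the first. A small helper (illustrative; the variable names are assumptions, not AISBench's API):

```python
# Illustrative helper: derive TPOT (ms) from end-to-end latency, TTFT,
# and the number of output tokens. Requires output_tokens > 1.
def tpot_ms(e2e_latency_ms: float, ttft_ms: float, output_tokens: int) -> float:
    return (e2e_latency_ms - ttft_ms) / (output_tokens - 1)

# e.g. a 5-second request with 100 ms TTFT and 100 output tokens:
# (5000 - 100) / 99 ~= 49.5 ms per token, satisfying the TPOT < 50 ms SLO.
print(round(tpot_ms(5000, 100, 100), 1))  # 49.5
```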