[v0.11.0][Doc] Update doc (#3852)
### What this PR does / why we need it?

Update doc

Signed-off-by: hfadzxy <starmoon_zhang@163.com>
@@ -2,10 +2,10 @@

 ## Verify Multi-Node Communication Environment

-referring to [multi_node.md](https://vllm-ascend.readthedocs.io/en/latest/tutorials/multi_node.html#verification-process)
+Refer to [multi_node.md](https://vllm-ascend.readthedocs.io/en/latest/tutorials/multi_node.html#verification-process).

-## Run with docker
-Assume you have two Atlas 800 A3(64G*16) nodes(or 4 * A2), and want to deploy the `Kimi-K2-Instruct-W8A8` quantitative model across multi-node.
+## Run with Docker
+Assume you have two Atlas 800 A3 (64G*16) or four A2 nodes, and want to deploy the `Kimi-K2-Instruct-W8A8` quantized model across multiple nodes.

 ```{code-block} bash
 :substitutions:
@@ -14,7 +14,7 @@ export IMAGE=m.daocloud.io/quay.io/ascend/vllm-ascend:|vllm_ascend_version|
 export NAME=vllm-ascend

 # Run the container using the defined variables
-# Note if you are running bridge network with docker, Please expose available ports for multiple nodes communication in advance
+# Note: If you are running a bridge network with Docker, please expose the ports needed for multi-node communication in advance.
 docker run --rm \
 --name $NAME \
 --net=host \
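For context (not part of this commit): a minimal sketch of what exposing ports in advance can look like if you use Docker's default bridge network instead of `--net=host`. The port numbers below are placeholder assumptions, not values from this guide; pick whatever your nodes actually use.

```shell
# Hedged sketch, not from this commit: with the default bridge network,
# publish the required ports explicitly instead of using --net=host.
# 8004 matches the API port used later in this guide; 13356 is a
# placeholder for inter-node communication traffic.
docker run --rm \
  --name $NAME \
  -p 8004:8004 \
  -p 13356:13356 \
  -it $IMAGE bash
```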
@@ -47,13 +47,13 @@ docker run --rm \
 -it $IMAGE bash
 ```

-Run the following scripts on two nodes respectively
+Run the following scripts on the two nodes, respectively.

 :::{note}
-Before launch the inference server, ensure the following environment variables are set for multi node communication
+Before launching the inference server, ensure the following environment variables are set for multi-node communication.
 :::

-**node0**
+**Node 0**

 ```shell
 #!/bin/sh
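The variables the note refers to are set near the top of each node's script, in a part of the file this hunk does not show. As a purely illustrative sketch based on the linked multi_node.md tutorial (the values below are placeholders, not from this commit):

```shell
# Illustrative only -- these lines are not part of this commit.
# Typical multi-node communication variables from the multi_node.md tutorial.
local_ip=192.0.2.10    # placeholder: this node's reachable IP
nic_name=enp48s3u1u1   # placeholder: the NIC bound to that IP
export HCCL_IF_IP=$local_ip
export GLOO_SOCKET_IFNAME=$nic_name
export TP_SOCKET_IFNAME=$nic_name
export HCCL_SOCKET_IFNAME=$nic_name
```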
@@ -72,8 +72,8 @@ export OMP_NUM_THREADS=100
 export VLLM_USE_V1=1
 export HCCL_BUFFSIZE=1024

-# The w8a8 weight can obtained from https://www.modelscope.cn/models/vllm-ascend/Kimi-K2-Instruct-W8A8
-# If you want to the quantization manually, please refer to https://vllm-ascend.readthedocs.io/en/latest/user_guide/feature_guide/quantization.html
+# The w8a8 weight can be obtained from https://www.modelscope.cn/models/vllm-ascend/Kimi-K2-Instruct-W8A8
+# If you want to do the quantization manually, please refer to https://vllm-ascend.readthedocs.io/en/latest/user_guide/feature_guide/quantization.html
 vllm serve /home/cache/weights/Kimi-K2-Instruct-W8A8 \
 --host 0.0.0.0 \
 --port 8004 \
@@ -96,7 +96,7 @@ vllm serve /home/cache/weights/Kimi-K2-Instruct-W8A8 \
 --additional-config '{"ascend_scheduler_config":{"enabled":true},"torchair_graph_config":{"enabled":true}}'
 ```

-**node1**
+**Node 1**

 ```shell
 #!/bin/sh
@@ -141,7 +141,7 @@ vllm serve /home/cache/weights/Kimi-K2-Instruct-W8A8 \
 --additional-config '{"ascend_scheduler_config":{"enabled":true},"torchair_graph_config":{"enabled":true}}'
 ```

-The Deployment view looks like:
+The deployment view looks like:
 

 Once your server is started, you can query the model with input prompts:
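Once the server is up, a typical query looks like the following. This is a hedged example, not part of the commit: it assumes the OpenAI-compatible completions endpoint that `vllm serve` exposes on the port configured above, and `<node0-ip>` is a placeholder for the address of the node running the API server.

```shell
# Hedged example, not from this commit: query the OpenAI-compatible
# /v1/completions endpoint on the API node. Replace <node0-ip>.
curl http://<node0-ip>:8004/v1/completions \
  -H "Content-Type: application/json" \
  -d '{
    "model": "/home/cache/weights/Kimi-K2-Instruct-W8A8",
    "prompt": "The future of AI is",
    "max_tokens": 64,
    "temperature": 0
  }'
```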