[Doc][Misc] Improve readability and fix typos in documentation (#8340)
### What this PR does / why we need it?
This PR improves the readability of the documentation by fixing typos, correcting command extensions, and fixing broken links in the Chinese README.

### Does this PR introduce _any_ user-facing change?
No.

### How was this patch tested?
Documentation changes only.

---------

Signed-off-by: sunshine202600 <sunshine202600@163.com>
@@ -71,10 +71,10 @@ vllm serve "/xxxxx/DeepSeek-V2-Lite-Chat" \
 `HCCL_EXEC_TIMEOUT`, `HCCL_CONNECT_TIMEOUT`, and `HCCL_IF_IP` are HCCL-related configurations.<br>
 Set `GLOO_SOCKET_IFNAME`, `TP_SOCKET_IFNAME`, and `HCCL_SOCKET_IFNAME` to the corresponding NIC.<br>
 `ASCEND_RT_VISIBLE_DEVICES` specifies the cards on which the node runs. The total number of cards equals `dp_size*tp_size`.<br>
-`/xxxxx/DeepSeek-V2-Lite-Chat` is configured as a model that requires run.<br>
+`/xxxxx/DeepSeek-V2-Lite-Chat` is configured as a model that requires running.<br>
 `--host`: indicates the IP address of the node to be started.<br>
-`--port`: indicates the port to be started, which corresponds to the port in step 4.<br>
-`--seed`, --max-model-len, and --max-num-batched-tokens model basic configuration. Set this parameter based on the site requirements.<br>
+`--port`: indicates the port on which the prefill node will listen (e.g., 8100). This port is later referenced in step 3 when configuring the proxy server.<br>
+`--seed`, `--max-model-len`, and `--max-num-batched-tokens` are part of the model's basic configuration. Set these parameters based on the site requirements.<br>
 `--tensor-parallel-size`: specifies the TP size.<br>
 `--data-parallel-size`: indicates the DP size.<br>
 `--data-parallel-address`: indicates the IP address of the DP group. Set this parameter to the IP address of the node.<br>`--data-parallel-rpc-port`: indicates the RPC port for communication in the DP group.<br>
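The card-count rule above (the node needs `dp_size*tp_size` cards in total) can be sanity-checked with a small sketch. `visible_devices` is a hypothetical helper written for illustration; it is not part of vLLM Ascend.

```python
# Sketch: derive an ASCEND_RT_VISIBLE_DEVICES value from dp_size and tp_size.
# Illustrates the dp_size*tp_size card-count rule only; visible_devices()
# is a hypothetical helper, not part of vLLM Ascend.

def visible_devices(dp_size: int, tp_size: int, first_card: int = 0) -> str:
    """Return a comma-separated card list covering dp_size*tp_size cards."""
    total = dp_size * tp_size
    return ",".join(str(first_card + i) for i in range(total))

if __name__ == "__main__":
    # A node serving dp_size=2 with tp_size=2 needs 4 cards.
    print(visible_devices(2, 2))  # 0,1,2,3
```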
@@ -144,7 +144,7 @@ python load_balance_proxy_server_example.py --host localhost --prefiller-hosts h
 `--host`: indicates the node where the proxy runs. The `localhost` in the curl command delivered in step 5 must be the same as this host. The default port number for the service proxy is 8000.<br>
 `--prefiller-hosts`: Set this parameter to the IP addresses of all prefill (P) nodes. In the xPyD scenario, append the IP addresses to this configuration item, separated by spaces.<br>
-`--prefiller-ports`: Set this parameter to the port number of all p nodes, which is the configuration of the port number for the vllm to start the service in step 3. Write the port number after the configuration in sequence and leave a blank space between the port number and the port number. The sequence must be one-to-one mapping to the IP address of --prefiller-hosts.<br>
+`--prefiller-ports`: Set this parameter to the port numbers of all prefill (P) nodes, which were defined in step 1 when starting the prefill nodes. List the port numbers in order, separated by spaces; their order must map one-to-one to the IP addresses in `--prefiller-hosts`.<br>
 `--decoder-hosts`: Set this parameter to the IP addresses of all decode (D) nodes. In the xPyD scenario, append the IP addresses to this configuration item, separated by spaces.<br>
 `--decoder-ports`: Set this parameter to the port numbers of all decode (D) nodes, which were configured when starting the service in step 4. List the port numbers in order, separated by spaces; their order must map one-to-one to the IP addresses in `--decoder-hosts`.<br>
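The one-to-one host/port mapping that `--prefiller-hosts`/`--prefiller-ports` (and the decoder pair) rely on can be sketched as follows. `pair_endpoints` is a hypothetical helper illustrating the validation such a proxy needs; it is not code from `load_balance_proxy_server_example.py`.

```python
# Sketch: pair space-separated host and port lists one-to-one, as the
# proxy's --prefiller-hosts/--prefiller-ports (and decoder equivalents)
# require. pair_endpoints() is a hypothetical helper for illustration.

def pair_endpoints(hosts: str, ports: str) -> list[tuple[str, int]]:
    host_list = hosts.split()
    port_list = [int(p) for p in ports.split()]
    if len(host_list) != len(port_list):
        # The mapping must be one-to-one, so the counts have to match.
        raise ValueError("hosts and ports must map one-to-one")
    return list(zip(host_list, port_list))

if __name__ == "__main__":
    # A 2P setup with two prefill nodes.
    print(pair_endpoints("10.0.0.1 10.0.0.2", "8100 8101"))
```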
@@ -11,7 +11,7 @@
 ## run
-The EPD disaggregated technology accelerates model inference by decoupling the visual encoding computation and LLM computation stages. Currently, the EPD separation feature can achieve different data transmissions between E and P/PD nodes by configuring different connector backends. Vllm-ascend currently supports the ECexample-connector backend implemented on vllm, and will support Mooncake as well as shared memory(SHM) backend transmission methods in the future.
+The EPD disaggregated technology accelerates model inference by decoupling the visual encoding computation and LLM computation stages. Currently, the EPD separation feature can achieve different data transmissions between E and P/PD nodes by configuring different connector backends. Vllm-ascend currently supports the ECExampleConnector backend implemented on vllm, and will support Mooncake as well as shared memory (SHM) backend transmission methods in the future.
 ### ECexample-connector deployment guide
@@ -114,7 +114,7 @@ python3 epd_load_balance_proxy_layerwise_server_example.py \
     --port 8001
 ```
-TODO: explain the param.<br>
+The parameters are explained as follows:<br>
 `--encoder-hosts`: E node IP address.<br>
 `--encoder-ports`: The E node port number. It must be consistent with the `--port` in the E node's startup script.<br>
 `--pd-hosts`: PD node IP address.<br>
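The flags above could be wired up with `argparse` roughly as follows. This is a sketch of the interface described in the text, not the actual contents of `epd_load_balance_proxy_layerwise_server_example.py`; the defaults are illustrative assumptions.

```python
# Sketch: an argparse interface matching the EPD proxy flags described
# above. Flag names come from the documentation; defaults and help text
# here are illustrative assumptions.
import argparse

def build_parser() -> argparse.ArgumentParser:
    parser = argparse.ArgumentParser(description="EPD proxy (sketch)")
    parser.add_argument("--encoder-hosts", nargs="+", required=True,
                        help="E node IP addresses")
    parser.add_argument("--encoder-ports", nargs="+", type=int, required=True,
                        help="E node ports; must match --port in each "
                             "E node's startup script")
    parser.add_argument("--pd-hosts", nargs="+", required=True,
                        help="PD node IP addresses")
    parser.add_argument("--port", type=int, default=8001,
                        help="port the proxy itself listens on")
    return parser

if __name__ == "__main__":
    args = build_parser().parse_args(
        ["--encoder-hosts", "10.0.0.1",
         "--encoder-ports", "7100",
         "--pd-hosts", "10.0.0.2"])
    print(args.encoder_ports)  # [7100]
```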
@@ -1,8 +1,8 @@
-Here is an example guiding how to use `launch_online_dp.py` to launch external dp vllm servers. User can easily launch external dp servers following the steps below:
+Here is an example guiding how to use `launch_online_dp.py` to launch external dp vLLM servers. Users can easily launch external dp servers by following the steps below:
 ### Modify parameters in `run_dp_template.sh`
-`run_dp_template.sh` is an template script used to launch each dp vllm instance separately. It will be called by `launch_online_dp.py` in multi threads and most of its configurations are set by `launch_online_dp.py`. Parameters you need to set manually include:
+`run_dp_template.sh` is a template script used to launch each data parallel (dp) vLLM instance separately. It will be called by `launch_online_dp.py` in multiple threads, and most of its configurations are set by `launch_online_dp.py`. Parameters you need to set manually include:
 1. The IP and socket_ifname of your machine. If running on multiple nodes, make sure the script on each node has been set with the correct IP and socket_ifname of that node.
 2. vLLM serving-related parameters, including model_path and other configurations. Note that the port, dp-related parameters, and tp_size are set by `launch_online_dp.py`; all other vLLM parameters in this file serve only as examples, and you are free to modify them for your purposes.
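The multi-threaded launch that `launch_online_dp.py` performs can be sketched as a per-rank plan, one entry per local dp instance. The `local_launch_plan` helper and the `base_port + rank` port rule below are assumptions made for illustration, not the script's actual logic.

```python
# Sketch: a per-instance launch plan like the one launch_online_dp.py
# builds when it invokes run_dp_template.sh once per local dp rank.
# local_launch_plan() and the base_port + rank port rule are assumptions
# made for illustration only.

def local_launch_plan(dp_size_local: int, dp_rank_start: int,
                      tp_size: int, base_port: int = 9000) -> list[dict]:
    plan = []
    for i in range(dp_size_local):
        rank = dp_rank_start + i
        plan.append({
            "dp_rank": rank,
            "tp_size": tp_size,
            "port": base_port + rank,  # each instance gets its own port
        })
    return plan

if __name__ == "__main__":
    # Second node of a 2-node, dp-size-4 deployment: local ranks 2 and 3.
    for entry in local_launch_plan(dp_size_local=2, dp_rank_start=2, tp_size=4):
        print(entry)
```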
@@ -42,7 +42,7 @@ python launch_online_dp.py --dp-size 4 --tp-size 4 --dp-size-local 2 --dp-rank-s
 ### (Optional) Run `dp_load_balance_proxy_server.py` to load balance requests between external dp servers
-External dp server means that you need to handle load balance between multiple dp instances out of vllm by implementing your custom proxy server. Here we provide an example of request-length-aware dp load-balance proxy server for you. The arguments of `dp_load_balance_proxy_server.py` include:
+External dp server means that you need to handle load balancing between multiple dp instances outside of vLLM by implementing your own proxy server. Here we provide an example of a request-length-aware dp load-balance proxy server. The arguments of `dp_load_balance_proxy_server.py` include:
 1. `--port`: port of the proxy server, default 8000
 2. `--host`: host address of the proxy server, default localhost
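The request-length-aware balancing strategy mentioned above can be sketched as routing each request to the dp instance with the least outstanding prompt length. This is a minimal illustration of the idea only, not the logic of `dp_load_balance_proxy_server.py`.

```python
# Sketch: request-length-aware dp load balancing. Each request goes to
# the dp instance with the smallest total pending prompt length. This is
# a minimal illustration, not the actual dp_load_balance_proxy_server.py.

def pick_instance(pending: dict[str, int]) -> str:
    """pending maps instance name -> total prompt length currently queued."""
    return min(pending, key=pending.get)

def route(prompts: list[str], instances: list[str]) -> dict[str, list[str]]:
    pending = {name: 0 for name in instances}
    assigned = {name: [] for name in instances}
    for prompt in prompts:
        name = pick_instance(pending)
        assigned[name].append(prompt)
        pending[name] += len(prompt)  # account for the new load
    return assigned

if __name__ == "__main__":
    # One long prompt loads dp0, so both short prompts go to dp1.
    print(route(["a" * 100, "b" * 10, "c" * 10], ["dp0", "dp1"]))
```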