[main][Docs] Fix typos across documentation (#6728)
## Summary
Fix typos and improve grammar consistency across 50 documentation files.
### Changes include:
- Spelling corrections (e.g., "Facotory" → "Factory", "certainty" →
"determinism")
- Grammar improvements (e.g., "multi-thread" → "multi-threaded",
"re-routed" → "re-run")
- Punctuation fixes (semicolon consistency in filter parameters)
- Code style fixes (correct flag name `--num-prompts` instead of
`--num-prompt`)
- Capitalization consistency (e.g., "python" → "Python", "ascend" →
"Ascend")
- vLLM version: v0.15.0
- vLLM main:
9562912cea
---------
Signed-off-by: SlightwindSec <slightwindsec@gmail.com>
# External DP
For larger-scale deployments especially, it can make sense to handle the orchestration and load balancing of data parallel ranks externally.
In this case, it's more convenient to treat each DP rank like a separate vLLM deployment, with its own endpoint, and have an external router balance HTTP requests between them, making use of appropriate real-time telemetry from each server for routing decisions.
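The setup described above can be sketched as a minimal endpoint selector: each DP rank is just a separate HTTP endpoint, and the external router picks one per request. The addresses below are hypothetical, and plain round-robin stands in for the telemetry-driven policy the text describes:

```python
from itertools import cycle

# Hypothetical endpoints, one per DP rank, each a standalone vLLM server.
DP_ENDPOINTS = [
    "http://127.0.0.1:9000",  # DP rank 0
    "http://127.0.0.1:9001",  # DP rank 1
]

_next_endpoint = cycle(DP_ENDPOINTS)

def pick_endpoint() -> str:
    """Return the endpoint the next request should be forwarded to.

    A real router would consult per-server telemetry (queue depth,
    KV-cache usage) rather than blind round-robin.
    """
    return next(_next_endpoint)
```

A router process would call `pick_endpoint()` for every incoming HTTP request and proxy the request to the chosen server.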
The functionality of [external DP](https://docs.vllm.ai/en/latest/serving/data_parallel_deployment/?h=external#external-load-balancing) is already natively supported by vLLM. In vllm-ascend we provide two enhanced functionalities:
1. A launch script that helps to launch multiple vLLM instances in one command.
2. A request-length-aware load-balance proxy for external DP.
This tutorial introduces how to use both.
```
pip install fastapi httpx uvicorn
```
## Starting External DP Servers
First, you need to have at least two vLLM servers running in data parallel. These can be mock servers or actual vLLM servers. Note that the proxy also works with only one vLLM server running, but it then falls back to direct request forwarding, which defeats the purpose of load balancing.
You can start external vLLM DP servers manually, one by one, or use the launch script in `examples/external_online_dp`. For scenarios with a large DP size across multiple nodes, we recommend using our launch script for convenience.
### Launch Manually
```
vllm serve --host 0.0.0.0 --port 8101 --data-parallel-size 2 --data-parallel-ran
```
### Use Launch Script
First, modify `examples/external_online_dp/run_dp_template.sh` according to your vLLM configuration. Then use `examples/external_online_dp/launch_online_dp.py` to launch multiple vLLM instances with one command on each node. It internally calls `examples/external_online_dp/run_dp_template.sh` for each DP rank with the proper DP-related parameters.
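What the launch script does on each node can be sketched roughly as follows. The template's argument order and names here are illustrative assumptions, not the real script's interface:

```python
import subprocess

def build_dp_commands(dp_size: int, dp_size_local: int, dp_rank_start: int,
                      template: str = "run_dp_template.sh") -> list[list[str]]:
    """Build one template invocation per local DP rank on this node.

    The real launch script passes more DP-related parameters; the
    argument layout below is purely for illustration.
    """
    commands = []
    for local_rank in range(dp_size_local):
        dp_rank = dp_rank_start + local_rank
        commands.append(["bash", template, str(dp_rank), str(dp_size)])
    return commands

# To actually launch, each command would be spawned, e.g.:
# procs = [subprocess.Popen(cmd) for cmd in build_dp_commands(4, 2, 2)]
```

With `dp_size_local=2` and `dp_rank_start=2`, this node would host DP ranks 2 and 3, matching how a second node joins a four-rank deployment.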
An example of running external DP on a single node:
```
python launch_online_dp.py --dp-size 4 --tp-size 4 --dp-size-local 2 --dp-rank-s
```
## Starting the Load-Balance Proxy Server
After all vLLM DP instances are launched, you can launch the load-balance proxy server, which serves as the entrypoint for incoming requests and load-balances them across the vLLM DP instances.
The proxy server has the following features:
- Load-balances requests across multiple vLLM servers based on request length.
- Supports OpenAI-compatible `/v1/completions` and `/v1/chat/completions` endpoints.
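The exact balancing policy is not spelled out here, so the following is only one plausible sketch of a request-length-aware choice: track an estimated pending-token count per DP rank and send each request to the least-loaded rank:

```python
def choose_dp_rank(pending_tokens: list[int], request_len: int) -> int:
    """Pick the DP rank with the smallest estimated pending load,
    then account for the new request's length.

    `pending_tokens[i]` is a running estimate of tokens queued on
    DP rank i; in a real proxy it would shrink as responses finish.
    """
    rank = min(range(len(pending_tokens)), key=pending_tokens.__getitem__)
    pending_tokens[rank] += request_len
    return rank
```

Under this policy, a long prompt raises its rank's estimated load, steering subsequent requests toward the other ranks until the load evens out.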
```
python dp_load_balance_proxy_server.py \
    --dp-ports 9000 9001 \
```
After this, you can directly send requests to the proxy server and run DP with external load balancing.
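For example, a completion request to the proxy might be built as below. The model name and proxy port are placeholders, and the actual POST is commented out so the snippet runs without a live server:

```python
import json

# Hypothetical proxy address; adjust to your deployment.
PROXY_URL = "http://127.0.0.1:8000/v1/completions"

payload = {
    "model": "my-model",          # placeholder model name
    "prompt": "Hello, external DP!",
    "max_tokens": 16,
}

# Uncomment to actually send the request (requires `httpx`):
# import httpx
# print(httpx.post(PROXY_URL, json=payload).json())

print(json.dumps(payload))
```

The proxy accepts the same OpenAI-compatible payloads as a single vLLM server, so existing clients only need to point at the proxy's address.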