[main][Docs] Fix typos across documentation (#6728)

## Summary

Fix typos and improve grammar consistency across 50 documentation files.
 
### Changes include:
- Spelling corrections (e.g., "Facotory" → "Factory", "certainty" →
"determinism")
- Grammar improvements (e.g., "multi-thread" → "multi-threaded",
"re-routed" → "re-run")
- Punctuation fixes (semicolon consistency in filter parameters)
- Code style fixes (correct flag name `--num-prompts` instead of
`--num-prompt`)
- Capitalization consistency (e.g., "python" → "Python", "ascend" →
"Ascend")
- vLLM version: v0.15.0
- vLLM main: 9562912cea

---------

Signed-off-by: SlightwindSec <slightwindsec@gmail.com>
Authored by Cao Yi on 2026-02-13 15:50:05 +08:00, committed by GitHub.
Parent: b6bc3d2f9d · Commit: 6de207de88
50 changed files with 273 additions and 272 deletions


@@ -1,6 +1,6 @@
# External DP
-For larger scale deployments especially, it can make sense to handle the orchestration and load balancing of data parallel ranks externally.
+For larger-scale deployments especially, it can make sense to handle the orchestration and load balancing of data parallel ranks externally.
In this case, it's more convenient to treat each DP rank like a separate vLLM deployment, with its own endpoint, and have an external router balance HTTP requests between them, making use of appropriate real-time telemetry from each server for routing decisions.
@@ -8,8 +8,8 @@ In this case, it's more convenient to treat each DP rank like a separate vLLM de
The functionality of [external DP](https://docs.vllm.ai/en/latest/serving/data_parallel_deployment/?h=external#external-load-balancing) is already natively supported by vLLM. In vllm-ascend we provide two enhanced functionalities:
-1. A launch script which helps to launch multi vllm instances in one command.
-2. A request-length-aware load balance proxy for external dp.
+1. A launch script that helps to launch multiple vLLM instances in one command.
+2. A request-length-aware load-balance proxy for external DP.
This tutorial will introduce the usage of them.
@@ -24,9 +24,9 @@ pip install fastapi httpx uvicorn
## Starting External DP Servers
-First you need to have at least two vLLM servers running in data parallel. These can be mock servers or actual vLLM servers. Note that this proxy also works with only one vLLM server running, but will fall back to direct request forwarding which is meaningless.
+First, you need to have at least two vLLM servers running in data parallel. These can be mock servers or actual vLLM servers. Note that this proxy also works with only one vLLM server running, but will fall back to direct request forwarding which is meaningless.
-You can start external vLLM dp servers one-by-one manually or using the launch script in `examples/external_online_dp`. For scenarios of large dp size across multi nodes, we recommend using our launch script for convenience.
+You can start external vLLM DP servers one-by-one manually or using the launch script in `examples/external_online_dp`. For scenarios of large DP size across multiple nodes, we recommend using our launch script for convenience.
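Whichever way the servers are started, each DP rank ends up with its own `vllm serve` invocation. As a rough illustration of what such a launch script produces, here is a hypothetical Python helper (the helper name and default ports are made up; only the `vllm serve` flags match the command used in this tutorial):

```python
# Hypothetical sketch (not the actual launch script): generate the per-rank
# `vllm serve` command lines that external DP needs, one server per rank.
# The flag names mirror the vLLM CLI used in this tutorial; ports are examples.

def build_dp_commands(dp_size: int, base_port: int = 8100, host: str = "0.0.0.0"):
    """Return one `vllm serve` command line per data-parallel rank."""
    return [
        f"vllm serve --host {host} --port {base_port + rank} "
        f"--data-parallel-size {dp_size} --data-parallel-rank {rank}"
        for rank in range(dp_size)
    ]

for cmd in build_dp_commands(2):
    print(cmd)
```

Each generated command gives a rank its own port, so an external router can address every DP rank as an independent endpoint.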
### Manually Launch
@@ -38,7 +38,7 @@ vllm serve --host 0.0.0.0 --port 8101 --data-parallel-size 2 --data-parallel-ran
### Use Launch Script
-Firstly, you need to modify the `examples/external_online_dp/run_dp_template.sh` according to your vLLM configuration. Then you can use `examples/external_online_dp/launch_online_dp.py` to launch multiple vLLM instances in one command each node. It will internally call `examples/external_online_dp/run_dp_template.sh` for each DP rank with proper DP-related parameters.
+Firstly, you need to modify the `examples/external_online_dp/run_dp_template.sh` according to your vLLM configuration. Then you can use `examples/external_online_dp/launch_online_dp.py` to launch multiple vLLM instances in one command on each node. It will internally call `examples/external_online_dp/run_dp_template.sh` for each DP rank with proper DP-related parameters.
An example of running external DP in one single node:
@@ -65,9 +65,9 @@ python launch_online_dp.py --dp-size 4 --tp-size 4 --dp-size-local 2 --dp-rank-s
## Starting Load-balance Proxy Server
-After all vLLM DP instances are launched, you can now launch the load-balance proxy server which serves as entrypoint for coming requests and load balance them between vLLM DP instances.
+After all vLLM DP instances are launched, you can now launch the load-balance proxy server, which serves as an entrypoint for coming requests and load-balances them between vLLM DP instances.
-The proxy server has following features:
+The proxy server has the following features:
- Load balances requests to multiple vLLM servers based on request length.
- Supports OpenAI-compatible `/v1/completions` and `/v1/chat/completions` endpoints.
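The request-length-aware strategy can be sketched as follows. This is an assumed simplification, not the actual logic of `dp_load_balance_proxy_server.py`: it tracks the total length of in-flight prompts per backend and routes each new request to the least-loaded one.

```python
# Assumed simplification of request-length-aware load balancing: track the
# total length of in-flight prompts per backend and route each new request
# to the backend with the smallest outstanding load. URLs are examples.

class LengthAwareBalancer:
    def __init__(self, backends):
        # pending[b]: summed prompt length of requests currently running on b
        self.pending = {b: 0 for b in backends}

    def pick(self, prompt: str) -> str:
        # Route to the backend with the smallest outstanding load.
        backend = min(self.pending, key=self.pending.get)
        self.pending[backend] += len(prompt)
        return backend

    def done(self, backend: str, prompt: str) -> None:
        # Release the accounted load once the request has finished.
        self.pending[backend] -= len(prompt)

balancer = LengthAwareBalancer(["http://127.0.0.1:8100", "http://127.0.0.1:8101"])
first = balancer.pick("a" * 100)   # long prompt occupies one backend
second = balancer.pick("hi")       # short prompt goes to the other
print(first, second)
```

Compared with plain round-robin, this keeps one very long prompt from queueing behind short ones on the same DP rank.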
@@ -88,4 +88,4 @@ python dp_load_balance_proxy_server.py \
--dp-ports 9000 9001 \
```
-After this, you can directly send requests to the proxy server and run DP with external load-balance.
+After this, you can directly send requests to the proxy server and run DP with external load balancing.