[main][Docs] Fix typos across documentation (#6728)

## Summary

Fix typos and improve grammar consistency across 50 documentation files.
 
### Changes include:
- Spelling corrections (e.g., "Facotory" → "Factory", "certainty" →
"determinism")
- Grammar improvements (e.g., "multi-thread" → "multi-threaded",
"re-routed" → "re-run")
- Punctuation fixes (semicolon consistency in filter parameters)
- Code style fixes (correct flag name `--num-prompts` instead of
`--num-prompt`)
- Capitalization consistency (e.g., "python" → "Python", "ascend" →
"Ascend")
- vLLM version: v0.15.0
- vLLM main: 9562912cea

---------

Signed-off-by: SlightwindSec <slightwindsec@gmail.com>
This commit is contained in:
Cao Yi, 2026-02-13 15:50:05 +08:00, committed by GitHub
parent: b6bc3d2f9d
commit: 6de207de88
50 changed files with 273 additions and 272 deletions


@@ -16,17 +16,17 @@ Expert balancing for MoE models in LLM serving is essential for optimal performa
### Models
-DeepseekV3/V3.1/R1Qwen3-MOE
+DeepSeekV3/V3.1/R1, Qwen3-MoE
### MOE QuantType
-W8A8-dynamic
+W8A8-Dynamic
## How to Use EPLB
### Dynamic EPLB
-We need to add environment variable `export DYNAMIC_EPLB="true"` to enable vllm eplb. Enable dynamic balancing with auto-tuned parameters. Adjust expert_heat_collection_interval and algorithm_execution_interval based on workload patterns.
+We need to add environment variable `export DYNAMIC_EPLB="true"` to enable vLLM EPLB. Enable dynamic balancing with auto-tuned parameters. Adjust expert_heat_collection_interval and algorithm_execution_interval based on workload patterns.
```shell
vllm serve Qwen/Qwen3-235B-A22 \
@@ -87,7 +87,7 @@ vllm serve Qwen/Qwen3-235B-A22 \
4. Monitoring & Validation:
- Track metrics: expert_load_balance_ratio, ttft_p99, tpot_avg, and gpu_utilization.
-   - Use vllm monitor to detect imbalances during runtime.
+   - Use vLLM monitor to detect imbalances during runtime.
- Always verify expert map JSON structure before loading (validate with jq or similar tools).
5. Startup Behavior:
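The diff above recommends verifying the expert map JSON structure with `jq` before loading. A minimal sketch of such a pre-flight check is below; the file path and the field names (`moe_layer_count`, `layer_list`) are illustrative assumptions, not the exact schema the docs prescribe.

```shell
# Hypothetical expert map file; field names are assumptions for illustration.
cat > /tmp/expert_map.json <<'EOF'
{"moe_layer_count": 2, "layer_list": [{"layer_id": 0}, {"layer_id": 1}]}
EOF

# Fail fast if the file is not syntactically valid JSON.
jq empty /tmp/expert_map.json || { echo "invalid JSON" >&2; exit 1; }

# -e makes jq exit non-zero if the key is missing or null,
# so this both asserts the key exists and prints the layer count.
jq -e '.layer_list | length' /tmp/expert_map.json
```

Running a check like this at deploy time catches a malformed or truncated expert map before the server attempts to load it.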