[EPLB]Record expert map without dynamic eplb. (#3409)

What this PR does / why we need it?
1. Record the expert map without requiring dynamic EPLB.
2. Add `export PYTHONOPTIMIZE=1` when using dynamic EPLB.
3. Update the EPLB doc.

Does this PR introduce any user-facing change?
How was this patch tested?
Tested with Qwen3_moe on A3.

- vLLM version: v0.11.0

---------

Signed-off-by: offline0806 <3337230449@qq.com>
Co-authored-by: offline0806 <3337230449@qq.com>
offline893
2025-10-15 14:21:15 +08:00
committed by GitHub
parent 4f937f561d
commit 5a3082cd15
9 changed files with 49 additions and 15 deletions

@@ -16,7 +16,7 @@ Expert balancing for MoE models in LLM serving is essential for optimal performa
### Dynamic EPLB
-Enable dynamic balancing with auto-tuned parameters. Adjust num_iterations_eplb_update and num_wait_worker_iterations based on workload patterns.
+We need to add environment variable `export PYTHONOPTIMIZE=1` to get context of vllm process. Enable dynamic balancing with auto-tuned parameters. Adjust num_iterations_eplb_update and num_wait_worker_iterations based on workload patterns.
```shell
vllm serve Qwen/Qwen3-235B-A22 \
@@ -25,7 +25,6 @@ vllm serve Qwen/Qwen3-235B-A22 \
--additional-config '{
"dynamic_eplb": true,
"num_iterations_eplb_update": 400,
-"gate_eplb": true,
"num_wait_worker_iterations": 30
}'
```
@@ -42,9 +41,7 @@ vllm serve Qwen/Qwen3-235B-A22 \
--additional-config '{
"expert_map_record_path": "/path/to/eplb.json",
"init_redundancy_expert": 16,
-"dynamic_eplb": true,
"num_iterations_eplb_update": 400,
-"gate_eplb": true,
"num_wait_worker_iterations": 30
}'
```
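
For reference, a minimal sketch of how the two documented configurations might look after this change. It is inferred from the hunk headers (one line dropped from the first config, two from the second), so treat the removed keys as an assumption, not a confirmed result. `PYTHONOPTIMIZE=1` is the standard Python variable equivalent to running with `-O`, which strips `assert` statements. The variable names `DYNAMIC_EPLB_CONFIG` and `RECORD_EPLB_CONFIG` and the `check_json` helper are hypothetical and exist only for illustration. The launch line is left commented out because the diff elides the remaining `vllm serve` flags.

```shell
#!/bin/sh
# Sketch only: option names are copied from the diff; nothing here starts a server.

# Dynamic EPLB: the updated doc asks for this variable to be exported first.
export PYTHONOPTIMIZE=1

# Dynamic balancing options as they appear to read after this commit
# (assumption: "gate_eplb" is the removed line).
DYNAMIC_EPLB_CONFIG='{
  "dynamic_eplb": true,
  "num_iterations_eplb_update": 400,
  "num_wait_worker_iterations": 30
}'

# Record-only options as they appear to read after this commit
# (assumption: "dynamic_eplb" and "gate_eplb" are the removed lines).
RECORD_EPLB_CONFIG='{
  "expert_map_record_path": "/path/to/eplb.json",
  "init_redundancy_expert": 16,
  "num_iterations_eplb_update": 400,
  "num_wait_worker_iterations": 30
}'

# Hypothetical helper: succeed only if the argument parses as JSON.
check_json() {
  printf '%s' "$1" | python3 -m json.tool > /dev/null
}

check_json "$DYNAMIC_EPLB_CONFIG" && echo "dynamic config: valid JSON"
check_json "$RECORD_EPLB_CONFIG" && echo "record config: valid JSON"

# Launch line, with the flags elided by the diff left out:
# vllm serve Qwen/Qwen3-235B-A22 --additional-config "$DYNAMIC_EPLB_CONFIG"
```

Validating the JSON before launch is a cheap way to catch a stray trailing comma after hand-editing the `--additional-config` string.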