[EPLB]Record expert map without dynamic eplb. (#3409)
What this PR does / why we need it?
1. Record the expert map without dynamic EPLB.
2. Add `export PYTHONOPTIMIZE=1` when using dynamic EPLB.
3. Update the EPLB doc.

Does this PR introduce any user-facing change?

How was this patch tested?
Qwen3_moe in A3.

- vLLM version: v0.11.0

---------

Signed-off-by: offline0806 <3337230449@qq.com>
Co-authored-by: offline0806 <3337230449@qq.com>
@@ -16,7 +16,7 @@ Expert balancing for MoE models in LLM serving is essential for optimal performa

### Dynamic EPLB

-Enable dynamic balancing with auto-tuned parameters. Adjust num_iterations_eplb_update and num_wait_worker_iterations based on workload patterns.
+We need to add environment variable `export PYTHONOPTIMIZE=1` to get context of vllm process. Enable dynamic balancing with auto-tuned parameters. Adjust num_iterations_eplb_update and num_wait_worker_iterations based on workload patterns.

```shell
vllm serve Qwen/Qwen3-235B-A22 \
@@ -25,7 +25,6 @@ vllm serve Qwen/Qwen3-235B-A22 \
  --additional-config '{
    "dynamic_eplb": true,
    "num_iterations_eplb_update": 400,
    "gate_eplb": true,
    "num_wait_worker_iterations": 30
  }'
```
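A minimal sketch of two checks that could be run before the `vllm serve` command above. It is not part of this commit. It relies on two general Python facts: `PYTHONOPTIMIZE=1` is equivalent to running the interpreter with `-O`, and `python3 -m json.tool` exits non-zero on malformed JSON. The variable names `optimize_level`, `EPLB_CONFIG`, and `config_status` are hypothetical, and the key set is illustrative: this view does not show which line the hunk removes.

```shell
# Step 1: set the variable in the shell that will launch vllm.
export PYTHONOPTIMIZE=1

# Child Python processes inherit it; sys.flags.optimize reports the active level.
optimize_level="$(python3 -c 'import sys; print(sys.flags.optimize)')"
echo "Python optimize level: ${optimize_level}"

# Step 2: keep the JSON in a variable (illustrative key set taken from the diff).
EPLB_CONFIG='{
  "dynamic_eplb": true,
  "num_iterations_eplb_update": 400,
  "num_wait_worker_iterations": 30
}'

# json.tool exits non-zero on malformed JSON, e.g. a trailing comma.
if printf '%s' "$EPLB_CONFIG" | python3 -m json.tool > /dev/null; then
  config_status="valid"
else
  config_status="invalid"
fi
echo "additional-config JSON is ${config_status}"
```

Once both checks pass, the string can be passed as `--additional-config "$EPLB_CONFIG"` alongside the other `vllm serve` flags, which this diff view elides.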
@@ -42,9 +41,7 @@ vllm serve Qwen/Qwen3-235B-A22 \
  --additional-config '{
    "expert_map_record_path": "/path/to/eplb.json",
    "init_redundancy_expert": 16,
    "dynamic_eplb": true,
    "num_iterations_eplb_update": 400,
    "gate_eplb": true,
    "num_wait_worker_iterations": 30
  }'
```