[Patch] Patch the v1 executor when EPLB is enabled. (#3511)

### What this PR does / why we need it?
When dynamic EPLB is enabled, patch the v1 executor so that creating the
EPLB child process no longer fails.
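
The commit message does not say why child-process creation fails. One plausible cause, stated here as an assumption rather than as the actual content of this patch, is the standard `multiprocessing` restriction that a daemonic process may not start children: `Process.start()` enforces it with an `assert`. The sketch below reproduces that restriction in isolation. The helper names `_noop`, `_worker`, and `spawn_from_worker` are hypothetical and do not come from vLLM or vllm-ascend, and the example uses the `fork` start method, so it assumes a POSIX system.

```python
import multiprocessing as mp


def _noop():
    """Trivial target for the grandchild process."""


def _worker(conn):
    """Runs in the worker process and tries to start a child of its own."""
    try:
        child = mp.get_context("fork").Process(target=_noop)
        child.start()  # asserts that the current process is not daemonic
        child.join()
        conn.send("child started")
    except AssertionError as exc:
        conn.send(f"AssertionError: {exc}")
    finally:
        conn.close()


def spawn_from_worker(daemon: bool) -> str:
    """Start a worker (daemonic or not) and report whether it could spawn a child."""
    ctx = mp.get_context("fork")  # assumption: POSIX; fork is unavailable on Windows
    recv_end, send_end = ctx.Pipe(duplex=False)
    worker = ctx.Process(target=_worker, args=(send_end,), daemon=daemon)
    worker.start()
    result = recv_end.recv()
    worker.join()
    return result


if __name__ == "__main__":
    print("non-daemon worker:", spawn_from_worker(daemon=False))
    print("daemon worker:    ", spawn_from_worker(daemon=True))
```

If this is indeed the failure mode, a worker that must launch its own helper process has to be created as a non-daemonic process. Because the check is an `assert`, it is also skipped when Python runs with optimizations enabled.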

### How was this patch tested?
Tested with DeepSeek V3.

- vLLM version: v0.11.0rc3
- vLLM main: https://github.com/vllm-project/vllm/commit/v0.11.0

---------

Signed-off-by: offline0806 <3337230449@qq.com>
Co-authored-by: offline0806 <3337230449@qq.com>
offline893
2025-10-19 10:54:26 +08:00
committed by GitHub
parent 646c1db5d7
commit 6c9909c861
5 changed files with 192 additions and 10 deletions

@@ -16,7 +16,7 @@ Expert balancing for MoE models in LLM serving is essential for optimal performa
### Dynamic EPLB
-We need to add environment variable `export PYTHONOPTIMIZE=1` to get context of vllm process. Enable dynamic balancing with auto-tuned parameters. Adjust num_iterations_eplb_update and num_wait_worker_iterations based on workload patterns.
+We need to add environment variable `export DYNAMIC_EPLB=true` to enable vllm eplb. Enable dynamic balancing with auto-tuned parameters. Adjust num_iterations_eplb_update and num_wait_worker_iterations based on workload patterns.
```shell
vllm serve Qwen/Qwen3-235B-A22 \
@@ -32,7 +32,7 @@ vllm serve Qwen/Qwen3-235B-A22 \
### Static EPLB
#### Initial Setup (Record Expert Map)
-Generate the initial expert distribution map using expert_map_record_path. This creates a baseline configuration for future deployments.
+We need to add environment variable `export EXPERT_MAP_RECORD=true` to record expert map.Generate the initial expert distribution map using expert_map_record_path. This creates a baseline configuration for future deployments.
```shell
vllm serve Qwen/Qwen3-235B-A22 \