[EPLB][Bugfix]Reduce unnecessary video memory usage (#6020)

### What this PR does / why we need it?
1. Fold the EPLB warm-up into the profile run.
2. Reuse the same gather buffer.
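
The buffer reuse in item 2 can be illustrated with a minimal, framework-free sketch. It is not the code from this PR: `GatherBufferCache` and `gather_expert_loads` are hypothetical names, the all-gather is simulated with plain copies, and a stdlib `array` stands in for a device tensor. The point is only that the output buffer is allocated once and handed back on later calls.

```python
from array import array


class GatherBufferCache:
    """Hypothetical helper that hands out one reusable gather buffer."""

    def __init__(self):
        self._buffer = None
        self.allocations = 0  # counts real allocations, for illustration only

    def get(self, num_elements):
        # Allocate only when no buffer exists yet or the existing one is too small.
        if self._buffer is None or len(self._buffer) < num_elements:
            self._buffer = array("q", [0]) * num_elements
            self.allocations += 1
        return self._buffer


def gather_expert_loads(cache, per_rank_loads):
    """Simulate an all-gather by copying each rank's loads into the shared buffer."""
    width = len(per_rank_loads[0])
    buf = cache.get(len(per_rank_loads) * width)
    for rank, loads in enumerate(per_rank_loads):
        buf[rank * width:(rank + 1) * width] = array("q", loads)
    return buf


cache = GatherBufferCache()
first = gather_expert_loads(cache, [[1, 2], [3, 4]])
second = gather_expert_loads(cache, [[5, 6], [7, 8]])
```

Both calls return the same object, so the second gather overwrites the first result in place. A caller that needs to keep an earlier result has to copy it out before gathering again, which is the usual trade-off of this pattern.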

### Does this PR introduce _any_ user-facing change?

### How was this patch tested?
qwen3-235b aime baseline
| dataset | version | metric | mode | vllm-api-general-chat |
|----- | ----- | ----- | ----- | -----|
| aime2024 | 604a78 | accuracy | gen | 86.67 |

With EPLB enabled, the OOM issue does not occur:
| dataset | version | metric | mode | vllm-api-general-chat |
|----- | ----- | ----- | ----- | -----|
| aime2024 | 604a78 | accuracy | gen | 86.67 |

- vLLM version: v0.13.0
- vLLM main: 2c24bc6996

Signed-off-by: shenchuxiaofugui <1311027364@qq.com>
LI SHENGYONG
2026-01-23 14:21:13 +08:00
committed by GitHub
parent 749e24f81e
commit 8210a62a44
4 changed files with 20 additions and 30 deletions

@@ -60,7 +60,6 @@ class D2DExpertWeightLoader:
layer_id][global_expert_id_to_send].item()
for src_tensor in self.eplb_adaptor.expert_param_per_layer[
layer_id][local_expert_id]:
src_tensor = src_tensor.clone()
self.comm_op_list.append(
dist.P2POp(dist.isend, src_tensor, dst_rank))