[EPLB][Bugfix] Reduce unnecessary device memory usage (#6020)
### What this PR does / why we need it?
1. Incorporate the EPLB warm-up into the profile run.
2. Reuse the same gather buffer instead of allocating a new one on every transfer.
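The buffer-reuse point can be illustrated with a minimal, hypothetical sketch (plain Python, not the actual vllm-ascend API): keep one grow-only buffer cached on the object and hand out views of it, so repeated gathers stop allocating fresh device memory each step.

```python
class GatherBufferPool:
    """Hypothetical illustration of gather-buffer reuse (names are not
    from the real codebase): allocate once, reuse on later calls."""

    def __init__(self):
        self._buf = None
        self.allocations = 0  # counts real allocations, for illustration only

    def get(self, size):
        # Grow-only reuse: allocate only when the cached buffer is too small.
        if self._buf is None or len(self._buf) < size:
            self._buf = bytearray(size)
            self.allocations += 1
        return memoryview(self._buf)[:size]


pool = GatherBufferPool()
a = pool.get(1024)   # first call allocates
b = pool.get(512)    # reuses the same backing buffer
c = pool.get(2048)   # grows exactly once more
assert pool.allocations == 2
```

With a per-call `bytearray(size)` instead, this sequence would perform three allocations; the cached buffer caps it at the high-water mark.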
### Does this PR introduce _any_ user-facing change?
### How was this patch tested?
qwen3-235b AIME baseline:
| dataset | version | metric | mode | vllm-api-general-chat |
|----- | ----- | ----- | ----- | -----|
| aime2024 | 604a78 | accuracy | gen | 86.67 |
With EPLB enabled, the OOM issue no longer occurs:
| dataset | version | metric | mode | vllm-api-general-chat |
|----- | ----- | ----- | ----- | -----|
| aime2024 | 604a78 | accuracy | gen | 86.67 |
- vLLM version: v0.13.0
- vLLM main: 2c24bc6996
Signed-off-by: shenchuxiaofugui <1311027364@qq.com>
```diff
@@ -60,7 +60,6 @@ class D2DExpertWeightLoader:
                     layer_id][global_expert_id_to_send].item()
             for src_tensor in self.eplb_adaptor.expert_param_per_layer[
                     layer_id][local_expert_id]:
-                src_tensor = src_tensor.clone()
                 self.comm_op_list.append(
                     dist.P2POp(dist.isend, src_tensor, dst_rank))
```
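The removed `clone()` copied every expert weight shard just before it was queued for sending; since `dist.isend` only reads the tensor, the original buffer can be passed directly. A minimal sketch of the before/after memory behavior (plain Python stand-ins, assuming `bytes(t)` models `src_tensor.clone()` and a list append models queueing the `P2POp`):

```python
def send_tensors(tensors, send_op, clone_before_send):
    """Hypothetical model of the changed loop: with clone_before_send=True
    (old behavior) each shard is copied before sending; with False the
    original buffer goes to the read-only send op directly."""
    extra_bytes = 0
    for t in tensors:
        if clone_before_send:
            t = bytes(t)          # stands in for src_tensor.clone()
            extra_bytes += len(t)
        send_op(t)                # stands in for queueing dist.P2POp(dist.isend, ...)
    return extra_bytes


queued = []
shards = [bytearray(1 << 20) for _ in range(3)]  # three 1 MiB "expert weights"
old_cost = send_tensors(shards, queued.append, clone_before_send=True)
new_cost = send_tensors(shards, queued.append, clone_before_send=False)
assert old_cost == 3 * (1 << 20)  # old path: one extra copy per shard
assert new_cost == 0              # new path: zero extra allocation
```

The saving scales with the number and size of expert shards in flight, which is why dropping the copy matters for large MoE models like qwen3-235b.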