[EPLB] Reduce the memory used for batch_isend_irecv (#7344)
### What this PR does / why we need it?
#6729 appeared to reduce the NPU memory usage of EPLB, but it actually only moved the buffer allocation from `dist.all_gather_into_tensor` to `dist.batch_isend_irecv`, so the overall NPU memory usage was unchanged. This PR actually eliminates the memory overhead in this part.
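As a toy illustration (illustrative sizes, not the actual vllm-ascend code) of why the buffer footprint differs between the two collectives: `all_gather_into_tensor` needs an output tensor covering every rank at once, while `batch_isend_irecv` only needs buffers for the point-to-point transfers issued in the current step.

```python
import torch

# Toy illustration, not vllm-ascend code: compare the buffer footprint of
# gathering one layer's expert table from every rank versus exchanging
# only the slices a rank actually needs via point-to-point ops.
world_size = 16
per_rank_elems = 256 * 8  # hypothetical per-rank payload size

# dist.all_gather_into_tensor requires an output tensor sized for all ranks:
gather_out = torch.empty(world_size * per_rank_elems)

# dist.batch_isend_irecv only needs buffers for the transfers it actually
# issues; here a single send and a single receive:
send_buf = torch.empty(per_rank_elems)
recv_buf = torch.empty(per_rank_elems)

ratio = gather_out.numel() / (send_buf.numel() + recv_buf.numel())
print(f"gather buffer is {ratio:.0f}x larger")  # 8x with these sizes
```

Moving the allocation between the two calls only relocates this cost; the saving comes from shrinking the buffers themselves.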
### Does this PR introduce _any_ user-facing change?
### How was this patch tested?
Remaining memory on each rank before the fix:
<img width="649" height="99" alt="image"
src="https://github.com/user-attachments/assets/52a67592-e0e8-4f9a-b194-b84cb848c598"
/>
Remaining memory on each rank after the fix:
<img width="641" height="99" alt="image"
src="https://github.com/user-attachments/assets/0bc2e67c-f328-4dea-98af-d7a459fb4876"
/>
With EPLB disabled:
<img width="543" height="45" alt="image"
src="https://github.com/user-attachments/assets/6dcba19d-4401-44b8-a6d3-c7b35ee983c7"
/>
Weight memory on each rank:
<img width="648" height="46" alt="image"
src="https://github.com/user-attachments/assets/4db2fd04-98a0-4d26-a026-2e8287102b99"
/>
Estimated memory for EPLB: 15.68 GB / 48 (layer_num) + 2 * 0.02 GB ≈ 0.37 GB
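Reading the formula above: one layer's worth of expert weights is staged at a time, plus two communication buffers (the 15.68 GB and 0.02 GB figures come from the measurements above):

```python
# Reproducing the back-of-the-envelope EPLB memory estimate.
# 15.68 GB of expert weights spread over 48 MoE layers, staged one
# layer at a time, plus two 0.02 GB communication buffers.
moe_weights_gb = 15.68
num_layers = 48
buffer_gb = 0.02

estimate_gb = moe_weights_gb / num_layers + 2 * buffer_gb
print(f"{estimate_gb:.2f} GB")  # ≈ 0.37 GB
```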
- vLLM version: v0.17.0
- vLLM main:
4034c3d32e
Signed-off-by: shenchuxiaofugui <1311027364@qq.com>
```diff
@@ -62,7 +62,7 @@ _CP_CHUNKEDPREFILL_COMM_STREAM = None
 _ASCEND_CUSTOMOP_IS_REIGISTERED = False
 _DEFAULT_BUFFER_SIZE = 200
 _MIN_DP_BUFFER_SIZE = 50
-_DYNAMIC_EPLB_BUFFER_SIZE = 100
+_DYNAMIC_EPLB_BUFFER_SIZE = 1  # num_experts * num_layers * 64 byte
 _IS_MOE_MODEL = None
 _IS_DRAFTER_MOE_MODEL = None
 _IS_VL_MODEL = None
```
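Assuming the buffer constants are sized in MB (as the neighboring `_DEFAULT_BUFFER_SIZE = 200` suggests), 1 MB leaves headroom over the payload the code comment describes. A sanity check with hypothetical counts of 256 experts and 48 layers (these counts are illustrative, not taken from the PR):

```python
# Sanity check: the diff comment sizes the EPLB exchange as
# num_experts * num_layers * 64 bytes. Expert/layer counts below
# are illustrative assumptions, not figures from the PR.
num_experts = 256
num_layers = 48
payload_bytes = num_experts * num_layers * 64

payload_mb = payload_bytes / (1024 * 1024)
print(f"{payload_mb:.2f} MB")  # 0.75 MB, under a 1 MB buffer
```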