[EPLB] Reduce the memory used for heat aggregation (#6729)

### What this PR does / why we need it?
Calling `dist.all_gather` directly consumes 2 x HCCL_BUFFSIZE of memory,
while expert heat (hotspot) aggregation actually needs less than 1 MB.
This PR therefore creates a separate communication domain with a small
buffer that is used only for that aggregation.
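
The sizing argument can be checked with a short sketch. It follows the `num_experts * num_layers * 64 byte` comment from the diff; the model shape (256 experts, 58 layers), the helper names, and the assumption that buffer sizes are expressed in MB with `HCCL_BUFFSIZE` at 200 are illustrative and are not taken from this PR.

```python
import math

BYTES_PER_MB = 1024 * 1024


def heat_table_bytes(num_experts: int, num_layers: int, bytes_per_entry: int = 64) -> int:
    """Bytes needed for the heat table, per the comment in the diff."""
    return num_experts * num_layers * bytes_per_entry


def buffer_size_mb(num_bytes: int) -> int:
    """Round up to whole MB, never going below 1 MB."""
    return max(1, math.ceil(num_bytes / BYTES_PER_MB))


# Hypothetical model shape, for illustration only.
num_experts, num_layers = 256, 58
needed = heat_table_bytes(num_experts, num_layers)
print(f"heat table: {needed} bytes -> {buffer_size_mb(needed)} MB buffer")

# Assumed cost of reusing a regular domain: 2 x HCCL_BUFFSIZE, with HCCL_BUFFSIZE = 200 MB.
print(f"regular domain: {2 * 200} MB")
```

Under these assumptions the table stays below 1 MB, which is consistent with the dedicated domain being given a buffer size of 1.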

### Does this PR introduce _any_ user-facing change?

### How was this patch tested?
Original:

![1](https://github.com/user-attachments/assets/8880b461-c26f-497c-9a05-2ca60cc46aa4)

Current:

![2](https://github.com/user-attachments/assets/c9da32b5-9200-4fa2-aff9-d8c4978ac602)


- vLLM version: v0.15.0
- vLLM main: 9562912cea

Signed-off-by: shenchuxiaofugui <1311027364@qq.com>
LI SHENGYONG
2026-02-24 18:02:24 +08:00
committed by GitHub
parent 5c8ab7af39
commit 0331f16a50
3 changed files with 25 additions and 7 deletions


@@ -62,6 +62,7 @@ _CP_CHUNKEDPREFILL_COMM_STREAM = None
_ASCEND_CUSTOMOP_IS_REIGISTERED = False
_DEFAULT_BUFFER_SIZE = 200
_MIN_DP_BUFFER_SIZE = 50
_DYNAMIC_EPLB_BUFFER_SIZE = 1 # num_experts * num_layers * 64 byte
_IS_MOE_MODEL = None
_IS_DRAFTER_MOE_MODEL = None
_IS_VL_MODEL = None
@@ -907,6 +908,7 @@ def get_hccl_config_for_pg_options(group_name: str) -> dict | None:
        return None
    hccl_config_map = {
        "dp": {"hccl_buffer_size": calculate_dp_buffer_size()},
        "dynamic_eplb": {"hccl_buffer_size": _DYNAMIC_EPLB_BUFFER_SIZE},
    }
    return hccl_config_map.get(group_name, get_default_buffer_config())
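
The lookup in this hunk can be exercised in isolation with a minimal sketch. The bodies of `get_default_buffer_config()` and `calculate_dp_buffer_size()` are stand-ins written for this example (the diff does not show them), and the early `return None` branch is left out because its condition is not visible in the hunk.

```python
_DEFAULT_BUFFER_SIZE = 200
_MIN_DP_BUFFER_SIZE = 50
_DYNAMIC_EPLB_BUFFER_SIZE = 1  # num_experts * num_layers * 64 byte


def get_default_buffer_config() -> dict:
    # Stand-in: assumed to return the default buffer size in the same shape.
    return {"hccl_buffer_size": _DEFAULT_BUFFER_SIZE}


def calculate_dp_buffer_size() -> int:
    # Stand-in: the real calculation is not part of this diff.
    return _MIN_DP_BUFFER_SIZE


def get_hccl_config_for_pg_options(group_name: str) -> dict:
    # Groups listed here get a dedicated buffer size; all others use the default.
    hccl_config_map = {
        "dp": {"hccl_buffer_size": calculate_dp_buffer_size()},
        "dynamic_eplb": {"hccl_buffer_size": _DYNAMIC_EPLB_BUFFER_SIZE},
    }
    return hccl_config_map.get(group_name, get_default_buffer_config())


for name in ("dynamic_eplb", "dp", "some_other_group"):
    print(name, get_hccl_config_for_pg_options(name))
```

With this mapping, only the group named `dynamic_eplb` gets the small buffer, while any group name missing from the map keeps the default configuration.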