[EPLB] Reduce the memory used for heat aggregation (#6729)
### What this PR does / why we need it?
Calling dist.all_gather directly consumes 2 x HCCL_BUFFSIZE of memory,
while heat (hotspot) aggregation actually needs less than 1 MB.
This PR therefore creates a separate, small communication domain
dedicated to it.
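As a rough check on the "less than 1 MB" figure, the sketch below applies the sizing formula from the diff comment (`num_experts * num_layers * 64 byte`). The model shape (256 experts, 58 MoE layers) and the helper name `heat_payload_bytes` are illustrative assumptions, not values taken from this PR, and the buffer size is assumed to be expressed in MB.

```python
# Hypothetical helper: estimates the heat-aggregation payload using the
# formula from the diff comment (num_experts * num_layers * 64 byte).
def heat_payload_bytes(num_experts: int, num_layers: int,
                       bytes_per_entry: int = 64) -> int:
    return num_experts * num_layers * bytes_per_entry


MIB = 1024 * 1024
DYNAMIC_EPLB_BUFFER_MB = 1  # mirrors _DYNAMIC_EPLB_BUFFER_SIZE; unit assumed to be MB

# Assumed model shape, for illustration only.
payload = heat_payload_bytes(num_experts=256, num_layers=58)
fits = payload <= DYNAMIC_EPLB_BUFFER_MB * MIB

print(f"payload = {payload} bytes, fits in buffer: {fits}")
```

For larger models the same arithmetic shows whether a 1 MB buffer would still be enough.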
### Does this PR introduce _any_ user-facing change?
### How was this patch tested?
Original:

Current:

- vLLM version: v0.15.0
- vLLM main: 9562912cea
Signed-off-by: shenchuxiaofugui <1311027364@qq.com>
@@ -62,6 +62,7 @@ _CP_CHUNKEDPREFILL_COMM_STREAM = None
 _ASCEND_CUSTOMOP_IS_REIGISTERED = False
 _DEFAULT_BUFFER_SIZE = 200
 _MIN_DP_BUFFER_SIZE = 50
+_DYNAMIC_EPLB_BUFFER_SIZE = 1  # num_experts * num_layers * 64 byte
 _IS_MOE_MODEL = None
 _IS_DRAFTER_MOE_MODEL = None
 _IS_VL_MODEL = None
@@ -907,6 +908,7 @@ def get_hccl_config_for_pg_options(group_name: str) -> dict | None:
         return None
     hccl_config_map = {
         "dp": {"hccl_buffer_size": calculate_dp_buffer_size()},
+        "dynamic_eplb": {"hccl_buffer_size": _DYNAMIC_EPLB_BUFFER_SIZE},
     }
     return hccl_config_map.get(group_name, get_default_buffer_config())
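The lookup changed in the second hunk can be sketched in isolation as follows. This is a simplified stand-in, not the project's code: `calculate_dp_buffer_size` and `get_default_buffer_config` are stubbed with assumed return values, the early `return None` branch is omitted because its condition is not visible in the diff, and `lookup_hccl_config` is a hypothetical name.

```python
_DEFAULT_BUFFER_SIZE = 200
_MIN_DP_BUFFER_SIZE = 50
_DYNAMIC_EPLB_BUFFER_SIZE = 1


def calculate_dp_buffer_size() -> int:
    # Stub: the real function is not shown in the diff; this sketch
    # simply returns the minimum DP buffer size.
    return _MIN_DP_BUFFER_SIZE


def get_default_buffer_config() -> dict:
    # Stub: assumed to wrap the default buffer size in the same shape
    # as the per-group entries.
    return {"hccl_buffer_size": _DEFAULT_BUFFER_SIZE}


def lookup_hccl_config(group_name: str) -> dict:
    # Groups with a dedicated entry get their own buffer size;
    # every other group falls back to the default config.
    hccl_config_map = {
        "dp": {"hccl_buffer_size": calculate_dp_buffer_size()},
        "dynamic_eplb": {"hccl_buffer_size": _DYNAMIC_EPLB_BUFFER_SIZE},
    }
    return hccl_config_map.get(group_name, get_default_buffer_config())


print(lookup_hccl_config("dynamic_eplb"))
print(lookup_hccl_config("some_other_group"))
```

Under these assumptions, only a group named `dynamic_eplb` receives the small buffer; any name missing from the map keeps the default.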