### What this PR does / why we need it?
Calling `dist.all_gather` directly consumes 2 x HCCL_BUFFSIZE of memory,
but hotspot aggregation actually needs less than 1 MB. This PR therefore
creates a separate, small communication domain dedicated to hotspot
aggregation.
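
To make the saving concrete, here is a rough back-of-the-envelope sketch of the arithmetic. It assumes `HCCL_BUFFSIZE` is expressed in MB and defaults to 200 when unset, and that the small domain is given a 1 MB buffer. Both values, and the helper names `domain_buffer_mb` and `saved_mb`, are illustrative assumptions and are not taken from this patch.

```python
import os

# Assumption: HCCL_BUFFSIZE is in MB and defaults to 200 when unset.
ASSUMED_DEFAULT_BUFFSIZE_MB = 200
# Assumption: buffer size chosen for the dedicated small domain, in MB.
ASSUMED_SMALL_BUFFSIZE_MB = 1


def domain_buffer_mb(buffsize_mb: int) -> int:
    """Buffer memory held by one communication domain: 2 x buffer size."""
    return 2 * buffsize_mb


def saved_mb(shared_buffsize_mb: int, small_buffsize_mb: int) -> int:
    """Difference between a full-size domain and a small dedicated one."""
    return domain_buffer_mb(shared_buffsize_mb) - domain_buffer_mb(small_buffsize_mb)


if __name__ == "__main__":
    shared = int(os.environ.get("HCCL_BUFFSIZE", ASSUMED_DEFAULT_BUFFSIZE_MB))
    print(f"full-size domain:  {domain_buffer_mb(shared)} MB")
    print(f"small domain:      {domain_buffer_mb(ASSUMED_SMALL_BUFFSIZE_MB)} MB")
    print(f"estimated saving:  {saved_mb(shared, ASSUMED_SMALL_BUFFSIZE_MB)} MB")
```

Under these assumptions the full-size domain holds 400 MB for a payload of under 1 MB, which is the motivation for the dedicated domain.
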
### Does this PR introduce _any_ user-facing change?
### How was this patch tested?
Original:

Current:

- vLLM version: v0.15.0
- vLLM main: 9562912cea
Signed-off-by: shenchuxiaofugui <1311027364@qq.com>