xc-llm-ascend

Files

LI SHENGYONG 8210a62a44 [EPLB][Bugfix]Reduce unnecessary video memory usage (#6020)

### What this PR does / why we need it?
1. Incorporate the EPLB warm-up into the profile run.
2. Reuse the same gather buffer.
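
As an illustration of the second item only, here is a minimal sketch of buffer reuse. It is not the code from this PR: `GatherBufferPool` and `gather_expert_loads` are hypothetical names, and plain Python lists stand in for device tensors. The idea it shows is that allocating the gather buffer once and writing into it on every call avoids a fresh allocation per gather.

```python
class GatherBufferPool:
    """Hypothetical helper: keeps one preallocated buffer per requested size."""

    def __init__(self):
        self._buffers = {}
        self.allocations = 0  # number of real allocations, for illustration

    def get(self, num_elems):
        buf = self._buffers.get(num_elems)
        if buf is None:
            # Allocate only the first time this size is requested.
            buf = [0] * num_elems
            self._buffers[num_elems] = buf
            self.allocations += 1
        return buf


def gather_expert_loads(pool, per_rank_loads):
    """Simulated all-gather: copy each rank's load vector into a shared buffer.

    Assumes every rank reports the same number of experts.
    """
    num_experts = len(per_rank_loads[0])
    out = pool.get(len(per_rank_loads) * num_experts)
    for rank, loads in enumerate(per_rank_loads):
        start = rank * num_experts
        out[start:start + num_experts] = loads  # overwrite in place
    return out


pool = GatherBufferPool()
first = gather_expert_loads(pool, [[1, 2], [3, 4]])
second = gather_expert_loads(pool, [[5, 6], [7, 8]])
# Both calls return the same underlying buffer; only one allocation happened.
```

Because the buffer is shared, a caller that needs the earlier result after a later gather would have to copy it first; that trade-off is the price of the lower memory footprint in this sketch.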

### Does this PR introduce _any_ user-facing change?

### How was this patch tested?
qwen3-235b AIME baseline:
| dataset | version | metric | mode | vllm-api-general-chat |
|----- | ----- | ----- | ----- | -----|
| aime2024 | 604a78 | accuracy | gen | 86.67 |

With EPLB enabled, the OOM issue does not occur:
| dataset | version | metric | mode | vllm-api-general-chat |
|----- | ----- | ----- | ----- | -----|
| aime2024 | 604a78 | accuracy | gen | 86.67 |

- vLLM version: v0.13.0
- vLLM main: 2c24bc6996

Signed-off-by: shenchuxiaofugui <1311027364@qq.com>

2026-01-23 14:21:13 +08:00

adaptor

[Feature] Adapt DispathGmmCombineDecode operator to align with weight scale dtype of small operators. [RFC: issue 5476] (#5755)

2026-01-19 16:10:43 +08:00

core

[EPLB][Bugfix]Reduce unnecessary video memory usage (#6020)

2026-01-23 14:21:13 +08:00

__init__.py

Dynamic Expert Load Balance with Zero-like-overhead (#2956)

2025-09-17 10:36:43 +08:00

eplb_updator.py

[EPLB][Bugfix]Reduce unnecessary video memory usage (#6020)

2026-01-23 14:21:13 +08:00

utils.py

[Bugfix] Revert pr4214 multi-stream collect expert hotpot (#5529)

2026-01-07 11:26:47 +08:00