Files
xc-llm-ascend/vllm_ascend/worker
zhenghaojiang 0f7492d18e [Bugfix] fix the oom when chunkprefill with long context like 64k (#2319)
The attention mask is already declared in mla.py, so the splitfuse
mask is not needed for MLA chunked prefill; that mask causes memory
problems with long contexts such as 64k or 128k.

- vLLM version: v0.10.0
- vLLM main:
14a5d903ab

---------

Signed-off-by: haojiangzheng <justineric096@gmail.com>
2025-08-13 17:15:59 +08:00