xc-llm-ascend

Files

MengLong Chen 07f4710216 [BugFix] Fix dummy_run memory explosion in eager mode (#3132)

### What this PR does / why we need it?

This is a quick fix for a memory explosion issue; a complete fix requires
further refactoring.
In eager mode, dummy_run may run out of memory (OOM) because
`hidden_states` is not released in time.
This PR works around the issue by manually clearing the cache; the
refactoring will follow in a later change.
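
The release pattern described above can be sketched roughly as follows. This is a minimal, device-free illustration and not the PR's actual code: `Buffer`, `dummy_forward`, `empty_device_cache`, `dummy_run_leaky`, and `dummy_run_fixed` are hypothetical names, and the cache-clearing helper only stands in for a real allocator flush (on Ascend hardware that would presumably be something like `torch.npu.empty_cache()`, which is an assumption here).

```python
import gc
import weakref


class Buffer:
    """Hypothetical stand-in for a large device tensor such as hidden_states."""

    def __init__(self, nbytes):
        self.data = bytearray(nbytes)


def dummy_forward(num_tokens):
    # Pretend forward pass: allocates an output proportional to the token count.
    return Buffer(num_tokens * 1024)


def empty_device_cache():
    # Placeholder for a device allocator cache flush (assumed, not from the PR).
    # Here it only triggers Python garbage collection.
    gc.collect()


def dummy_run_leaky(sizes, refs):
    # Each output stays referenced, so memory accumulates across runs.
    kept = []
    for n in sizes:
        hidden_states = dummy_forward(n)
        refs.append(weakref.ref(hidden_states))
        kept.append(hidden_states)
    return kept


def dummy_run_fixed(sizes, refs):
    # Drop the reference after every run, then clear the cache explicitly.
    for n in sizes:
        hidden_states = dummy_forward(n)
        refs.append(weakref.ref(hidden_states))
        del hidden_states
        empty_device_cache()


def count_alive(refs):
    # Number of buffers that are still reachable.
    return sum(1 for r in refs if r() is not None)


if __name__ == "__main__":
    leaky_refs, fixed_refs = [], []
    kept = dummy_run_leaky([1, 2, 3], leaky_refs)
    dummy_run_fixed([1, 2, 3], fixed_refs)
    print("alive after leaky run:", count_alive(leaky_refs))
    print("alive after fixed run:", count_alive(fixed_refs))
```

The weak references are only there to observe whether each buffer was freed; a real implementation would watch device memory statistics instead.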

Before the change, memory usage accumulated across dummy_run
invocations.
<img width="1796" height="207" alt="image"
src="https://github.com/user-attachments/assets/05e2b04c-2f99-4085-9eda-c78b7d9a57b0"
/>

After the change, memory is released promptly.
We also verified that the model responds normally to a single
input.


- vLLM version: v0.10.2
- vLLM main: b1068903fd

---------

Signed-off-by: chenmenglong <chenmenglong1@huawei.com>

2025-09-25 16:09:44 +08:00

moe

[BugFix] Fix dummy_run memory explosion in eager mode (#3132)

2025-09-25 16:09:44 +08:00

__init__.py

Fix bugs in operator registration with the PyTorch Dispatcher (#2786)

2025-09-13 11:58:52 +08:00

activation.py

[main] mlp weight prefetch in Qwen Dense Models (#2816)

2025-09-11 21:20:09 +08:00

attention.py

Disaggregate prefill for kv cache register style (#950)