xc-llm-ascend/vllm_ascend/ops
MengLong Chen 07f4710216 [BugFix] Fix dummy_run memory explosion in eager mode (#3132)
### What this PR does / why we need it?

This is a quick fix for a memory explosion issue; a complete fix requires
further refactoring.
In eager mode, dummy_run can lead to OOM because `hidden_states` were not
released in time.
This PR works around the issue by manually clearing the cache; further
refactoring will follow.
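
The release pattern described above can be sketched as follows. This is a minimal illustration, not the code from this PR: `FakeHiddenStates`, `dummy_run_and_release`, `forward`, and the `clear_cache` callback are hypothetical names, and a plain `bytearray` stands in for a device tensor so the sketch runs without any accelerator.

```python
import gc
import weakref
from typing import Callable, List


class FakeHiddenStates:
    """Hypothetical stand-in for a large `hidden_states` tensor."""

    def __init__(self, num_bytes: int) -> None:
        self.buffer = bytearray(num_bytes)


def dummy_run_and_release(
    forward: Callable[[], object],
    clear_cache: Callable[[], None],
) -> None:
    """Run one dummy forward pass, then release its output right away."""
    hidden_states = forward()
    # A real dummy_run would consume hidden_states here.
    del hidden_states  # drop the last strong reference to the output
    gc.collect()       # collect any lingering reference cycles
    clear_cache()      # ask the allocator to give cached blocks back


# Weak references let us observe whether the outputs were actually freed.
live_refs: List[weakref.ref] = []
cache_clears: List[int] = []


def forward() -> FakeHiddenStates:
    hs = FakeHiddenStates(1 << 20)  # 1 MiB placeholder buffer
    live_refs.append(weakref.ref(hs))
    return hs


for _ in range(3):
    dummy_run_and_release(forward, lambda: cache_clears.append(1))

# Count outputs that are still alive after all dummy runs.
still_alive = sum(1 for ref in live_refs if ref() is not None)
```

In this sketch nothing accumulates across iterations, because each output is dropped before the next run starts. On a real device the `clear_cache` callback would be the backend's cache-clearing call; which call the PR uses is not stated here, so that part is left as an assumption.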

Before the change, memory usage kept accumulating during dummy_run.
<img width="1796" height="207" alt="image"
src="https://github.com/user-attachments/assets/05e2b04c-2f99-4085-9eda-c78b7d9a57b0"
/>

After the change, memory is released promptly.
The model was also verified to respond normally to a single input.


- vLLM version: v0.10.2
- vLLM main: b1068903fd

---------

Signed-off-by: chenmenglong <chenmenglong1@huawei.com>
2025-09-25 16:09:44 +08:00