### What this PR does / why we need it?
This is a quick fix for a memory explosion issue; a complete fix requires
further refactoring.
In eager mode, `dummy_run` can cause an out-of-memory (OOM) error because
`hidden_states` are not released promptly.
This PR works around the issue by manually clearing the cache; the
underlying problem will be addressed in a follow-up refactoring.
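The following toy model shows why both halves of the workaround matter: first dropping the last reference to `hidden_states`, then clearing the allocator cache. `FakeCachingAllocator`, `FakeTensor`, and this `dummy_run` are hypothetical stand-ins written only for this sketch, not the vLLM or vllm-ascend code. In real PyTorch code the equivalent steps would be a `del` followed by the backend's `empty_cache()` (for example `torch.cuda.empty_cache()`); the exact call this PR uses is not reproduced here.

```python
import gc
import weakref


class FakeCachingAllocator:
    """Toy caching allocator (hypothetical). Freed blocks stay reserved
    in a cache until empty_cache() returns them to the device."""

    def __init__(self):
        self.in_use = 0  # bytes held by live tensors
        self.cached = 0  # bytes freed by tensors but still reserved

    def reserved(self):
        # Total device memory this process is holding on to.
        return self.in_use + self.cached

    def empty_cache(self):
        # Hand all cached (unused) blocks back to the device.
        self.cached = 0


class FakeTensor:
    """Hypothetical stand-in for a device tensor; frees its block when collected."""

    def __init__(self, allocator, nbytes):
        # Reuse cached blocks before reserving new memory.
        reuse = min(allocator.cached, nbytes)
        allocator.cached -= reuse
        allocator.in_use += nbytes
        # When this object is garbage-collected, move its bytes into the cache.
        weakref.finalize(self, FakeTensor._release, allocator, nbytes)

    @staticmethod
    def _release(allocator, nbytes):
        allocator.in_use -= nbytes
        allocator.cached += nbytes


def dummy_run(allocator, num_steps, nbytes, release):
    """Simulate repeated forward steps that each produce hidden_states."""
    still_referenced = []  # models references that outlive each step
    for _ in range(num_steps):
        hidden_states = FakeTensor(allocator, nbytes)
        if release:
            # Drop the last reference, collect any cycles, then clear the cache.
            del hidden_states
            gc.collect()
            allocator.empty_cache()
        else:
            # Leaky variant: the tensor stays reachable, so memory accumulates.
            still_referenced.append(hidden_states)
    return allocator.reserved()


leaky = FakeCachingAllocator()
fixed = FakeCachingAllocator()
print(dummy_run(leaky, num_steps=5, nbytes=100, release=False))  # 500
print(dummy_run(fixed, num_steps=5, nbytes=100, release=True))   # 0
```

The leaky variant ends up reserving 500 bytes because every `hidden_states` stays reachable, while the fixed variant returns to zero after each step. Note that `empty_cache()` alone would not help in the leaky case: it only releases blocks that no live tensor is using.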
Before this change, memory usage during `dummy_run` kept accumulating:
<img width="1796" height="207" alt="image"
src="https://github.com/user-attachments/assets/05e2b04c-2f99-4085-9eda-c78b7d9a57b0"
/>
After this change, memory is released promptly.
We also verified that the model responds as expected to a single input.
- vLLM version: v0.10.2
- vLLM main: b1068903fd
---------
Signed-off-by: chenmenglong <chenmenglong1@huawei.com>