xc-llm-ascend/vllm_ascend
Ronald 916a9a1913 fix synchronize error of exceeds_max_model_len d2h copy (#4708)
### What this PR does / why we need it?
The MTP propose method contains a d2h (device-to-host) copy that blocks CPU
operations, which makes the method host-bound. This PR refactors the method
to use a CPU tensor instead.
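
To illustrate the idea, here is a minimal sketch rather than the code from this PR. It assumes the check compares token positions against the maximum model length. The function name `exceeds_mask`, the sample positions, and the `max_model_len` value are hypothetical, and `cuda` stands in for the accelerator device, with a fallback to CPU when none is present.

```python
import torch


def exceeds_mask(positions: torch.Tensor, max_model_len: int) -> list[bool]:
    # The comparison runs wherever `positions` lives. If that is an
    # accelerator, .tolist() forces a device-to-host copy and the CPU
    # waits for all queued device work before it can continue.
    return (positions >= max_model_len).tolist()


max_model_len = 8  # hypothetical limit, for illustration only

# Host-side copy of the positions, kept next to the device tensor.
positions_cpu = torch.tensor([5, 7, 8, 9])

# Stand-in for the accelerator; falls back to CPU if none is available.
device = "cuda" if torch.cuda.is_available() else "cpu"
positions_dev = positions_cpu.to(device)

# Blocking variant: reads the result back from the device tensor.
mask_from_device = exceeds_mask(positions_dev, max_model_len)

# Non-blocking variant: the same check on the host tensor, no sync needed.
mask_from_host = exceeds_mask(positions_cpu, max_model_len)

print(mask_from_host)
```

Both variants produce the same mask; the difference is that the host-side one never waits on the device, so the CPU can keep scheduling work.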

### Does this PR introduce _any_ user-facing change?
No.

### How was this patch tested?
Tested against vLLM main at f5d3d93c40417c296c20dc301100e55708a17f3f.

- vLLM version: v0.12.0
- vLLM main: ad32e3e19c

Signed-off-by: Ronald1995 <ronaldautomobile@163.com>
Co-authored-by: wangxiyuan <wangxiyuan1007@gmail.com>
2025-12-08 09:07:59 +08:00