Files
xc-llm-ascend/vllm_ascend/sample
Yizhou 638dbcdb32 [Perf] Remove D2H operations to imporve performance (#4063)
### What this PR does / why we need it?
Replace masked in-place assignment with a device-side torch.where so
selection stays on-device, allowing subsequent device ops to be enqueued
earlier and removing an implicit D2H sync, reducing latency by several
hundreds μs on Ascend.

### Does this PR introduce _any_ user-facing change?
None.

### How was this patch tested?
None.
- vLLM version: v0.11.0
- vLLM main:
83f478bb19

Signed-off-by: Yizhou Liu <liu_yizhou@outlook.com>
2025-11-12 09:08:55 +08:00
..
2025-10-09 10:28:38 +08:00