xc-llm-ascend/vllm_ascend/worker
zhenwenqi2024 ddd475d5be [ModelRunner] apply_grammer uses vllm function (#4974)
### What this PR does / why we need it?
This PR removes apply_grammer from npu_model_runner and uses the vLLM
function instead. Logits are moved to the CPU and then handled the same
way as in gpu_model_runner.
This may affect performance; we will revisit it once torch.compile is
supported with the npu inductor.
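
For readers unfamiliar with the masking step, here is a minimal sketch of what applying a grammar bitmask to CPU-side logits might look like. It is an illustration only, not the vLLM or vllm-ascend implementation: the function name `apply_token_bitmask`, the plain-list inputs, and the bit layout (bit `i % 32` of word `i // 32` marks token `i` as allowed) are all assumptions made for this example.

```python
import math


def apply_token_bitmask(logits, bitmask):
    """Return a copy of logits with disallowed tokens set to -inf.

    Hypothetical layout, for illustration only: bitmask is a list of
    32-bit words, and bit (i % 32) of word (i // 32) is 1 when token i
    is allowed by the grammar.
    """
    masked = []
    for token_id, logit in enumerate(logits):
        word = bitmask[token_id // 32]
        allowed = (word >> (token_id % 32)) & 1
        # Keep the logit for allowed tokens, block the rest.
        masked.append(logit if allowed else -math.inf)
    return masked


# Four-token vocabulary; only tokens 0 and 2 are allowed.
logits = [0.5, 1.2, -0.3, 2.0]
bitmask = [0b0101]
masked = apply_token_bitmask(logits, bitmask)
```

Tokens whose bit is cleared can never be sampled, because their logit becomes negative infinity before the softmax.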

- vLLM version: v0.12.0
- vLLM main: ad32e3e19c

---------

Signed-off-by: zhenwenqi2024 <zhenwenqi_2022@qq.com>
2025-12-16 15:26:01 +08:00