xc-llm-ascend

Files

zhenwenqi2024 ddd475d5be [ModelRunner] apply_grammer uses vllm function (#4974)

### What this PR does / why we need it?
This PR removes the custom `apply_grammer` logic from the NPU model runner. Logits are now moved to the CPU and handled the same way as in `gpu_model_runner`.
This may affect performance. We plan to revisit it once `torch.compile` is supported with the NPU inductor.
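The description does not spell out what "applying the grammar" to the logits involves. A minimal illustrative sketch follows, assuming a packed 32-bit bitmask per request (32 token IDs per integer, bit set = token allowed, as in xgrammar-style structured decoding). The helper `apply_token_bitmask_cpu` is hypothetical and is not the vLLM or vllm-ascend API. It only shows the idea of masking disallowed tokens to `-inf` on the CPU.

```python
import math
from typing import List


def apply_token_bitmask_cpu(logits: List[List[float]], bitmask: List[List[int]]) -> None:
    """Hypothetical helper: set logits of grammar-disallowed tokens to -inf, in place.

    Assumes bitmask[i][w] packs token IDs 32*w .. 32*w + 31 for request i,
    with bit value 1 meaning the token is allowed by the grammar.
    """
    for row_logits, row_mask in zip(logits, bitmask):
        for token_id in range(len(row_logits)):
            word = row_mask[token_id // 32]
            # Python ints behave as two's complement, so this also works for
            # words stored as negative signed int32 values.
            allowed = (word >> (token_id % 32)) & 1
            if not allowed:
                row_logits[token_id] = -math.inf


logits = [[1.0, 2.0, 3.0, 4.0]]
apply_token_bitmask_cpu(logits, [[0b0101]])  # only tokens 0 and 2 are allowed
print(logits)
```

Doing this on the CPU keeps the logic identical to a simple reference implementation, at the cost of a device-to-host copy of the logits. That copy is the performance trade-off the PR mentions.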

- vLLM version: v0.12.0
- vLLM main: ad32e3e19c

---------

Signed-off-by: zhenwenqi2024 <zhenwenqi_2022@qq.com>

2025-12-16 15:26:01 +08:00

__init__.py

[Misc][V0 Deprecation] Remove Cache Engine Used for V0 Worker (#1878)

2025-07-19 09:42:32 +08:00

block_table.py

[Feature] model_runner refactor (#4764)

2025-12-12 17:27:09 +08:00

model_runner_v1.py

[ModelRunner] apply_grammer uses vllm function (#4974)

2025-12-16 15:26:01 +08:00

npu_input_batch.py

[Misc] Upgrade vllm hash to 12_14 (#5000)

2025-12-15 19:54:23 +08:00

worker_v1.py

vllm-ascend support Ascend950 with Qwen dense model. (#4228)

2025-12-12 15:50:57 +08:00