xc-llm-ascend

Files

zhenwenqi2024 f708d919f8 [Feature] model_runner refactor (#4764)

### What this PR does / why we need it?
Refactor npu_modelrunner so that it stays close to gpu_modelrunner.

### Does this PR introduce _any_ user-facing change?
No.

- vLLM version: v0.12.0
- vLLM main: ad32e3e19c

---------

Signed-off-by: zhenwenqi2024 <zhenwenqi_2022@qq.com>
Signed-off-by: zhenwenqi2024 <155598497+zhenwenqi2024@users.noreply.github.com>

2025-12-12 17:27:09 +08:00

__init__.py

Drop torchair (#4814)

2025-12-10 09:20:40 +08:00

eagle_proposer.py

[Refactor] 2/N Unify all mask generation methods and cache mask (#4779)

2025-12-09 18:51:00 +08:00

interface.py

Upgrade vLLM to main (#4608)

2025-12-02 22:10:52 +08:00

mtp_proposer.py

[Feature] model_runner refactor (#4764)