xc-llm-ascend

Files

Wang Kunpeng c498cea22d [refactor] refactor excute_model and _dymmy_run method (#6043)

### What this PR does / why we need it?
The structure of the `execute_model` and `_dummy_run` methods in
NPUModelRunner differs greatly from that in GPUModelRunner.
To align with GPUModelRunner, this PR:
- Splits the `_prepare_inputs` method into `_prepare_inputs`,
  `_determine_batch_execution_and_padding`, `_build_attention_metadata`,
  and `_preprocess`.
- Replaces `_generate_process_reqs_hidden_states` with `_model_forward`.
- Aligns the implementation of the `postprocess` phase.
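
The phase split described above can be pictured with a minimal sketch. Everything here is a hypothetical stand-in: `ToyModelRunner`, `Batch`, the method signatures, the padding rule, and the call order (taken from the order in which the methods are listed) are assumptions for illustration, not the actual NPUModelRunner or GPUModelRunner code.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Batch:
    """Hypothetical container for per-step state; not a vLLM type."""
    token_ids: list
    num_padded_tokens: int = 0
    attn_metadata: Optional[dict] = None


class ToyModelRunner:
    """Toy runner that only mirrors the phase names from this PR."""

    def __init__(self, pad_multiple: int = 8):
        self.pad_multiple = pad_multiple  # assumed padding granularity
        self.calls = []                   # records the phase order

    def _prepare_inputs(self, token_ids):
        self.calls.append("_prepare_inputs")
        return Batch(token_ids=list(token_ids))

    def _determine_batch_execution_and_padding(self, batch):
        self.calls.append("_determine_batch_execution_and_padding")
        n = len(batch.token_ids)
        # Round the token count up to the next multiple of pad_multiple.
        batch.num_padded_tokens = -(-n // self.pad_multiple) * self.pad_multiple
        return batch

    def _build_attention_metadata(self, batch):
        self.calls.append("_build_attention_metadata")
        batch.attn_metadata = {
            "num_actual_tokens": len(batch.token_ids),
            "num_padded_tokens": batch.num_padded_tokens,
        }
        return batch

    def _preprocess(self, batch):
        self.calls.append("_preprocess")
        pad = batch.num_padded_tokens - len(batch.token_ids)
        return batch.token_ids + [0] * pad  # pad with a dummy token id

    def _model_forward(self, input_ids):
        self.calls.append("_model_forward")
        return [float(t) for t in input_ids]  # placeholder "hidden states"

    def execute_model(self, token_ids):
        batch = self._prepare_inputs(token_ids)
        batch = self._determine_batch_execution_and_padding(batch)
        batch = self._build_attention_metadata(batch)
        input_ids = self._preprocess(batch)
        hidden = self._model_forward(input_ids)
        # Postprocess: drop the entries that correspond to padding.
        return hidden[: len(batch.token_ids)]
```

Keeping each phase in its own method means a dummy run could, in principle, reuse the same padding and metadata steps as a real run, which is the kind of structural alignment the PR description aims for.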

**Related-RFC**: https://github.com/vllm-project/vllm-ascend/issues/5449

**Co-authored-by**: @zhenwenqi2024

### Does this PR introduce _any_ user-facing change?
no
### How was this patch tested?

- vLLM version: v0.13.0
- vLLM main: d68209402d

---------

Signed-off-by: Wang Kunpeng <1289706727@qq.com>
Signed-off-by: gcanlin <canlinguosdu@gmail.com>
Signed-off-by: zhenwenqi2024 <zhenwenqi_2022@qq.com>
Co-authored-by: gcanlin <canlinguosdu@gmail.com>
Co-authored-by: zhenwenqi2024 <zhenwenqi_2022@qq.com>

2026-01-27 22:27:01 +08:00

test_block_table.py

[UT][PCP&DCP] UT for block_table.py (#5032)

2026-01-06 11:19:25 +08:00

test_pcp_manager.py

[refactor] refactor excute_model and _dymmy_run method (#6043)

2026-01-27 22:27:01 +08:00

test_worker_v1.py

Drop vLLM 0.13.0 support (#6069)

2026-01-23 09:45:08 +08:00