xc-llm-ascend

Files

Wang Kunpeng c498cea22d [refactor] refactor excute_model and _dymmy_run method (#6043)

### What this PR does / why we need it?
The structure of the `execute_model` and `_dummy_run` methods in
NPUModelRunner differs greatly from that in GPUModelRunner.
This PR aligns NPUModelRunner with GPUModelRunner:
- Split the `_prepare_inputs` method into `_prepare_inputs`,
  `_determine_batch_execution_and_padding`, `_build_attention_metadata`,
  and `_preprocess`.
- Rename `_generate_process_reqs_hidden_states` to `_model_forward`.
- Align the implementation of the `postprocess` phase with GPUModelRunner.
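
A rough illustration of the staged layout described above, as a toy sketch rather than the actual NPUModelRunner code. The method names come from this description (the entry point is spelled `execute_model` in GPUModelRunner); the class name `SketchRunner`, all signatures, the call order, the `pad_multiple` value, and the placeholder return values are assumptions made only for illustration.

```python
from dataclasses import dataclass, field


@dataclass
class SketchRunner:
    """Toy stand-in for a model runner; NOT the vllm-ascend implementation."""

    pad_multiple: int = 8  # hypothetical padding granularity
    trace: list = field(default_factory=list)  # records the stage order

    def _prepare_inputs(self, num_scheduled_tokens):
        self.trace.append("_prepare_inputs")
        return sum(num_scheduled_tokens)  # total tokens in this step

    def _determine_batch_execution_and_padding(self, num_tokens):
        self.trace.append("_determine_batch_execution_and_padding")
        # Round up to the next multiple of pad_multiple.
        return -(-num_tokens // self.pad_multiple) * self.pad_multiple

    def _build_attention_metadata(self, num_tokens, num_padded):
        self.trace.append("_build_attention_metadata")
        return {"num_actual_tokens": num_tokens, "num_padded_tokens": num_padded}

    def _preprocess(self, num_padded):
        self.trace.append("_preprocess")
        return [0] * num_padded  # placeholder input ids

    def _model_forward(self, input_ids, attn_metadata):
        self.trace.append("_model_forward")
        return [float(i) for i in input_ids]  # placeholder hidden states

    def execute_model(self, num_scheduled_tokens):
        num_tokens = self._prepare_inputs(num_scheduled_tokens)
        num_padded = self._determine_batch_execution_and_padding(num_tokens)
        attn_metadata = self._build_attention_metadata(num_tokens, num_padded)
        input_ids = self._preprocess(num_padded)
        hidden_states = self._model_forward(input_ids, attn_metadata)
        # Postprocess phase: drop the padded positions.
        return hidden_states[:num_tokens]
```

Calling `SketchRunner().execute_model([3, 2, 4])` runs each stage once, in the order the stages are listed in the description; keeping every stage in its own method is what makes the two runners easy to compare side by side.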

**Related-RFC**: https://github.com/vllm-project/vllm-ascend/issues/5449

**Co-authored-by**: @zhenwenqi2024 
### Does this PR introduce _any_ user-facing change?
no
### How was this patch tested?

- vLLM version: v0.13.0
- vLLM main: d68209402d

---------

Signed-off-by: Wang Kunpeng <1289706727@qq.com>
Signed-off-by: gcanlin <canlinguosdu@gmail.com>
Signed-off-by: zhenwenqi2024 <zhenwenqi_2022@qq.com>
Co-authored-by: gcanlin <canlinguosdu@gmail.com>
Co-authored-by: zhenwenqi2024 <zhenwenqi_2022@qq.com>

2026-01-27 22:27:01 +08:00

attention

[UT]: refactoring 310p ops ut (#6296)

2026-01-27 16:31:51 +08:00

compilation

Reapply "[Refactor] Unify full-graph parameter update logic (#6041)" (#6227) (#6231)

2026-01-26 09:04:54 +08:00

core

[MM][Bugfix] Update hf_config to hf_text_config (#5319)

2026-01-06 16:41:39 +08:00

device_allocator

[Refactor] Modify the binding logic to allocate CPU cores for each NPU card (#5555)