xc-llm-ascend

Files

XiaoxinWang cbf46fad3c fixed graph mode bug. (#7460)

### What this PR does / why we need it?
In full-decode-only graph mode (`FULL_DECODE_ONLY`), `num_req_padded` was
set to an incorrect value, which degraded accuracy for Qwen3-Next. This PR
adds a check on `compilation_config.cudagraph_mode` to the padding
condition, so that padding is applied only in `FULL` mode.
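
The condition described above can be sketched roughly as follows. This is a minimal illustration, not the code from the PR: the `GraphMode` enum, the `padded_num_reqs` function, and the `graph_batch_size` parameter are hypothetical names introduced here, and the real check lives in `model_runner_v1.py` against `compilation_config.cudagraph_mode`.

```python
from enum import Enum


class GraphMode(Enum):
    """Hypothetical stand-in for the value of compilation_config.cudagraph_mode."""
    NONE = 0
    PIECEWISE = 1
    FULL = 2
    FULL_DECODE_ONLY = 3


def padded_num_reqs(num_reqs: int, graph_batch_size: int, mode: GraphMode) -> int:
    """Return the request count to use when sizing per-request metadata.

    Assumption: padding means rounding the request count up to the batch
    size the graph was captured with. Only FULL mode pads; every other
    mode, including FULL_DECODE_ONLY, keeps the real request count.
    """
    if mode is GraphMode.FULL and graph_batch_size > num_reqs:
        return graph_batch_size
    return num_reqs
```

With this shape, three real requests and a captured batch size of eight give eight in `FULL` mode and three in `FULL_DECODE_ONLY` mode, which matches the behavior the description asks for.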


### Does this PR introduce _any_ user-facing change?

### How was this patch tested?

- vLLM version: v0.17.0
- vLLM main: 8a680463fa

Co-authored-by: wangxiaoxin-sherie <wangxiaoxin7@huawei.com>

2026-03-22 10:09:37 +08:00

[model_runner_v2]optimize the performance of the _topk_log_softmax_kernel (#7221)

2026-03-16 16:49:10 +08:00

__init__.py

[Misc][V0 Deprecation] Remove Cache Engine Used for V0 Worker (#1878)

2025-07-19 09:42:32 +08:00

block_table.py

[Hybrid] support prefix cache for Qwen3.5/Next with --mamba-cache-mode align (#7103)

2026-03-15 09:44:09 +08:00

model_runner_v1.py

fixed graph mode bug. (#7460)

2026-03-22 10:09:37 +08:00