xc-llm-ascend/vllm_ascend/spec_decode
Zetong Li 06ec136f08 [Bugfix] Obtain kernel block size for computing slot mapping correctly (#7019)
### What this PR does / why we need it?
This PR fixes incorrect slot mapping in qwen35 caused by a mismatched
block size. For qwen35, the slot mapping must be computed with
`kernel_block_size`. That value is obtained in `load_model`, where
`draft_attn_layers` first becomes available.
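
To illustrate why the block size matters, here is a minimal sketch of a generic paged-KV-cache slot computation. It is not the vllm-ascend implementation: `compute_slot_mapping`, the block table contents, and the block sizes are hypothetical and chosen only for illustration. It assumes the block table is expressed in units of the kernel block size.

```python
def compute_slot_mapping(positions, block_table, block_size):
    """Map token positions to flat KV-cache slots (hypothetical helper).

    Each position falls into logical block `pos // block_size`; the block
    table translates that to a physical block id, and the slot is the
    physical block's base offset plus the position's offset in the block.
    """
    slots = []
    for pos in positions:
        block_idx = pos // block_size
        offset = pos % block_size
        slots.append(block_table[block_idx] * block_size + offset)
    return slots


# Assumed values for illustration only.
kernel_block_size = 4          # block size the attention kernel actually uses
other_block_size = 8           # a different block size that does not match
block_table = [7, 2, 5]        # physical block ids, in kernel-block units
positions = [0, 3, 4, 9]

correct = compute_slot_mapping(positions, block_table, kernel_block_size)
mismatched = compute_slot_mapping(positions, block_table, other_block_size)

print(correct)     # slots computed with the matching block size
print(mismatched)  # slots computed with the mismatched block size
```

With the mismatched block size, both the logical block index and the physical base offset change, so tokens are written to slots the kernel will not read from.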

- vLLM version: v0.16.0
- vLLM main: 15d76f74e2

Signed-off-by: Zetong Li <slippersss@126.com>
2026-03-09 11:05:01 +08:00