xc-llm-ascend

Files

Zetong Li 06ec136f08 [Bugfix] Obtain kernel block size for computing slot mapping correctly (#7019)

### What this PR does / why we need it?
This PR fixes incorrect slot mapping in qwen35 caused by a mismatched
block size. For qwen35, the slot mapping must be computed with
`kernel_block_size`. That value is obtained in `load_model`, where
`draft_attn_layers` is available.
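
To illustrate why the block size matters, here is a minimal sketch of a paged-KV slot computation. It is not the code from this PR: the function `compute_slot_mapping`, the block table contents, and the block sizes are hypothetical, and it assumes the block table is expressed in kernel-block units.

```python
def compute_slot_mapping(positions, block_table, block_size):
    """Map token positions of one request to flat KV-cache slot indices.

    Hypothetical helper for illustration only.
    slot = physical_block_id * block_size + offset_within_block
    """
    slots = []
    for pos in positions:
        block_idx, offset = divmod(pos, block_size)
        slots.append(block_table[block_idx] * block_size + offset)
    return slots


# Assumed setup: the block table lists physical blocks in kernel-block units,
# and the attention kernel uses blocks of 4 tokens.
kernel_block_table = [7, 2, 5, 9]
positions = [0, 3, 4, 9]

# Using the kernel block size gives slots that match the kernel's layout.
correct = compute_slot_mapping(positions, kernel_block_table, block_size=4)

# Using a different (mismatched) block size of 8 points at other slots.
mismatched = compute_slot_mapping(positions, kernel_block_table, block_size=8)

print(correct)     # [28, 31, 8, 21]
print(mismatched)  # [56, 59, 60, 17]
```

With the mismatched size, both the block lookup and the in-block offset change, so tokens would be written to or read from the wrong cache slots.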

- vLLM version: v0.16.0
- vLLM main: 15d76f74e2

Signed-off-by: Zetong Li <slippersss@126.com>

2026-03-09 11:05:01 +08:00

__init__.py

[Refactor][EAGLE] 8/N delete mtp_proposer (re-pull) (#7033)

2026-03-06 17:11:22 +08:00

eagle_proposer.py

[Bugfix] Obtain kernel block size for computing slot mapping correctly (#7019)

2026-03-09 11:05:01 +08:00

medusa_proposer.py

[Spec Decode]clean up spec decode interface (#6947)

2026-03-05 14:30:10 +08:00

ngram_proposer.py

[Spec Decode]clean up spec decode interface (#6947)

2026-03-05 14:30:10 +08:00

suffix_proposer.py

[Spec Decode]clean up spec decode interface (#6947)

2026-03-05 14:30:10 +08:00