xc-llm-ascend

Files

Mengqing Cao e7aa2c285c [SpecDecode] Fix Draft model proposer (#7230)

### What this PR does / why we need it?
This PR fixes the unified draft parallel feature.
1. In the draft model proposer, the target model can have more than one
attention layer, so the assertion on the layer count is removed.
2. The block size should be obtained through `draft_attn_groups` instead of
`attn_metadata_builder` after 0.17.0.
3. `attn_update_stack_num_spec_norm` should not be run when unified
draft parallel is enabled.

### How was this patch tested?
Tests pass with
`tests/e2e/singlecard/spec_decode/test_v1_spec_decode.py::test_parallel_drafting_acceptance`,
which is already included in CI.

- vLLM version: v0.17.0
- vLLM main: 4034c3d32e

Signed-off-by: MengqingCao <cmq0113@163.com>

2026-03-14 18:26:37 +08:00

__init__.py

[feat][spec decode]Unified draft parallel (#6766)

2026-03-13 14:07:35 +08:00

draft_proposer.py

[feat][spec decode]Unified draft parallel (#6766)

2026-03-13 14:07:35 +08:00

eagle_proposer.py

[SpecDecode] Fix Draft model proposer (#7230)

2026-03-14 18:26:37 +08:00

medusa_proposer.py

[Spec Decode]clean up spec decode interface (#6947)