Files
xc-llm-ascend/vllm_ascend
Wang Kunpeng 13c4a9c78b [bugfix]Fix accuracy issue in PCP/DCP with speculative decoding (#6491)
### What this PR does / why we need it?

This PR fixes an accuracy issue that occurs when using Prefill/Decode
Context Parallelism (PCP/DCP) in conjunction with speculative decoding
(MTP). The issue is caused by an irregular attention mask shape when
both features are enabled.

The fix involves flattening the `block_table` for speculative decoding
requests under PCP/DCP to ensure a regular attention mask. This PR also
introduces a `use_cp` property for cleaner code and updates dummy runs
to handle this scenario correctly.

### Does this PR introduce _any_ user-facing change?

No. This is a bug fix that improves accuracy and should not have
user-facing API changes.

### How was this patch tested?

- vLLM version: v0.15.0
- vLLM main: https://github.com/vllm-project/vllm/commit/v0.15.0

---------

Signed-off-by: Wang Kunpeng <1289706727@qq.com>
2026-02-05 10:06:14 +08:00
..
2026-01-22 15:46:59 +08:00
2026-02-03 14:13:06 +08:00