xc-llm-ascend

Files

drslark a6f6e919e6 [main][bugfix] Fixed the problem that eagle3 will crash in FULL_DECODE_ONLY (#7290)

### What this PR does / why we need it?
This PR fixes two problems.
Both occur in `FULL_DECODE_ONLY` mode, where `num_tokens` must be
padded to one of the values in `cudagraph_capture_sizes`.

1. The `seq_lens_list` in the drafter's `attn_metadata` is one element
shorter than expected. This raises a kernel exception that crashes
vLLM.
e.g., with `num_reqs` = 3 and `cudagraph_capture_sizes` = [20],
`actual_seq_lengths_q` is correctly padded to [4, 8, 12, 20], but
`seq_lens_list` = [5742, 4700, 7996] is not padded.

2. Although the `seq_lens_list` in the target's `attn_metadata` has the
expected length in `FULL_DECODE_ONLY`, the entries at the end of the
list are corrupted.
e.g., with `num_reqs` = 3 and `cudagraph_capture_sizes` = [20],
`actual_seq_lengths_q` is correctly padded to [4, 8, 12, 20], but
`seq_lens_list` = [5742, 4700, 7996, 5738] has a corrupted value in the
padded slot at the end of the list.
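
The two cases above can be illustrated with a minimal sketch. It is not the code from this PR: the helper name `pad_seq_lens` and the pad value `0` are assumptions made only for illustration. The idea is to keep the first `num_reqs` real entries and overwrite everything after them with a known pad value, so the list has the same length as `actual_seq_lengths_q` and carries no stale data.

```python
from typing import List


def pad_seq_lens(
    seq_lens: List[int],
    num_reqs: int,
    padded_len: int,
    pad_value: int = 0,  # assumed pad value; the real fix may use another one
) -> List[int]:
    """Hypothetical helper: return a list of exactly `padded_len` entries.

    The first `num_reqs` entries are copied from `seq_lens`; every slot
    after them is set to `pad_value`, which discards any stale tail data.
    """
    if num_reqs > len(seq_lens):
        raise ValueError("seq_lens has fewer entries than num_reqs")
    if padded_len < num_reqs:
        raise ValueError("padded_len must be at least num_reqs")
    real = list(seq_lens[:num_reqs])
    return real + [pad_value] * (padded_len - num_reqs)


# Values taken from the examples in the description above.
actual_seq_lengths_q = [4, 8, 12, 20]

# Case 1 (drafter): the list is one element too short.
drafter = pad_seq_lens([5742, 4700, 7996], 3, len(actual_seq_lengths_q))

# Case 2 (target): the length is right, but the last slot holds stale data.
target = pad_seq_lens([5742, 4700, 7996, 5738], 3, len(actual_seq_lengths_q))

print(drafter)
print(target)
```

Slicing to `num_reqs` before padding handles both cases with the same code path, since a stale tail is simply dropped before the pad values are appended.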

- vLLM version: v0.17.0
- vLLM main: 4034c3d32e

Signed-off-by: drslark <slarksblood@qq.com>

2026-03-16 20:41:36 +08:00

__init__.py

[feat][spec decode]Unified draft parallel (#6766)

2026-03-13 14:07:35 +08:00

draft_proposer.py

[feat][spec decode]Unified draft parallel (#6766)

2026-03-13 14:07:35 +08:00

eagle_proposer.py

[main][bugfix] Fixed the problem that eagle3 will crash in FULL_DECODE_ONLY (#7290)

2026-03-16 20:41:36 +08:00

medusa_proposer.py

[Spec Decode]clean up spec decode interface (#6947)