xc-llm-ascend

Files

Aoxuan Chen 8763953f56 [Feature] add the magicmtp speculative decoding acceleration algorithm (#5542)

### What this PR does / why we need it?

1. Introduces MagicMTP (paper: "Block Verification Accelerates Speculative
Decoding"), which accounts for the influence among multiple draft tokens
during verification, improving the acceptance rate without compromising accuracy.
2. Adds Triton and PyTorch implementations, along with E2E test cases.

### Does this PR introduce _any_ user-facing change?
MagicMTP takes effect automatically when the parameter
`num_speculative_tokens` is >= 3.
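
As a rough illustration of the activation rule above, the check can be sketched as follows. This is a minimal sketch, not the code added by this PR: the helper `magicmtp_expected` and the constant `MAGICMTP_MIN_SPEC_TOKENS` are hypothetical names, and only the `num_speculative_tokens` key and the threshold of 3 come from the description.

```python
# Hypothetical constant: the threshold of 3 is taken from the PR description.
MAGICMTP_MIN_SPEC_TOKENS = 3


def magicmtp_expected(speculative_config: dict) -> bool:
    """Return True if MagicMTP would be expected to take effect.

    Hypothetical helper for illustration only; it mirrors the stated rule
    (num_speculative_tokens >= 3) and is not part of vLLM or vllm-ascend.
    """
    num_tokens = int(speculative_config.get("num_speculative_tokens", 0))
    return num_tokens >= MAGICMTP_MIN_SPEC_TOKENS


# Example speculative decoding settings; other keys (such as the draft
# method) are omitted here because they depend on the model being served.
speculative_config = {"num_speculative_tokens": 3}
print(magicmtp_expected(speculative_config))
```

With `num_speculative_tokens` set to 1 or 2, the same check returns `False`, which corresponds to the existing verification path staying in use.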
- vLLM version: v0.13.0
- vLLM main: 7157596103

Signed-off-by: chenaoxuan <cax1165@163.com>

2026-01-08 09:15:55 +08:00

activation

[bugfix] fix test_camem failed with triton-ascend (#5492)

2026-01-05 20:10:54 +08:00

batch_invariant

[Feature] implement basic framework for batch invariant (#5517)

2026-01-07 09:11:26 +08:00

fla

[bugfix] fix test_camem failed with triton-ascend (#5492)

2026-01-05 20:10:54 +08:00

linearnorm

[TRITON][TEST]Add nightly test for triton split_qkv_rmsnorm_rope (#5267)