xc-llm-ascend

Files

Yizhou 755caeb06e [Feat][Spec] Optimize token index calculation in spec decode with Triton kernel (#5356)

### What this PR does / why we need it?
Replace multiple PyTorch operations with a fused Triton kernel to
determine token indices for sampling during speculative decoding. This
reduces kernel launch overhead and memory traffic, improving overall
performance on Ascend hardware.
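
For orientation, here is a minimal pure-Python sketch of the kind of index arithmetic such a kernel might fuse. It is not the kernel from this PR. The function name, the argument names (`query_start_loc`, `num_draft_tokens`, `num_valid_sampled`), and the formula itself are illustrative assumptions: each request is assumed to occupy a contiguous slice of a flat token buffer, and the index to sample from is assumed to be the last scheduled position minus the number of rejected draft tokens.

```python
from typing import List


def token_indices_to_sample(
    query_start_loc: List[int],
    num_draft_tokens: List[int],
    num_valid_sampled: List[int],
) -> List[int]:
    """Return, per request, the flat index of the last accepted token.

    Hypothetical reference logic only; the names and layout are assumptions.
    query_start_loc holds cumulative token offsets and has one more entry
    than there are requests.
    """
    indices = []
    for req in range(len(num_draft_tokens)):
        drafts = num_draft_tokens[req]
        # With drafts, the verifier scores drafts + 1 positions; anything not
        # kept counts as rejected. Without drafts, nothing can be rejected.
        rejected = drafts + 1 - num_valid_sampled[req] if drafts > 0 else 0
        # query_start_loc[req + 1] - 1 is the request's last scheduled token.
        indices.append(query_start_loc[req + 1] - 1 - rejected)
    return indices


indices = token_indices_to_sample(
    query_start_loc=[0, 4, 5, 9],
    num_draft_tokens=[3, 0, 3],
    num_valid_sampled=[2, 1, 4],
)
print(indices)  # [1, 4, 8]
```

Written with tensor operations, each step in the loop body (the conditional, the subtraction, the offset lookup) would typically be its own device kernel with an intermediate buffer. A fused kernel can do all of them in one pass per request, which is where the launch and memory-traffic savings described above would come from.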

---------

Signed-off-by: Yizhou Liu <liu_yizhou@outlook.com>

2026-01-05 16:51:29 +08:00

multi_node

[Bugfix] fix pcp + eplb error (#5561)

2026-01-05 14:08:11 +08:00

ops/triton

feat: implement high-performance Triton kernels for rejection sampling: optimization for rejection_random_sample_kernel (#5259)

2026-01-05 16:03:02 +08:00

single_node

[Feat][Spec] Optimize token index calculation in spec decode with Triton kernel (#5356)

2026-01-05 16:51:29 +08:00