xc-llm-ascend

yuxingcyx 5a88e3333b feat: implement high-performance Triton kernels for rejection sampling (#4830)

### What this PR does / why we need it?
This PR introduces optimized Triton implementations of
rejection_greedy_sample_kernel and expand_kernel that outperform the
existing Triton implementations. The new kernels preserve full
functional accuracy while improving performance across various batch
sizes and MTP configurations.
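
For readers unfamiliar with greedy rejection sampling, the following is a minimal pure-Python reference sketch of the semantics such a kernel is generally expected to preserve. It is not the Triton code from this PR. The function name, argument names, the flattened cumulative-offset layout, and the `-1` placeholder are assumptions made for illustration.

```python
# Hypothetical reference sketch, NOT the PR's Triton kernel.
PLACEHOLDER = -1  # assumed padding value for unused output slots


def greedy_rejection_sample(draft_token_ids, target_argmax,
                            cu_num_draft_tokens, bonus_token_ids,
                            max_spec_len):
    """Accept draft tokens while they match the target model's argmax.

    draft_token_ids / target_argmax: flat lists over all requests.
    cu_num_draft_tokens: cumulative draft-token counts per request.
    bonus_token_ids: one extra token per request, emitted only if
    every draft token of that request was accepted.
    """
    num_reqs = len(cu_num_draft_tokens)
    output = [[PLACEHOLDER] * (max_spec_len + 1) for _ in range(num_reqs)]
    for req in range(num_reqs):
        start = 0 if req == 0 else cu_num_draft_tokens[req - 1]
        end = cu_num_draft_tokens[req]
        rejected = False
        for pos in range(end - start):
            target = target_argmax[start + pos]
            # The target token is always emitted at this position.
            output[req][pos] = target
            if draft_token_ids[start + pos] != target:
                # First mismatch: stop accepting further draft tokens.
                rejected = True
                break
        if not rejected:
            output[req][end - start] = bonus_token_ids[req]
    return output


result = greedy_rejection_sample(
    draft_token_ids=[1, 2, 3, 4, 5],
    target_argmax=[1, 2, 3, 4, 9],
    cu_num_draft_tokens=[3, 5],
    bonus_token_ids=[7, 8],
    max_spec_len=3,
)
print(result)
```

In this example the first request accepts all three draft tokens and receives its bonus token, while the second request is cut off at its first mismatch and the remaining slots keep the placeholder.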

### Does this PR introduce _any_ user-facing change?
Yes, this PR modifies rejection_sampler.py to use optimized Triton
kernels:

- rejection_greedy_sample_kernel is enhanced with
rejection_greedy_sample_spec_len_1_triton and
rejection_greedy_sample_triton implementations

- expand_kernel receives a performance-optimized Triton version

These changes improve performance while maintaining backward
compatibility.
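
As a rough illustration of what an expand step of this kind typically does, the sketch below broadcasts one value per request to every token of that request. It is a plain-Python approximation under stated assumptions, not the Triton kernel from this PR: the function name, the `replace_from` / `replace_to` parameters, and the cumulative-count layout are hypothetical.

```python
# Hypothetical reference sketch, NOT the PR's Triton kernel.
def expand_batch_to_tokens(values, cu_num_tokens, replace_from, replace_to):
    """Repeat each per-request value once per token of that request.

    values: one value per request.
    cu_num_tokens: cumulative token counts per request.
    replace_from / replace_to: values equal to replace_from are
    substituted with replace_to before being expanded.
    """
    expanded = []
    start = 0
    for value, end in zip(values, cu_num_tokens):
        if value == replace_from:
            value = replace_to
        # The request owns tokens in the half-open range [start, end).
        expanded.extend([value] * (end - start))
        start = end
    return expanded


expanded = expand_batch_to_tokens(
    values=[0.5, 0.0, 0.9],
    cu_num_tokens=[2, 3, 6],
    replace_from=0.0,
    replace_to=1.0,
)
print(expanded)
```

A real kernel would process requests in parallel rather than in a Python loop; the sequential form here only pins down the expected output.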


- vLLM version: v0.12.0
- vLLM main: ad32e3e19c

---------

Signed-off-by: yuxingcyx <yuxingchen.math@gmail.com>
Co-authored-by: wangxiyuan <wangxiyuan1007@gmail.com>

2025-12-18 19:42:10 +08:00

logits_processor

[performance] Enhance performance after enabling min_p (#4529)

2025-12-02 20:35:51 +08:00

__init__.py

Spec decode support for V1 Engine (#874)

2025-05-23 14:25:46 +08:00

rejection_sampler.py

feat: implement high-performance Triton kernels for rejection sampling (#4830)

2025-12-18 19:42:10 +08:00

sampler.py

[Performance] Pre-issued exponential distribution operator. (#4908)

2025-12-11 23:02:51 +08:00