daniel
8ffe3f5d78
feat: implement high-performance Triton kernels for rejection sampling: optimization for rejection_random_sample_kernel (#5259)
### What this PR does / why we need it?
This PR introduces an optimized Triton implementation of
`rejection_random_sample_kernel` that outperforms the existing Triton
implementation while preserving functional accuracy. In the benchmark
table under "How was this patch tested?", the optimized kernel is faster
at batch sizes of 64 and above for both MTP settings, reaching about 4.5x
at batch size 1024 with MTP=1. It is slightly slower at the smallest batch
sizes (batch size 1 for both MTP settings, and batch sizes 8 and 32 with
MTP=2).
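For readers who want to check functional accuracy, the per-request acceptance rule that rejection sampling kernels of this kind typically apply can be sketched in plain Python. This is a minimal reference under stated assumptions, not the code from this PR: the function name, the argument layout, the `PLACEHOLDER` value, and the exact acceptance condition are all hypothetical and should be checked against `rejection_sampler.py` before use.

```python
from typing import List, Sequence

# Hypothetical marker for output slots that are never written.
PLACEHOLDER = -1


def rejection_random_sample_reference(
    draft_token_ids: Sequence[int],
    draft_probs: Sequence[float],      # draft-model probability of each draft token
    target_probs: Sequence[float],     # target-model probability of each draft token
    uniform_samples: Sequence[float],  # one U(0, 1) sample per draft position
    recovered_token_ids: Sequence[int],
    bonus_token_id: int,
) -> List[int]:
    """Sequential reference for ONE request (assumed semantics, not the PR's code)."""
    num_draft = len(draft_token_ids)
    out = [PLACEHOLDER] * (num_draft + 1)
    for pos in range(num_draft):
        q = draft_probs[pos]
        p = target_probs[pos]
        # Accept the draft token when the probability ratio beats the uniform sample.
        if q > 0 and p / q >= uniform_samples[pos]:
            out[pos] = draft_token_ids[pos]
        else:
            # First rejection: emit the recovered token and stop.
            out[pos] = recovered_token_ids[pos]
            return out
    # Every draft token was accepted, so the bonus token is appended.
    out[num_draft] = bonus_token_id
    return out
```

A sequential reference like this is slow but easy to reason about, so it can serve as an oracle when comparing an optimized kernel's output on small random inputs.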
### Does this PR introduce _any_ user-facing change?
Yes. This PR modifies `rejection_sampler.py` to use optimized Triton
kernels:
- `rejection_random_sample_kernel` is modified and optimized.
### How was this patch tested?
Performance benchmark results:
Batch Size | MTP | Original implementation (µs) | Optimized version (µs)
-- | -- | -- | --
1 | 1 | 2.934 | 3.64
8 | 1 | 4.467 | 4
32 | 1 | 6.98 | 4.54
64 | 1 | 11.087 | 6.42
128 | 1 | 13.414 | 7.84
256 | 1 | 19.66 | 8.487
512 | 1 | 39.908 | 11.62
1024 | 1 | 81.781 | 18.16
2048 | 1 | 137.923 | 32.934
1 | 2 | 3.4 | 4.02
8 | 2 | 3.74 | 4.24
32 | 2 | 6.373 | 7.394
64 | 2 | 9.747 | 6.46
128 | 2 | 12.98 | 7.76
256 | 2 | 20.834 | 9.787
512 | 2 | 39.314 | 13.56
1024 | 2 | 83.135 | 22.387
2048 | 2 | 157.563 | 40.607
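The speedup implied by each row can be derived directly from the two timing columns. The short script below is a sketch that copies the timings from the benchmark table and divides the original time by the optimized time. The names `ROWS` and `speedup` are illustrative and are not part of the PR.

```python
# (batch_size, mtp, original_us, optimized_us), copied from the benchmark table.
ROWS = [
    (1, 1, 2.934, 3.64), (8, 1, 4.467, 4.0), (32, 1, 6.98, 4.54),
    (64, 1, 11.087, 6.42), (128, 1, 13.414, 7.84), (256, 1, 19.66, 8.487),
    (512, 1, 39.908, 11.62), (1024, 1, 81.781, 18.16), (2048, 1, 137.923, 32.934),
    (1, 2, 3.4, 4.02), (8, 2, 3.74, 4.24), (32, 2, 6.373, 7.394),
    (64, 2, 9.747, 6.46), (128, 2, 12.98, 7.76), (256, 2, 20.834, 9.787),
    (512, 2, 39.314, 13.56), (1024, 2, 83.135, 22.387), (2048, 2, 157.563, 40.607),
]


def speedup(original_us: float, optimized_us: float) -> float:
    """Ratio above 1.0 means the optimized kernel is faster."""
    return original_us / optimized_us


if __name__ == "__main__":
    for batch, mtp, orig, opt in ROWS:
        print(f"batch={batch:>4} mtp={mtp} speedup={speedup(orig, opt):.2f}x")
```

Read this way, the table shows ratios below 1.0 only at the smallest batch sizes, and the largest ratio (about 4.5x) at batch size 1024 with MTP=1.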
- vLLM version: release/v0.13.0
- vLLM main: ad32e3e19c
Signed-off-by: 1024daniel <xxltju324@gmail.com>
2026-01-05 16:03:02 +08:00