[Perf] add patch to optimize apply_topk_topp (#1732)

### What this PR does / why we need it?

Performance optimization for apply_top_k_top_p

### Does this PR introduce _any_ user-facing change?

Use VLLM_ASCEND_ENABLE_TOPK_TOPP_OPTIMIZATION to enable this feature

### How was this patch tested?

e2e & ut

- vLLM version: v0.9.2
- vLLM main: 6a9e6b2abf

Signed-off-by: Pr0Wh1teGivee <calvin_zhu0210@outlook.com>
2025-07-11 15:32:02 +08:00
parent aa4240c67f
commit d13fb0766e
8 changed files with 304 additions and 0 deletions
--- a/vllm_ascend/patch/__init__.py
+++ b/vllm_ascend/patch/__init__.py
@@ -129,3 +129,17 @@
 #       This is the problem in vllm-ascend
 #    Future Plan:
 #       Remove this patch once pytorch 2.7.0 is supported for vllm ascend.
+#
+# ** File: worker/patch_common/patch_sampler.py **
+# ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
+#   1. `vllm.v1.sample.sampler.Sampler.apply_top_k_top_p`
+#    Why:
+#       We need to use the patched `apply_top_k_top_p` in `sample`.
+#       The main reason to override `apply_top_k_top_p` is
+#       to improve performance.
+#    How:
+#       Re-implement the `apply_top_k_top_p` function in PyTorch.
+#    Related PR (if no, explain why):
+#       - https://github.com/vllm-project/vllm-ascend/pull/1732
+#    Future Plan:
+#       Revert it when the ascend scatter performance improves.
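
The patch note above says `apply_top_k_top_p` is re-implemented in PyTorch and should be reverted once Ascend scatter performance improves. The patched code itself is not shown in this hunk, so the following is only a minimal sketch of one scatter-free way to do top-k/top-p filtering: compute a per-row cutoff from a sorted copy and mask the original logits by comparison. The function name `top_k_top_p_by_threshold` and its signature are assumptions made for illustration, not the code added by this PR.

```python
import torch


def top_k_top_p_by_threshold(logits, k=None, p=None):
    """Hypothetical sketch: mask logits outside top-k / top-p without scatter.

    logits: (batch, vocab) float tensor.
    k: (batch,) integer tensor or None.
    p: (batch,) float tensor with values in (0, 1], or None.
    Returns a new tensor; the input is left unmodified.
    """
    if k is None and p is None:
        return logits

    if k is not None:
        sorted_logits, _ = logits.sort(dim=-1, descending=False)
        # Position of the k-th largest logit in the ascending sort.
        kth_index = (sorted_logits.size(-1) - k.to(torch.long)).clamp(min=0)
        cutoff = sorted_logits.gather(-1, kth_index.unsqueeze(-1))
        # Compare against the cutoff instead of scattering a sorted mask back.
        logits = logits.masked_fill(logits < cutoff, float("-inf"))

    if p is not None:
        probs = logits.softmax(dim=-1)
        sorted_probs, _ = probs.sort(dim=-1, descending=False)
        cumulative = sorted_probs.cumsum(dim=-1)
        # Drop the low-probability tail whose total mass is at most 1 - p.
        drop = cumulative <= (1.0 - p.unsqueeze(-1))
        drop[:, -1] = False  # always keep the most likely token
        num_dropped = drop.sum(dim=-1, keepdim=True)
        # The smallest kept probability acts as the per-row cutoff.
        cutoff = sorted_probs.gather(-1, num_dropped)
        logits = logits.masked_fill(probs < cutoff, float("-inf"))

    return logits
```

One consequence of the comparison-based design is that tokens whose value ties exactly with the cutoff are all kept, so a row can retain slightly more than `k` entries. A scatter-based version that masks by sorted position would not behave this way.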