[Perf] add patch to optimize apply_topk_topp (#1732)

### What this PR does / why we need it?
Adds a patch that optimizes the performance of `apply_top_k_top_p`.
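
For readers unfamiliar with the operation being optimized, the following is a minimal pure-Python sketch of what combined top-k / top-p (nucleus) filtering conventionally does to a single row of logits. It is not the code from this PR and not vLLM's implementation, which operates on batched tensors; the function name `top_k_top_p_sketch` and its signature are hypothetical and chosen only for illustration.

```python
import math


def top_k_top_p_sketch(logits, k=None, p=None):
    """Illustrative only: mask logits outside the top-k set and the
    top-p nucleus with -inf. Hypothetical helper, not the PR's code."""
    out = list(logits)
    # Token indices ordered from highest to lowest logit.
    order = sorted(range(len(out)), key=lambda i: out[i], reverse=True)

    if k is not None:
        # Top-k: everything after the k best tokens is masked.
        for i in order[k:]:
            out[i] = -math.inf

    if p is not None:
        # Softmax over the logits that survived top-k (exp(-inf) == 0.0).
        peak = max(out)
        exps = [math.exp(x - peak) for x in out]
        total = sum(exps)
        # Top-p: keep the shortest prefix whose cumulative probability
        # reaches p; the best token is always kept.
        keep, cumulative = set(), 0.0
        for i in order:
            if cumulative >= p:
                break
            keep.add(i)
            cumulative += exps[i] / total
        out = [x if i in keep else -math.inf for i, x in enumerate(out)]

    return out


filtered = top_k_top_p_sketch([2.0, 1.0, 0.0, -1.0], k=3, p=0.8)
print(filtered)
```

Sorting the whole vocabulary for every row is the costly step in a naive version like this one, which is presumably why the operation is a target for optimization; the PR text itself does not describe the technique used.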
### Does this PR introduce _any_ user-facing change?
Set the environment variable `VLLM_ASCEND_ENABLE_TOPK_TOPP_OPTIMIZATION=1` to enable this feature. It is disabled by default.
### How was this patch tested?
End-to-end (e2e) tests and unit tests (ut).

- vLLM version: v0.9.2
- vLLM main: 6a9e6b2abf

Signed-off-by: Pr0Wh1teGivee <calvin_zhu0210@outlook.com>
Pr0Wh1teGivee
2025-07-11 15:32:02 +08:00
committed by GitHub
parent aa4240c67f
commit d13fb0766e
8 changed files with 304 additions and 0 deletions

@@ -128,6 +128,11 @@ env_variables: Dict[str, Callable[[], Any]] = {
     "VLLM_ASCEND_KV_CACHE_MEGABYTES_FLOATING_TOLERANCE":
     lambda: int(
         os.getenv("VLLM_ASCEND_KV_CACHE_MEGABYTES_FLOATING_TOLERANCE", 64)),
+    # Whether to enable the topk optimization. It's disabled by default for experimental support
+    # We'll make it enabled by default in the future.
+    "VLLM_ASCEND_ENABLE_TOPK_TOPP_OPTIMIZATION":
+    lambda: bool(
+        int(os.getenv("VLLM_ASCEND_ENABLE_TOPK_TOPP_OPTIMIZATION", '0'))),
 }
 # end-env-vars-definition
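
The new entry parses the flag as string, then integer, then boolean. The sketch below mirrors that expression in a standalone helper to show which values enable the feature. The helper name `topk_topp_optimization_enabled` and its `environ` parameter are hypothetical, added only so the behavior can be exercised without touching the real process environment.

```python
import os

FLAG = "VLLM_ASCEND_ENABLE_TOPK_TOPP_OPTIMIZATION"


def topk_topp_optimization_enabled(environ=os.environ):
    """Hypothetical helper mirroring the diff's bool(int(os.getenv(..., '0')))."""
    # Unset -> "0" -> 0 -> False; any non-zero integer string -> True.
    return bool(int(environ.get(FLAG, "0")))


print(topk_topp_optimization_enabled({}))           # flag unset
print(topk_topp_optimization_enabled({FLAG: "1"}))  # flag enabled
```

Because the value goes through `int()`, strings such as `true` or `yes` are not accepted and raise `ValueError`; under this parsing the flag should be set to an integer such as `1`.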