optimize the funtion of computing topk and topp in sampler. (#970)

### What this PR does / why we need it?
Optimize the performance of calculation logic in sampler and deepseekv2.

### Does this PR introduce _any_ user-facing change?
Added VLLM_ENABLE_TOPK_OPTIMZE config in sampler

### How was this patch tested?
pytest test_sampler.py

Signed-off-by: wangxiaoxin (A) <wangxiaoxin7@huawei.com>
Co-authored-by: wangxiaoxin (A) <wangxiaoxin7@huawei.com>
Co-authored-by: ZhengWG <zwg0606@gmail.com>
This commit is contained in:
sherie
2025-06-05 16:42:18 +08:00
committed by GitHub
parent e1ab6d318e
commit 908a851a77
9 changed files with 330 additions and 3 deletions

View File

@@ -36,6 +36,8 @@ env_variables: Dict[str, Callable[[], Any]] = {
lambda: bool(int(os.getenv("COMPILE_CUSTOM_KERNELS", "1"))),
"VLLM_ENABLE_MC2":
lambda: bool(int(os.getenv("VLLM_ENABLE_MC2", '0'))),
"VLLM_ASCEND_ENABLE_TOPK_OPTIMZE":
lambda: bool(int(os.getenv("VLLM_ASCEND_ENABLE_TOPK_OPTIMZE", '0'))),
"USING_LCCL_COM":
lambda: bool(int(os.getenv("USING_LCCL_COM", '0'))),
"SOC_VERSION":