Optimize the functions for computing top-k and top-p in the sampler (#970)
### What this PR does / why we need it?
Optimizes the performance of the calculation logic in the sampler and in deepseekv2.

### Does this PR introduce _any_ user-facing change?
Adds the VLLM_ENABLE_TOPK_OPTIMZE config option to the sampler.

### How was this patch tested?
`pytest test_sampler.py`

Signed-off-by: wangxiaoxin (A) <wangxiaoxin7@huawei.com>
Co-authored-by: wangxiaoxin (A) <wangxiaoxin7@huawei.com>
Co-authored-by: ZhengWG <zwg0606@gmail.com>
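The sampler code itself is not part of this page. As a point of reference only, the sketch below applies top-k and then top-p (nucleus) masking to a single row of logits in plain Python. It assumes masked tokens are set to -inf, that top-k runs before top-p, and that the nucleus is the smallest set of highest-probability tokens whose cumulative probability reaches p. The function name `apply_top_k_top_p` is hypothetical; this is not the vllm-ascend implementation, nor the optimized path that VLLM_ENABLE_TOPK_OPTIMZE presumably toggles.

```python
import math
from typing import List, Optional


def apply_top_k_top_p(
    logits: List[float], k: Optional[int] = None, p: Optional[float] = None
) -> List[float]:
    """Return a copy of `logits` with tokens outside top-k / top-p set to -inf.

    k=None (or k outside 1..len-1) disables top-k; p=None (or p >= 1.0)
    disables top-p. Expects a non-empty list.
    """
    n = len(logits)
    # Token indices ordered from highest to lowest logit (stable on ties).
    order = sorted(range(n), key=lambda i: logits[i], reverse=True)

    # Top-k: keep only the k highest-scoring tokens.
    if k is not None and 0 < k < n:
        order = order[:k]

    # Top-p: among the survivors, keep the smallest prefix whose
    # softmax probability mass reaches p.
    if p is not None and p < 1.0:
        max_logit = logits[order[0]]  # subtract the max for numerical stability
        exps = [math.exp(logits[i] - max_logit) for i in order]
        total = sum(exps)
        cumulative = 0.0
        nucleus = []
        for i, e in zip(order, exps):
            nucleus.append(i)  # the top token is always kept
            cumulative += e / total
            if cumulative >= p:
                break
        order = nucleus

    keep = set(order)
    return [x if i in keep else -math.inf for i, x in enumerate(logits)]


print(apply_top_k_top_p([1.0, 3.0, 2.0, 0.0], k=2))  # [-inf, 3.0, 2.0, -inf]
print(apply_top_k_top_p([2.0, 1.0, 0.0], p=0.9))  # [2.0, 1.0, -inf]
```

Tensor implementations typically replace the Python loop with a per-row sort followed by a cumulative sum over the softmax probabilities, applied to the whole batch at once.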
@@ -363,7 +363,7 @@ def fused_experts(
 num_experts)).to(topk_ids.dtype)

 # Sort by local expert IDs
-sort_indices = torch.argsort(filtered_experts)
+sort_indices = torch.argsort(filtered_experts.view(torch.float32))
 sorted_token_indices = token_indices[sort_indices]
 sorted_weights = filtered_weights[sort_indices]
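This hunk replaces `torch.argsort(filtered_experts)` with an argsort over `filtered_experts.view(torch.float32)`, which reinterprets the expert-ID bits as float32 instead of converting the values. Assuming `filtered_experts` is an int32 tensor (the view keeps one float per element only when element sizes match) holding non-negative IDs, the sort order is unchanged, because non-negative IEEE-754 floats order the same way as their bit patterns. The commit does not say why the float view is faster; a plausible assumption is that float sorting is the better-optimized kernel on the target device. The stdlib sketch below checks the ordering property with `struct`; the helper names are hypothetical and this is not code from the commit.

```python
import struct
from typing import List


def int32_bits_as_float32(x: int) -> float:
    """Reinterpret a signed 32-bit integer's bit pattern as an IEEE-754 float32."""
    # '<i' packs 4 little-endian bytes as int32; '<f' reads the same bytes as float32.
    return struct.unpack("<f", struct.pack("<i", x))[0]


def argsort_via_float_view(ids: List[int]) -> List[int]:
    """Argsort integer IDs by comparing their float32 reinterpretations.

    Order-preserving only for 0 <= id <= 0x7F800000 (the +inf bit pattern);
    larger patterns are NaN, and negative IDs set the sign bit and do not
    keep their integer order.
    """
    for v in ids:
        if not 0 <= v <= 0x7F800000:
            raise ValueError(f"id {v} is outside the order-preserving range")
    keys = [int32_bits_as_float32(v) for v in ids]
    return sorted(range(len(ids)), key=lambda i: keys[i])


expert_ids = [3, 0, 2, 2, 1]
print(argsort_via_float_view(expert_ids))  # [1, 4, 2, 3, 0]
```

Small IDs such as expert indices map to subnormal float32 values (ID 1 becomes roughly 1.4e-45), so the trick also assumes the sort kernel compares subnormals exactly rather than flushing them to zero.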