[0.11.0][ops] npu_top_k_top_p supports k and p only (#4153)
### What this PR does / why we need it?
With CANN 8.3 and the corresponding PTA 2.7.1, `npu_top_k_top_p` supports passing `k` (1 <= k <= 1024) and `p` separately, that is, only `k` or only `p`.

### Does this PR introduce _any_ user-facing change?
No.

### How was this patch tested?
E2E performance tests with only `top_k` and with only `p`, separately. This PR improves TPOT by 0.2 ms with `batch_size=16`.

Signed-off-by: linfeng-yuan <1102311262@qq.com>
@@ -24,14 +24,14 @@ class AscendTopKTopPSampler(TopKTopPSampler):
         k: torch.Tensor,
         p: torch.Tensor,
     ) -> torch.Tensor:
-        # npu_top_k_top_p uses the operator aclnnApplyTopKTopP, but aclnnApplyTopKTopP currently does not support 310P
-        if not is_310p() and p is not None and k is not None and 1 <= int(
-                k.max()) <= 1024:
-            # npu_top_k_top_p's parameter order is (logits, p, k), not (logits, k, p)
-            return torch_npu.npu_top_k_top_p(logits, p, k)
-
         if p is None and k is None:
             return logits
+        # npu_top_k_top_p uses the operator aclnnApplyTopKTopP, but aclnnApplyTopKTopP currently does not support 310P
+        if not is_310p():
+            # npu_top_k_top_p requires parameter k ranged from 1 to 1024
+            if k is None or 1 <= int(k.max()) <= 1024:
+                # npu_top_k_top_p's parameter order is (logits, p, k), not (logits, k, p)
+                return torch_npu.npu_top_k_top_p(logits, p, k)
 
         probs = logits.softmax(dim=-1)
         probs_sort, _ = probs.sort(dim=-1, descending=False)
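The effect of the hunk on operator dispatch can be sketched without `torch` or `torch_npu`. This is only an illustration under stated assumptions: the helper names `use_fused_op_old` and `use_fused_op_new` are hypothetical and do not exist in the patched module, `k_max` stands in for `int(k.max())` (or `None` when `k` is `None`), and `p_given` stands in for `p is not None`.

```python
from typing import Optional

# Range of k accepted by the fused operator, as stated in the commit message.
K_MIN, K_MAX = 1, 1024


def use_fused_op_old(is_310p: bool, k_max: Optional[int], p_given: bool) -> bool:
    """Hypothetical model of the gate before this commit: both k and p are required."""
    return (
        not is_310p
        and p_given
        and k_max is not None
        and K_MIN <= k_max <= K_MAX
    )


def use_fused_op_new(is_310p: bool, k_max: Optional[int], p_given: bool) -> bool:
    """Hypothetical model of the gate after this commit: k alone or p alone also qualifies."""
    if k_max is None and not p_given:
        # Nothing to filter: the sampler returns the logits unchanged.
        return False
    if is_310p:
        # The fused operator is not supported on 310P, so the fallback path is used.
        return False
    # p-only requests (k_max is None) pass; otherwise k must be within range.
    return k_max is None or K_MIN <= k_max <= K_MAX


if __name__ == "__main__":
    cases = [
        ("k and p", 50, True),
        ("p only", None, True),
        ("k only", 50, False),
        ("k out of range", 2048, True),
    ]
    for label, k_max, p_given in cases:
        old = use_fused_op_old(False, k_max, p_given)
        new = use_fused_op_new(False, k_max, p_given)
        print(f"{label}: old={old}, new={new}")
```

Under this model, only the "p only" and "k only" rows change from the fallback path to the fused operator; requests with `k` above 1024 and all requests on 310P still take the softmax-and-sort fallback.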