### What this PR does / why we need it?
Use the fused op `torch_npu.npu_top_k_top_p(logits, p, k)` when both `p` and
`k` are not None; otherwise fall back to the original implementation. The
replacement takes place automatically when `VLLM_ASCEND_ENABLE_TOPK_OPTIMIZE=1`.
This patch uses `npu_top_k_top_p`, which requires
torch_npu>=2.5.1.post1.dev20250619.
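
The dispatch rule described above can be sketched roughly as follows. This is a minimal illustration, not the code in this patch: the helper names `topk_optimize_enabled` and `apply_top_k_top_p` are hypothetical, the fused op and the original sampler are passed in as callables so the sketch runs without an NPU, and the way the environment variable is parsed (exact match on `"1"`) is an assumption.

```python
import os
from typing import Any, Callable, Mapping, Optional


def topk_optimize_enabled(env: Optional[Mapping[str, str]] = None) -> bool:
    """Return True when VLLM_ASCEND_ENABLE_TOPK_OPTIMIZE is set to "1".

    Assumption: only the literal string "1" enables the optimization.
    """
    env = os.environ if env is None else env
    return env.get("VLLM_ASCEND_ENABLE_TOPK_OPTIMIZE", "0") == "1"


def apply_top_k_top_p(
    logits: Any,
    p: Any,
    k: Any,
    fused_op: Callable[[Any, Any, Any], Any],
    fallback: Callable[[Any, Any, Any], Any],
    env: Optional[Mapping[str, str]] = None,
) -> Any:
    """Hypothetical dispatcher: fused path only when both p and k are given.

    In the real patch, fused_op would be torch_npu.npu_top_k_top_p, called
    as (logits, p, k); fallback stands in for the original implementation.
    """
    if topk_optimize_enabled(env) and p is not None and k is not None:
        return fused_op(logits, p, k)
    # Either the switch is off or one of p / k is missing: keep old behavior.
    return fallback(logits, p, k)
```

Injecting the two callables keeps the selection logic testable on machines without `torch_npu` installed.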
### Does this PR introduce _any_ user-facing change?
No
### How was this patch tested?
Tested with DeepSeek R1; unit tests (UT) passed.
Signed-off-by: Pr0Wh1teGivee <calvin_zhu0210@outlook.com>