### What this PR does / why we need it?
Replace the PyTorch implementation of sampling with Triton kernels.

### Does this PR introduce _any_ user-facing change?
No

- vLLM version: v0.11.2

---------

Signed-off-by: Lord_of_Ironhill <suiweiyi@huawei.com>
Signed-off-by: whx-sjtu <2952154980@qq.com>
Co-authored-by: Lord_of_Ironhill <suiweiyi@huawei.com>
Co-authored-by: whx-sjtu <2952154980@qq.com>