xc-llm-ascend/vllm_ascend/ops/triton/triton_utils.py

from typing import Any

import torch
from vllm.triton_utils import HAS_TRITON, triton

# Cached core counts for the current NPU; -1 means "not initialized yet".
_NUM_AICORE = -1
_NUM_VECTORCORE = -1


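def init_device_properties_triton():
    """Query the current NPU's core counts once and cache them at module level.

    Does nothing if the values are already cached or Triton is unavailable.
    """
    global _NUM_AICORE, _NUM_VECTORCORE
    if _NUM_AICORE == -1 and HAS_TRITON:
        device_properties: dict[str, Any] = triton.runtime.driver.active.utils.get_device_properties(
            torch.npu.current_device()
        )
        # Fall back to -1 so that a missing key trips the assertion below.
        _NUM_AICORE = device_properties.get("num_aicore", -1)
        _NUM_VECTORCORE = device_properties.get("num_vectorcore", -1)
        assert _NUM_AICORE > 0 and _NUM_VECTORCORE > 0, "Failed to detect device properties."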


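def get_aicore_num():
    """Return the cached AI core count; requires prior initialization."""
    # Read-only access to a module-level name needs no `global` statement.
    assert _NUM_AICORE > 0, "Device properties not initialized. Please call init_device_properties_triton() first."
    return _NUM_AICORE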


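def get_vectorcore_num():
    """Return the cached vector core count; requires prior initialization."""
    # Read-only access to a module-level name needs no `global` statement.
    assert _NUM_VECTORCORE > 0, "Device properties not initialized. Please call init_device_properties_triton() first."
    return _NUM_VECTORCORE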
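
One plausible use of the cached core counts is to cap a kernel's launch grid so that no more programs are started than there are cores, with each program looping over several tasks. The sketch below illustrates that idea only and is not taken from this repository: `cap_grid` and `tasks_per_program` are hypothetical helpers, and the core count is passed in as a plain integer so the example runs without an NPU. On a device, that integer would presumably come from `get_vectorcore_num()` after `init_device_properties_triton()` has been called.

```python
def cap_grid(num_tasks: int, num_cores: int) -> tuple[int]:
    """Return a 1-D launch grid with at most `num_cores` programs (hypothetical helper)."""
    if num_tasks <= 0 or num_cores <= 0:
        raise ValueError("num_tasks and num_cores must be positive")
    return (min(num_tasks, num_cores),)


def tasks_per_program(num_tasks: int, grid: tuple[int]) -> int:
    """Ceiling division: the number of tasks each program has to loop over."""
    return -(-num_tasks // grid[0])


if __name__ == "__main__":
    # Placeholder value for illustration, not a real device property.
    num_cores = 40
    grid = cap_grid(1000, num_cores)
    print(grid, tasks_per_program(1000, grid))
```

Passing the core count as an argument, rather than reading the module-level cache inside the helper, keeps the grid arithmetic testable on machines that have neither Triton nor an NPU.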