[Ops][Triton] Add a triton kernel supporting partial rope. (#4413)

### What this PR does / why we need it?
This PR adds a Triton rope kernel which supports scenarios where `rope_dim
!= head_dim` (partial rope). Handling this case in a single kernel removes the
split op before rope and the concat op after rope; profiling shows a performance
improvement.
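For reference, partial rope means applying the rotary embedding only to the first `rope_dim` elements of each head vector while the remaining `head_dim - rope_dim` elements pass through unchanged. A minimal pure-Python sketch of the math (the function name and rotate-half convention are illustrative, not the kernel's actual API):

```python
def partial_rope(x, cos, sin, rope_dim):
    """Apply rotary embedding to the first `rope_dim` elements of the head
    vector `x`; the remaining head_dim - rope_dim elements pass through.

    Uses the "rotate-half" convention: the rope slice is split into two
    halves (a, b), rotated as (a*cos - b*sin, b*cos + a*sin).
    `cos` and `sin` each have rope_dim // 2 entries for this position.
    """
    half = rope_dim // 2
    rot, rest = x[:rope_dim], x[rope_dim:]
    out = []
    # First half of the rotated slice: a*cos - b*sin
    for i in range(half):
        out.append(rot[i] * cos[i] - rot[i + half] * sin[i])
    # Second half of the rotated slice: b*cos + a*sin
    for i in range(half):
        out.append(rot[i + half] * cos[i] + rot[i] * sin[i])
    # Non-rope tail is copied through untouched -- this is the part the
    # fused kernel avoids splitting off and concatenating back.
    return out + list(rest)
```

Without a fused kernel, the caller must slice `x[:rope_dim]`, run the standard rope op, then concatenate the tail back on; the fused kernel performs all of this in one pass.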

### Does this PR introduce _any_ user-facing change?
None
### How was this patch tested?
Related unit tests will be added once CI is integrated with Triton.


- vLLM version: v0.11.2
- vLLM main: https://github.com/vllm-project/vllm/commit/v0.11.2

---------

Signed-off-by: whx-sjtu <2952154980@qq.com>
Author: whx
Date: 2025-12-02 17:10:19 +08:00
Committed by: GitHub
Parent: 8907010815
Commit: 96b2cdf6d8
6 changed files with 421 additions and 20 deletions


```diff
@@ -49,6 +49,7 @@ from vllm_ascend.ascend_config import get_ascend_config, init_ascend_config
 from vllm_ascend.cpu_binding import bind_cpus
 from vllm_ascend.device_allocator.camem import CaMemAllocator
 from vllm_ascend.distributed.parallel_state import init_ascend_model_parallel
+from vllm_ascend.ops.triton.triton_utils import init_device_properties_triton
 from vllm_ascend.platform import NPUPlatform
 from vllm_ascend.utils import (check_ascend_device_type, is_enable_nz,
                                prefill_context_parallel_enable,
```
```diff
@@ -226,6 +227,8 @@ class NPUWorker(WorkerBase):
         self._init_worker_distributed_environment()
         # Set random seed.
         NPUPlatform.seed_everything(self.model_config.seed)
+        # Initialize device properties used by triton kernels.
+        init_device_properties_triton()
         return device

     def init_device(self):
```