[Ops][Refactor] Remove custom rotary_embedding operator (#6523)

### What this PR does / why we need it?
This PR removes the custom `rotary_embedding` operator and its
associated C++ kernel implementation, PyTorch bindings, and tests.

Rotary embedding now falls back to the native
`torch_npu._npu_rotary_embedding` implementation. This simplifies the
codebase by removing custom, platform-specific kernel code in favor of
the standard NPU library routine, which should be better optimized and
is easier to maintain. A sketch of the fallback call is shown below.
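
For illustration, the replacement path reduces to a single `torch_npu`
call. This is a minimal sketch, not the exact code in this PR: the
wrapper name and tensor shapes are assumptions, and the private
`_npu_rotary_embedding` signature may differ across torch_npu releases.

```python
import torch
import torch_npu  # Ascend NPU extension for PyTorch


def apply_rotary_embedding(
    positions: torch.Tensor,      # [num_tokens]
    query: torch.Tensor,          # [num_tokens, num_heads * head_size]
    key: torch.Tensor,            # [num_tokens, num_kv_heads * head_size]
    head_size: int,
    cos_sin_cache: torch.Tensor,  # precomputed cos/sin table
    is_neox_style: bool,
) -> tuple[torch.Tensor, torch.Tensor]:
    # Assumption: the NPU kernel applies RoPE to query/key in place
    # and expects contiguous inputs, matching the vLLM-style interface
    # the removed custom operator implemented.
    query = query.contiguous()
    key = key.contiguous()
    torch_npu._npu_rotary_embedding(
        positions, query, key, head_size, cos_sin_cache, is_neox_style
    )
    return query, key
```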

### Does this PR introduce _any_ user-facing change?
No. This is an internal refactoring and does not introduce any
user-facing changes.

### How was this patch tested?
The tests for the custom `rotary_embedding` operator have been removed
along with the operator itself. The correctness of the fallback to the
native `torch_npu` implementation is verified by existing CI tests for
attention layers and models that use rotary embeddings.

- vLLM version: v0.15.0
- vLLM main: https://github.com/vllm-project/vllm/commit/v0.15.0

Signed-off-by: wangxiyuan <wangxiyuan1007@gmail.com>
Author: wangxiyuan
Date: 2026-02-07 09:24:05 +08:00
Committed by: GitHub
Parent: 06aa6036f6
Commit: 6c49f95da2
8 changed files with 59 additions and 1392 deletions

@@ -24,13 +24,6 @@
 #include "torch_npu/csrc/aten/common/from_blob.h"
 namespace vllm_ascend {
-extern void rotary_embedding_impl(AscendType type, bool isNeox, void *stream, int64_t *positions, void *queryDst,
-                                  void *keyDst, void *query, void *key, void *cosSinCache, const int rotDim,
-                                  const int64_t queryStride, const int64_t keyStride, const int64_t dstQueryStride,
-                                  const int64_t dstKeyStride, const int numHeads, const int numKvHeads,
-                                  const int headSize, const int64_t numTokens, const uint32_t loopCnt,
-                                  uint32_t aivNum);
 extern void get_masked_input_and_mask_impl(
     void* stream,
     void* input,