xc-llm-ascend

Files

wangxiyuan 6c49f95da2 [Ops][Refactor] Remove custom rotary_embedding operator (#6523 )

### What this PR does / why we need it?
This PR removes the custom `rotary_embedding` operator and its
associated C++ kernel implementation, PyTorch bindings, and tests.

The codebase now falls back to using the native
`torch_npu._npu_rotary_embedding` implementation. This change simplifies
the codebase by removing custom, platform-specific kernel code and
relying on the standard NPU library implementation, which is presumably
more optimized and easier to maintain.

### Does this PR introduce _any_ user-facing change?
No. This is an internal refactoring and does not introduce any
user-facing changes.

### How was this patch tested?
The tests for the custom `rotary_embedding` operator have been removed
along with the operator itself. The correctness of the fallback to the
native `torch_npu` implementation is verified by existing CI tests for
attention layers and models that use rotary embeddings.

- vLLM version: v0.15.0
- vLLM main: https://github.com/vllm-project/vllm/commit/v0.15.0

Signed-off-by: wangxiyuan <wangxiyuan1007@gmail.com>

2026-02-07 09:24:05 +08:00

bgmv_expand.cpp

[BugFix]Fix precision issue for LoRA feature (#4141 )

2025-12-19 14:22:06 +08:00

bgmv_shrink.cpp

[BugFix]Fix precision issue for LoRA feature (#4141 )

2025-12-19 14:22:06 +08:00

get_masked_input_and_mask_kernel.cpp

[Platform] Add initial experimental support for Altlas 300I series (#1333 )

2025-06-21 09:00:16 +08:00

math_utils.h

[OPS] add bmm_transpose ops (#3990 )