xc-llm-ascend

LeeWenquan 3047b724b3 Add GemmaRmsNorm ACLGraph Support (#6473)

### What this PR does / why we need it?
1. New Custom NPU Operation: Introduced npu_gemma_rms_norm in
csrc/torch_binding.cpp to provide optimized Gemma RMS Normalization
support for Ascend NPUs. This function includes logic to handle dynamic
shapes for the gamma tensor.
2. PyTorch Operator Registration: The new npu_gemma_rms_norm operation
has been registered with the PyTorch custom operator library, making it
accessible from Python.
3. Meta-Implementation for ACLGraph: A corresponding meta-implementation,
npu_gemma_rms_norm_meta, was added in csrc/torch_binding_meta.cpp. This
is crucial for symbolic tracing and allows the custom kernel to be
captured and optimized by ACLGraph.
4. Python Frontend Integration: The vllm_ascend/ops/layernorm.py file
was updated to use the newly added
torch.ops._C_ascend.npu_gemma_rms_norm for Gemma RMS Normalization,
replacing the generic torch_npu.npu_rms_norm.
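
For readers unfamiliar with the Gemma variant, here is a minimal pure-Python sketch of the math a Gemma-style RMS normalization computes. It is only a reference illustration: the function name `gemma_rms_norm_reference`, its list-based signature, and the default `eps` are assumptions made for this example and do not reflect the signature or return values of the NPU operator added in this PR. The one Gemma-specific detail is that the learned weight is applied as `(1 + gamma)` rather than `gamma`.

```python
import math


def gemma_rms_norm_reference(x, gamma, eps=1e-6):
    """Hypothetical reference for Gemma-style RMSNorm over one vector.

    x and gamma are equal-length lists of floats; eps is an assumed default.
    """
    if len(x) != len(gamma):
        raise ValueError("x and gamma must have the same length")
    # Mean of squares over the normalized (last) dimension.
    mean_sq = sum(v * v for v in x) / len(x)
    inv_rms = 1.0 / math.sqrt(mean_sq + eps)
    # Gemma scales by (1 + gamma) instead of gamma.
    return [v * inv_rms * (1.0 + g) for v, g in zip(x, gamma)]
```

With `gamma` set to all zeros the output is simply the input divided by its root mean square, which is a convenient sanity check when comparing against a fused kernel.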
### Does this PR introduce _any_ user-facing change?
No
### How was this patch tested?

- vLLM version: v0.14.1
- vLLM main: dc917cceb8

---------

Signed-off-by: SunnyLee219 <3294305115@qq.com>
Signed-off-by: LeeWenquan <83354342+SunnyLee151064@users.noreply.github.com>

2026-03-05 16:15:07 +08:00

fused_moe

[misc] move mxfp_compat into device to decouple from quantization init chain (#6918)

2026-03-02 18:17:01 +08:00

triton

[Feat]fused_qkvzba_split_reshape supports token number greater than 65536 (#6740)

2026-03-05 14:41:38 +08:00

__init__.py

[Lint]Style: Convert vllm-ascend/ to ruff format(new Batch #8) (#6604)

2026-02-07 09:16:07 +08:00

activation.py

[Attention] add gpt-oss support (#5901)