[Kernel] add l2norm triton kernel (#4595)

### What this PR does / why we need it?
This pull request introduces an L2 normalization kernel implemented in
Triton, specifically optimized for Ascend NPUs.
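
For orientation, L2 normalization scales each row of the input to unit Euclidean length. The snippet below is a minimal pure-Python reference sketch, not the Triton kernel added by this PR. The function name `l2norm_reference`, the `eps` default of `1e-6`, and the placement of `eps` inside the square root are assumptions chosen for illustration. A reference like this is the kind of thing a kernel's output could be compared against.

```python
import math
from typing import List


def l2norm_reference(rows: List[List[float]], eps: float = 1e-6) -> List[List[float]]:
    """Normalize each row to (approximately) unit L2 norm.

    Computes y = x / sqrt(sum(x^2) + eps) per row. The eps value and its
    placement are illustrative assumptions, not taken from the PR.
    """
    out = []
    for row in rows:
        # Sum of squares over the last (feature) dimension.
        sq_sum = sum(v * v for v in row)
        # eps keeps the division finite for an all-zero row.
        inv_norm = 1.0 / math.sqrt(sq_sum + eps)
        out.append([v * inv_norm for v in row])
    return out


if __name__ == "__main__":
    print(l2norm_reference([[3.0, 4.0], [0.0, 0.0]]))
```

With this definition, a row such as `[3.0, 4.0]` maps to roughly `[0.6, 0.8]`, and an all-zero row stays all zeros instead of producing a division by zero.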
### Does this PR introduce _any_ user-facing change?
No, this PR does not introduce any user-facing changes.
### How was this patch tested?

- vLLM version: v0.13.0
- vLLM main: bc0a5a0c08

---------

Signed-off-by: Ascendyh <hw7osiris@outlook.com>
Co-authored-by: Mengqing Cao <cmq0113@163.com>
Ascendyh
2025-12-25 06:06:18 +08:00
committed by GitHub
parent e54630e01c
commit a90482803d
4 changed files with 106 additions and 1 deletions


@@ -13,13 +13,13 @@ from typing import Optional
 import torch
 from einops import rearrange
-from vllm.model_executor.layers.fla.ops.l2norm import l2norm_fwd
 from vllm.model_executor.layers.fla.ops.utils import SUPPRESS_LEVEL
 from .chunk_delta_h import chunk_gated_delta_rule_fwd_h
 from .chunk_o import chunk_fwd_o
 from .chunk_scaled_dot_kkt import chunk_scaled_dot_kkt_fwd
 from .cumsum import chunk_local_cumsum
+from .l2norm import l2norm_fwd
 from .solve_tril import solve_tril
 from .utils import input_guard
 from .wy_fast import recompute_w_u_fwd