[Kernel] add l2norm triton kernel (#4595)
### What this PR does / why we need it?
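This pull request adds an L2 normalization kernel written in Triton and optimized for Ascend NPUs. The FLA chunk ops now import `l2norm_fwd` from the new local `.l2norm` module instead of `vllm.model_executor.layers.fla.ops.l2norm`.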
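The following is a minimal pure-Python reference sketch of row-wise L2 normalization. It assumes the common formulation y = x / sqrt(sum(x^2) + eps) over the last dimension. The `eps` default of 1e-6 and the helper names `l2norm_row` and `l2norm_reference` are illustrative assumptions and are not taken from this PR. A Triton kernel usually assigns one row (or a block of rows) to each program instance, and `l2norm_row` mirrors that per-row work. A reference like this is handy for checking a kernel's output numerically.

```python
import math
from typing import List


def l2norm_row(row: List[float], eps: float = 1e-6) -> List[float]:
    # Work of one hypothetical program instance: reduce the squares of a
    # single row, then scale each element by the reciprocal of the norm.
    sq_sum = sum(v * v for v in row)
    rstd = 1.0 / math.sqrt(sq_sum + eps)  # eps keeps all-zero rows finite
    return [v * rstd for v in row]


def l2norm_reference(x: List[List[float]], eps: float = 1e-6) -> List[List[float]]:
    # Plays the role of the launch grid: rows are normalized independently.
    return [l2norm_row(row, eps) for row in x]


if __name__ == "__main__":
    # The first row becomes approximately [0.6, 0.8]. The zero row stays zero.
    print(l2norm_reference([[3.0, 4.0], [0.0, 0.0]]))
```

Because every row is independent, the reduction parallelizes naturally across program instances. The only cross-element step is the per-row sum of squares.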
### Does this PR introduce _any_ user-facing change?
No, this PR does not introduce any user-facing changes.
### How was this patch tested?
- vLLM version: v0.13.0
- vLLM main: bc0a5a0c08
---------
Signed-off-by: Ascendyh <hw7osiris@outlook.com>
Co-authored-by: Mengqing Cao <cmq0113@163.com>
@@ -13,13 +13,13 @@ from typing import Optional

 import torch
 from einops import rearrange
-from vllm.model_executor.layers.fla.ops.l2norm import l2norm_fwd
 from vllm.model_executor.layers.fla.ops.utils import SUPPRESS_LEVEL

 from .chunk_delta_h import chunk_gated_delta_rule_fwd_h
 from .chunk_o import chunk_fwd_o
 from .chunk_scaled_dot_kkt import chunk_scaled_dot_kkt_fwd
 from .cumsum import chunk_local_cumsum
+from .l2norm import l2norm_fwd
 from .solve_tril import solve_tril
 from .utils import input_guard
 from .wy_fast import recompute_w_u_fwd