xc-llm-ascend

Files

Fager10086 c5dfa8d645 [OPS]add split_qkv_rmsnorm_mrope ops (#6730)

### What this PR does / why we need it?
This PR adds a `split_qkv_rmsnorm_mrope` kernel, with support for interleaved
mRoPE, to improve performance for Qwen3.5 and Qwen3-VL.

### Does this PR introduce _any_ user-facing change?
No.

### How to use?
```python
real_q, real_k, real_v, real_gate = torch.ops.vllm.triton_split_qkv_rmsnorm_mrope(
    qkv=qkv,
    q_weight=q_weight,
    k_weight=k_weight,
    cos_sin=cos_sin,
    num_q_heads=num_q_heads,
    num_kv_heads=num_kv_heads,
    head_size=head_size,
    eps=eps,
    mrope_section=mrope_section,
    is_interleaved=is_interleaved,
    rope_dim=rope_dim,
    has_gate=has_gate,
)
```
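
For readers unfamiliar with `is_interleaved`, the sketch below illustrates one common convention for how `mrope_section` can assign rotary frequency indices to the temporal (T), height (H), and width (W) position axes. It is not taken from this PR's kernel: `mrope_axis_map` is a hypothetical helper, and the interleaved layout shown (H at indices 1, 4, 7, ..., W at indices 2, 5, 8, ..., T everywhere else) is an assumption based on the layout usually described for Qwen3-VL. The actual kernel may differ in detail.

```python
def mrope_axis_map(mrope_section, is_interleaved):
    """Return, per rotary frequency index, the position axis it reads from.

    Hypothetical helper for illustration only: 0 = temporal, 1 = height, 2 = width.
    `mrope_section` is assumed to be [t, h, w], the number of frequency
    indices assigned to each axis.
    """
    t, h, w = mrope_section
    total = t + h + w

    if not is_interleaved:
        # Chunked layout: [T...T, H...H, W...W]
        return [0] * t + [1] * h + [2] * w

    # Assumed interleaved layout: [T, H, W, T, H, W, ..., T, T]
    axes = [0] * total
    for i in range(1, min(h * 3, total), 3):
        axes[i] = 1  # height takes every third index starting at 1
    for i in range(2, min(w * 3, total), 3):
        axes[i] = 2  # width takes every third index starting at 2
    return axes


if __name__ == "__main__":
    print(mrope_axis_map([4, 2, 2], is_interleaved=False))
    print(mrope_axis_map([4, 2, 2], is_interleaved=True))
```

Under these assumptions, both layouts give each axis the same number of frequency indices. Only the placement changes, so a fused kernel has to select the matching `cos_sin` entries per index rather than per contiguous chunk.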
### How was this patch tested?
- vLLM version: v0.16.0
- Accuracy test script:
```shell
pytest tests/e2e/nightly/single_node/ops/singlecard_ops/triton/test_split_qkv_rmsnorm_mrope.py
```

---------

Signed-off-by: Fager <865071616@qq.com>
Signed-off-by: Fager10086 <77871921+Fager10086@users.noreply.github.com>
Signed-off-by: fager <865071616@qq.com>

2026-03-06 16:18:37 +08:00

activation

[Triton] Centralize Ascend extension op dispatch in triton_utils (#6937)

2026-03-03 17:10:30 +08:00

batch_invariant

perf: adaptive block size selection in linear_persistent kernel (#6537)

2026-02-04 21:36:26 +08:00

fla

[Feat]fused_qkvzba_split_reshape supports token number greater than 65536 (#6740)

2026-03-05 14:41:38 +08:00

linearnorm

[OPS]add split_qkv_rmsnorm_mrope ops (#6730)