xc-llm-ascend/worker at bff4fbfca5da076b4afa2479d49e691e530baa82

EngineX/xc-llm-ascend

ichaoren 9d1452c74d [OPS]add split_qkv_tp_rmsnorm_rope ops (#7376 )

### What this PR does / why we need it?
This PR introduces a new fused Triton op, `split_qkv_tp_rmsnorm_rope`,
for Minimax-m2.5.

The implementation includes two Triton kernels:
1. `_split_qkv_and_compute_local_qk_var_kernel`: Splits the QKV input
and computes the local variance for RMSNorm.
2. `_apply_global_rmsnorm_kernel`: Applies global RMSNorm (considering
TP all-reduce for variance) and Neox-style RoPE.
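
The two stages above can be sketched in plain NumPy as a reference for what the fused kernels compute. This is only an illustrative sketch, not the PR's Triton code: the function names, the `eps` value, equal-sized shards per rank, a per-rank slice of the norm weight, and RoPE applied over the full head dimension are all assumptions, and the TP all-reduce is simulated by averaging the local variances in Python.

```python
import numpy as np


def split_qkv_local_var(qkv, q_size, kv_size):
    """Stage 1 (hypothetical name): split fused QKV, compute local mean-square of Q and K."""
    q, k, v = np.split(qkv, [q_size, q_size + kv_size], axis=-1)
    q_var = np.mean(q.astype(np.float64) ** 2, axis=-1, keepdims=True)
    k_var = np.mean(k.astype(np.float64) ** 2, axis=-1, keepdims=True)
    return q, k, v, q_var, k_var


def rope_tables(positions, head_dim, base=10000.0):
    """Build cos/sin tables of shape (tokens, head_dim // 2)."""
    half = head_dim // 2
    inv_freq = base ** (-np.arange(half) / half)
    angles = positions[:, None] * inv_freq[None, :]
    return np.cos(angles), np.sin(angles)


def apply_global_rmsnorm_rope(x, global_var, weight, cos, sin, head_dim, eps=1e-6):
    """Stage 2 (hypothetical name): RMSNorm with the all-reduced variance, then Neox-style RoPE."""
    x = x / np.sqrt(global_var + eps) * weight
    tokens = x.shape[0]
    heads = x.reshape(tokens, -1, head_dim)
    half = head_dim // 2
    x1, x2 = heads[..., :half], heads[..., half:]
    c, s = cos[:, None, :], sin[:, None, :]
    # Neox style: the first half of each head is rotated against the second half.
    rotated = np.concatenate([x1 * c - x2 * s, x2 * c + x1 * s], axis=-1)
    return rotated.reshape(tokens, -1)


def simulated_all_reduce_mean(local_vars):
    """Stand-in for the TP all-reduce: average equal-sized per-rank variances."""
    return sum(local_vars) / len(local_vars)
```

With equal shard sizes, averaging the per-rank mean-squares gives the same value as the mean-square over the full, unsharded hidden dimension, which is why stage 2 can normalize each shard independently once the variance has been reduced.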

### Does this PR introduce _any_ user-facing change?
No.

### How was this patch tested?
```bash
pytest tests/e2e/nightly/single_node/ops/singlecard_ops/triton/test_split_qkv_tp_rmsnorm_rope.py
```
### Test Data
A3, TP16

Baseline:

| data       | TTFT(ms) | TPOT(ms) | TPS    |
|------------|---------:|---------:|-------:|
| 4k/1k@bs1  | 267.55   | 25.5     | 38.85  |
| 4k/1k@bs4  | 542.4    | 26.51    | 148.06 |

Test branch:

| data       | TTFT(ms) | TPOT(ms) | TPS    |
|------------|---------:|---------:|-------:|
| 4k/1k@bs1  | 234.64   | 20.96    | 47.24  |
| 4k/1k@bs4  | 508.36   | 22.16    | 176.69 |
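
To read the two tables side by side, the relative change per metric can be computed as below. This is a small sketch that only reuses the numbers reported above; the helper name `relative_change` and the dictionary layout are illustrative assumptions.

```python
# Numbers copied from the baseline and test tables above: (TTFT ms, TPOT ms, TPS).
baseline = {"4k/1k@bs1": (267.55, 25.5, 38.85), "4k/1k@bs4": (542.4, 26.51, 148.06)}
test = {"4k/1k@bs1": (234.64, 20.96, 47.24), "4k/1k@bs4": (508.36, 22.16, 176.69)}


def relative_change(before, after):
    """Percentage change from `before` to `after` (negative means a decrease)."""
    return (after - before) / before * 100.0


for case, base_row in baseline.items():
    changes = [relative_change(b, t) for b, t in zip(base_row, test[case])]
    print(f"{case}: TTFT {changes[0]:+.1f}%, TPOT {changes[1]:+.1f}%, TPS {changes[2]:+.1f}%")
```

For TTFT and TPOT a negative change is an improvement (lower latency); for TPS a positive change is an improvement (higher throughput).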


- vLLM version: v0.17.0
- vLLM main: 4034c3d32e

Signed-off-by: xutianyi <xutianyi5@huawei.com>
Co-authored-by: xutianyi <xutianyi5@huawei.com>

2026-03-19 17:19:18 +08:00

__init__.py

[Feature]Supports DSv3.1 PD separation and C8 quantization (#7222 )

2026-03-16 22:49:05 +08:00

patch_bert.py

[Lint]Style: Convert vllm-ascend/ to ruff format(Batch #10 ) (#6173 )

2026-02-06 15:35:06 +08:00

patch_cudagraph.py

[main][bugfix] Fixed the problem of speculative decoding in FULL mode (#7148 )

2026-03-12 14:51:12 +08:00

patch_deepseek_mtp.py

[MTP][Bugfix] Fix GLM5-W8A8 precision issues caused by rotary quant MTP weights (#7139 )

2026-03-12 20:01:24 +08:00

patch_distributed.py

[Lint]Style: Convert vllm-ascend/ to ruff format(Batch #10 ) (#6173 )

2026-02-06 15:35:06 +08:00

patch_draft_quarot.py

[main][feature] Support quarot for eagle3 without embedding (#7038 )

2026-03-09 10:43:06 +08:00

patch_huanyuan_vl.py

[Lint]Style: Convert vllm-ascend/ to ruff format(Batch #10 ) (#6173 )

2026-02-06 15:35:06 +08:00

patch_kimi_k25.py

[Misc]fix logger which does not take effects in patches (#7402 )

2026-03-18 17:13:12 +08:00

patch_mamba_utils.py

[Hybrid] support prefix cache for Qwen3.5/Next with --mamba-cache-mode align (#7103 )

2026-03-15 09:44:09 +08:00

patch_minimax_m2_linear_attn.py

[Model] Support Minimax-m2.5 on NPU (#7105 )

2026-03-11 00:12:02 +08:00

patch_minimax_m2.py

[OPS]add split_qkv_tp_rmsnorm_rope ops (#7376 )

2026-03-19 17:19:18 +08:00

patch_module.py

[Lint]Style: Convert vllm-ascend/ to ruff format(Batch #10 ) (#6173 )

2026-02-06 15:35:06 +08:00

patch_multimodal_merge.py

[bugfix]Fix parameter ordering bug in _merge_multimodal_embeddings (#7068 )

2026-03-09 16:05:52 +08:00

patch_npugraph_ex_triton.py

[npugraph_ex]enable npugraph_ex by default (#6664 )

2026-02-12 08:44:06 +08:00

patch_qwen3_5.py

Add patch_qwen3_5 for triton ops fused_recurrent_gated_delta_rule (#7109 )

2026-03-10 23:28:58 +08:00

patch_qwen3_next_mtp.py

[Main2Main] Upgrade vLLM to 0226 (#6813 )

2026-02-27 16:05:21 +08:00

patch_qwen3_next.py

[P/D]Mooncake Layerwise Connector supports hybrid attention manager with multiple kvcache groups (#7022 )

2026-03-10 23:59:20 +08:00

patch_rejection_sampler.py

[Lint]Style: Convert vllm-ascend/ to ruff format(Batch #10 ) (#6173 )

2026-02-06 15:35:06 +08:00

patch_routed_experts_capturer.py

[Feat] Support routing replay (#6696 )

2026-02-26 10:22:47 +08:00

patch_triton.py

[Lint]Style: Convert vllm-ascend/ to ruff format(Batch #10 ) (#6173 )

2026-02-06 15:35:06 +08:00

patch_unquantized_gemm.py

[Lint]Style: Convert vllm-ascend/ to ruff format(Batch #10 ) (#6173 )

2026-02-06 15:35:06 +08:00

patch_v2_eagle.py

[Version] Drop 0.16.0 support (#7153 )

2026-03-13 16:14:15 +08:00

patch_v2_uva.py

[Feature] adapt to uva buffer and main2main (#6657 )

2026-02-12 10:36:31 +08:00

patch_weight_utils.py

[Misc]fix logger which does not take effects in patches (#7402 )

2026-03-18 17:13:12 +08:00