xc-llm-ascend/singlecard at f6db47f1038cf14d6c6f7eb4780be3c839c7656b

EngineX/xc-llm-ascend


frank 18b52afe2b [Ops][Misc] Optimize split_qkv_rmsnorm_rope op (#6827)

### What this PR does / why we need it?

This PR optimizes the `split_qkv_rmsnorm_rope` operator by introducing a
new Triton kernel, `split_qkv_rmsnorm_rope_prefill_kernel`, for the
prefill stage (i.e., large batch sizes). The operator now selects
between the existing decode kernel and the new prefill kernel at runtime
based on the batch size: the prefill kernel is used when the batch size
exceeds the number of available vector cores. This improves performance
in large-batch scenarios.
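The batch-size dispatch can be sketched in plain Python. This is a minimal sketch, not the operator's actual code: `select_kernel`, `rows_per_program`, and `num_vector_cores` are hypothetical names, and the strided row assignment on the prefill path is an assumption about how a fixed-size grid (one program per vector core) might cover a large batch.

```python
def select_kernel(batch_size: int, num_vector_cores: int) -> str:
    """Pick a kernel variant (hypothetical helper mirroring the PR's rule)."""
    # Per the PR description, the prefill path is taken when the batch size
    # is larger than the number of available vector cores.
    return "prefill" if batch_size > num_vector_cores else "decode"


def rows_per_program(batch_size: int, num_vector_cores: int) -> list[list[int]]:
    """Assign token rows to programs (assumed scheduling, for illustration only)."""
    if select_kernel(batch_size, num_vector_cores) == "decode":
        # Decode: one program per row, so the grid size equals the batch size.
        return [[row] for row in range(batch_size)]
    # Prefill: the grid is capped at num_vector_cores, and each program walks
    # its rows with a stride equal to the grid size.
    return [
        list(range(pid, batch_size, num_vector_cores))
        for pid in range(num_vector_cores)
    ]
```

For example, `rows_per_program(5, 2)` gives `[[0, 2, 4], [1, 3]]`, while `rows_per_program(3, 4)` stays on the decode path with one row per program. Capping the grid at the core count avoids launching more programs than there are cores to run them; the PR does not state whether the real prefill kernel schedules rows this way.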

Additionally, the RoPE implementation now supports a partial rotary
dimension (`rope_dim`), so that only `rope_dim` elements of each head
are rotated. This makes the operator more flexible.
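To illustrate partial rotation, the sketch below rotates only the first `rope_dim` elements of a single head vector and passes the remaining elements through unchanged. The NeoX-style half-split pairing (element `i` paired with `i + rope_dim // 2`), the `base` of 10000, and the helper name `apply_partial_rope` are assumptions; the PR does not specify which layout the kernel uses.

```python
import math


def apply_partial_rope(x, pos, rope_dim, base=10000.0):
    """Rotate the first rope_dim elements of one head vector (hypothetical helper).

    Assumes the NeoX-style layout: element i is paired with element
    i + rope_dim // 2. Elements at index >= rope_dim are returned unchanged.
    """
    if rope_dim % 2 != 0 or not 0 < rope_dim <= len(x):
        raise ValueError("rope_dim must be even and in (0, len(x)]")
    half = rope_dim // 2
    out = list(x)
    for i in range(half):
        # Rotation angle for pair i, as in the standard RoPE formulation.
        theta = pos * base ** (-2.0 * i / rope_dim)
        c, s = math.cos(theta), math.sin(theta)
        x1, x2 = x[i], x[i + half]
        out[i] = x1 * c - x2 * s
        out[i + half] = x2 * c + x1 * s
    return out
```

For example, `apply_partial_rope([1.0, 0.0, 0.0, 0.0, 5.0, 6.0], pos=1, rope_dim=4)` rotates four of the six elements and leaves the last two as `5.0` and `6.0`.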

### Does this PR introduce _any_ user-facing change?

No. This is a performance optimization and is not expected to introduce
any user-facing changes.

### How was this patch tested?

The existing CI tests are expected to pass. The new prefill path is
triggered when the batch size is larger than the number of available
vector cores. The partial RoPE feature can be exercised by passing the
`rope_dim` argument.

- vLLM version: v0.15.0
- vLLM main: 83b47f67b1
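To check kernel outputs against something simple, a pure-Python reference of the whole operator could look like the sketch below. It handles a single token and rests on assumptions the PR does not state: the fused `qkv` row is laid out as `[q | k | v]`, RMSNorm with learned weights is applied per head to `q` and `k` only, `v` is passed through, and RoPE uses the NeoX half-split layout on the first `rope_dim` elements of each head. The function name `split_qkv_rmsnorm_rope_ref` and its parameters are hypothetical, not the operator's real signature.

```python
import math


def _rms_norm(x, weight, eps):
    # Divide by the root-mean-square of x, then scale element-wise by weight.
    inv_rms = 1.0 / math.sqrt(sum(v * v for v in x) / len(x) + eps)
    return [v * inv_rms * w for v, w in zip(x, weight)]


def _rope_neox(x, pos, rope_dim, base=10000.0):
    # Rotate the first rope_dim elements (half-split pairing); copy the rest.
    half = rope_dim // 2
    out = list(x)
    for i in range(half):
        theta = pos * base ** (-2.0 * i / rope_dim)
        c, s = math.cos(theta), math.sin(theta)
        out[i] = x[i] * c - x[i + half] * s
        out[i + half] = x[i + half] * c + x[i] * s
    return out


def split_qkv_rmsnorm_rope_ref(qkv, pos, num_q_heads, num_kv_heads, head_dim,
                               q_weight, k_weight, rope_dim=None, eps=1e-6):
    """Hypothetical single-token reference: split, per-head RMSNorm, partial RoPE."""
    rope_dim = head_dim if rope_dim is None else rope_dim
    q_size, kv_size = num_q_heads * head_dim, num_kv_heads * head_dim
    if len(qkv) != q_size + 2 * kv_size:
        raise ValueError("qkv length does not match the head configuration")
    q = qkv[:q_size]
    k = qkv[q_size:q_size + kv_size]
    v = qkv[q_size + kv_size:]

    def norm_and_rotate(t, weight, n_heads):
        out = []
        for h in range(n_heads):
            head = t[h * head_dim:(h + 1) * head_dim]
            out.extend(_rope_neox(_rms_norm(head, weight, eps), pos, rope_dim))
        return out

    return (norm_and_rotate(q, q_weight, num_q_heads),
            norm_and_rotate(k, k_weight, num_kv_heads),
            list(v))
```

Because a rotation preserves vector length, one cheap check is that each rotated head keeps the same L2 norm as its normalized input; with `pos=0` the RoPE step reduces to the identity.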

---------

Signed-off-by: guzhiyong <guzhiyong5@h-partners.com>
Signed-off-by: frank <2547457096@qq.com>
Co-authored-by: guzhiyong <guzhiyong5@h-partners.com>

2026-03-06 09:30:31 +08:00


[Ops][Misc] Optimize split_qkv_rmsnorm_rope op (#6827)

2026-03-06 09:30:31 +08:00

model_runner_v2

[Lint]Style: Convert test/ to ruff format(Batch #5) (#6747)

2026-02-24 15:50:00 +08:00

[Main2Main] Upgrade vllm commit to releases/v0.14.0 (#5911)

2026-01-15 23:22:43 +08:00

[CI] Fix EAGLE CI problems (#6702)

2026-02-26 10:26:01 +08:00

__init__.py

[CI] Add unit test framework (#1201)

2025-06-16 18:32:28 +08:00

test_aclgraph_accuracy.py

[Ops][Misc] Optimize split_qkv_rmsnorm_rope op (#6827)

2026-03-06 09:30:31 +08:00

test_aclgraph_batch_invariant.py

[Lint]Style: Convert test/ to ruff format(Batch #5) (#6747)

2026-02-24 15:50:00 +08:00

test_aclgraph_mem.py

[Lint]Style: Convert test/ to ruff format(Batch #5) (#6747)

2026-02-24 15:50:00 +08:00

test_async_scheduling.py

[Lint]Style: Convert test/ to ruff format(Batch #5) (#6747)

2026-02-24 15:50:00 +08:00

test_auto_fit_max_mode_len.py

[Lint]Style: Convert test/ to ruff format(Batch #5) (#6747)

2026-02-24 15:50:00 +08:00

test_batch_invariant.py

[Lint]Style: Convert test/ to ruff format(Batch #5) (#6747)

2026-02-24 15:50:00 +08:00

test_camem.py

[Lint]Style: Convert test/ to ruff format(Batch #5) (#6747)

2026-02-24 15:50:00 +08:00

test_completion_with_prompt_embeds.py

[Lint]Style: Convert test/ to ruff format(Batch #5) (#6747)

2026-02-24 15:50:00 +08:00

test_cpu_offloading.py

[Lint]Style: Convert test/ to ruff format(Batch #5) (#6747)

2026-02-24 15:50:00 +08:00

test_guided_decoding.py

[Lint]Style: Convert test/ to ruff format(Batch #5) (#6747)

2026-02-24 15:50:00 +08:00

test_ilama_lora.py

[Lint]Style: Convert test/ to ruff format(Batch #5) (#6747)

2026-02-24 15:50:00 +08:00

test_llama32_lora.py

[Lint]Style: Convert test/ to ruff format(Batch #5) (#6747)

2026-02-24 15:50:00 +08:00

test_models.py

[Lint]Style: Convert test/ to ruff format(Batch #5) (#6747)

2026-02-24 15:50:00 +08:00

test_multistream_overlap_shared_expert.py

[Lint]Style: Convert test/ to ruff format(Batch #5) (#6747)

2026-02-24 15:50:00 +08:00

test_quantization.py

[Feature][Quant] Auto-detect quantization format from model files (#6645)

2026-02-26 10:59:25 +08:00

test_qwen3_multi_loras.py

[Lint]Style: Convert test/ to ruff format(Batch #5) (#6747)

2026-02-24 15:50:00 +08:00

test_sampler.py

[Lint]Style: Convert test/ to ruff format(Batch #5) (#6747)

2026-02-24 15:50:00 +08:00

test_vlm.py

[Lint]Style: Convert test/ to ruff format(Batch #5) (#6747)

2026-02-24 15:50:00 +08:00

test_xlite.py

[Lint]Style: Convert test/ to ruff format(Batch #5) (#6747)

2026-02-24 15:50:00 +08:00

utils.py

[Lint]Style: Convert test/ to ruff format(Batch #5) (#6747)

2026-02-24 15:50:00 +08:00