### What this PR does / why we need it?

Currently, we are using the linear-based convolution in `vllm/model_executor/layers/conv.py` (L219-L232, commit `e2b31243c0`) for patch embedding in VL models. After profiling, we found that this linear method takes about **6.87 ms**, which is much slower than just using `F.conv3d()`. On Ascend NPU, `F.conv3d()` dispatches to the optimized aclnn `BatchMatMulV2` operator, which takes only about **2.50 ms** and is therefore **2.7x faster** than the linear method.

- vLLM version: v0.16.0
- vLLM main: `15d76f74e2`

---------

Signed-off-by: shen-shanshan <467638484@qq.com>
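For reviewers, here is a minimal sketch of why the two paths are interchangeable: with stride equal to the kernel size and no padding, flattening each non-overlapping patch and applying a matmul (the linear method) computes the same result as `F.conv3d()`. The shapes below are illustrative only, not the actual model configuration.

```python
import torch
import torch.nn.functional as F

# Hypothetical patch-embedding shapes (illustrative, not from the real VL model).
B, C, T, H, W = 1, 3, 4, 28, 28          # batch, channels, temporal, height, width
embed_dim, kt, kh, kw = 32, 2, 14, 14    # output dim and 3D patch size

x = torch.randn(B, C, T, H, W)
weight = torch.randn(embed_dim, C, kt, kh, kw)

# Linear-style path: flatten each non-overlapping patch, then one big matmul.
patches = (
    x.reshape(B, C, T // kt, kt, H // kh, kh, W // kw, kw)
     .permute(0, 2, 4, 6, 1, 3, 5, 7)
     .reshape(B, -1, C * kt * kh * kw)
)
out_linear = patches @ weight.reshape(embed_dim, -1).T   # (B, num_patches, embed_dim)

# conv3d path: same computation, dispatched to the backend's optimized kernel
# (on Ascend NPU this hits aclnn BatchMatMulV2, per the profiling above).
out_conv = F.conv3d(x, weight, stride=(kt, kh, kw))
out_conv = out_conv.flatten(2).transpose(1, 2)           # (B, num_patches, embed_dim)

assert torch.allclose(out_linear, out_conv, atol=1e-3)
```

Since both paths produce the same patch embeddings, the switch is purely a performance change.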