xc-llm-ascend/ops at c860535246cc751b6be7d1da2092e4380013598c

EngineX/xc-llm-ascend

yesyue-w c860535246 【A5】【Qwen VL】Qwen VL adapt for A5 (#7046)

### What this PR does / why we need it?
Replace the '_npu_flash_attention_unpad' operator with the
'npu_fusion_attention' operator so that the Qwen VL model can run
in the A5 environment, and remove the 'mrope' operator call restriction
for A5.
### Does this PR introduce _any_ user-facing change?

### How was this patch tested?

- vLLM version: v0.16.0
- vLLM main: 4034c3d32e

Signed-off-by: 汪越 <wangyue361@h-partners.com>

2026-03-20 16:56:12 +08:00

GMM custom operator optimization in small batch scenarios (#7100)

2026-03-19 16:10:30 +08:00

[OPS]add split_qkv_tp_rmsnorm_rope ops (#7376)

2026-03-19 17:19:18 +08:00

__init__.py

[OPS]add split_qkv_tp_rmsnorm_rope ops (#7376)

2026-03-19 17:19:18 +08:00

activation.py

[Attention] add gpt-oss support (#5901)

2026-02-12 10:55:34 +08:00

conv.py

[MM][Perf] Enable 2.7x faster for convolution computation with aclnn BatchMatMulV2 (#7017)

2026-03-06 14:26:37 +08:00

flashcomm2_oshard_manager.py

[Lint]Style: Convert vllm-ascend/ to ruff format(new Batch #8) (#6604)

2026-02-07 09:16:07 +08:00

layer_shard_linear.py

[Lint]Style: Convert vllm-ascend/ to ruff format(Batch #10) (#6173)

2026-02-06 15:35:06 +08:00

layernorm.py

[Perf] Optimize bias handling in AscendRMSNorm (#7226)

2026-03-17 16:53:28 +08:00

linear_op.py

Refactor duplicated code into a common method to reduce redundancy (#7210)

2026-03-20 16:49:02 +08:00

linear.py

[Bugfix][LoRA] Fix the bug when runs Qwen3-Reranker-0.6B with LoRA. (#7156)

2026-03-15 17:55:42 +08:00

mla.py

[Feature]Supports DSv3.1 PD separation and C8 quantization (#7222)

2026-03-16 22:49:05 +08:00

mm_encoder_attention.py

【A5】【Qwen VL】Qwen VL adapt for A5 (#7046)

2026-03-20 16:56:12 +08:00

register_custom_ops.py

[Feature] support aclgraph for model runner v2 (#7110)

2026-03-13 09:11:46 +08:00

rotary_embedding.py

【A5】【Qwen VL】Qwen VL adapt for A5 (#7046)

2026-03-20 16:56:12 +08:00

vocab_parallel_embedding.py

[Lint]Style: Convert vllm-ascend/ to ruff format(new Batch #8) (#6604)

2026-02-07 09:16:07 +08:00

weight_prefetch.py

[Misc] Drop Prefetch MLP Env (#7357)

2026-03-19 14:27:27 +08:00