xc-llm-ascend/ops at 7314bbe2df368ce5094ef920297aeec981f42647 - xc-llm-ascend

EngineX/xc-llm-ascend

Wangbei25 4f259d4fd8 [Performance]Optimize DeepSeekOCR2 RelPosAttention and CustomQwen2Decoder (#7737)

### What this PR does / why we need it?
Optimize DeepSeekOCR2 `RelPosAttention` and `CustomQwen2Decoder`, and add the
DeepSeekOCR2.md doc.

### Does this PR introduce _any_ user-facing change?

### How was this patch tested?
- vllm 0.18.0
- vllm-ascend main

1. Mask creation: `_create_custom_4d_mask` took 141 ms 49 µs 620 ns; its replacement
`_create_npu_optimized_mask` takes 1 ms 227 µs 780 ns.
2. conv2d (27 ms) replaced with matmul (<1 ms).
3. RelPosAttention: SDPA replaced with prompt_flash_attention.
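
Item 2 above swaps a convolution for a matrix multiplication. The PR text does not show how, so the following is only a minimal sketch of one common way this can work, assuming the convolution's stride equals its kernel size (non-overlapping patches, as in patch-embedding layers), a single image, and no bias. It is plain Python for illustration; the function names `conv2d_direct` and `conv2d_as_matmul` are hypothetical and not taken from vllm-ascend.

```python
# Illustrative sketch only: a stride == kernel-size 2D convolution computed
# directly and as one matrix multiplication over flattened patches.
# Assumptions (not from the PR): no padding, no bias, single image.


def conv2d_direct(image, weight, k):
    """image: [C][H][W], weight: [O][C][k][k]. Returns [O][H//k][W//k]."""
    channels, height, width = len(image), len(image[0]), len(image[0][0])
    out = []
    for filt in weight:
        plane = []
        for i in range(0, height - k + 1, k):
            row = []
            for j in range(0, width - k + 1, k):
                acc = 0.0
                for c in range(channels):
                    for di in range(k):
                        for dj in range(k):
                            acc += image[c][i + di][j + dj] * filt[c][di][dj]
                row.append(acc)
            plane.append(row)
        out.append(plane)
    return out


def conv2d_as_matmul(image, weight, k):
    """Same result as conv2d_direct, expressed as patches @ filters^T."""
    channels, height, width = len(image), len(image[0]), len(image[0][0])
    rows, cols = height // k, width // k
    # Flatten every non-overlapping patch into a row of length C*k*k.
    patches = [
        [image[c][i * k + di][j * k + dj]
         for c in range(channels) for di in range(k) for dj in range(k)]
        for i in range(rows) for j in range(cols)
    ]
    # Flatten every filter using the same (c, di, dj) ordering.
    filters = [
        [filt[c][di][dj]
         for c in range(channels) for di in range(k) for dj in range(k)]
        for filt in weight
    ]
    # One matmul: [rows*cols, C*k*k] x [C*k*k, O] -> [rows*cols, O].
    flat = [[sum(p * f for p, f in zip(patch, filt)) for filt in filters]
            for patch in patches]
    # Reshape back to [O][rows][cols].
    return [[[flat[r * cols + c][o] for c in range(cols)] for r in range(rows)]
            for o in range(len(weight))]
```

On an accelerator the flattened form maps onto a single dense matmul kernel, which is the kind of operation such hardware is typically best at; whether this matches the actual change in the PR is an assumption.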

---------

Signed-off-by: Wangbei25 <wangbei41@huawie.com>
Signed-off-by: Wangbei25 <wangbei41@huawei.com>
Co-authored-by: Wangbei25 <wangbei41@huawie.com>

2026-03-31 14:49:29 +08:00

[feat] support dispatch_v2/combine_v2 hierarchy communication (#7698)

2026-03-27 09:20:16 +08:00

[v0.18.0][Bugfix] Fix the bug of MTP1 crashing in multiple concurrent scenarios. (#7699)

2026-03-27 14:13:12 +08:00

__init__.py

[OPS]add split_qkv_tp_rmsnorm_rope ops (#7376)

2026-03-19 17:19:18 +08:00

activation.py

[Attention] add gpt-oss support (#5901)

2026-02-12 10:55:34 +08:00

conv.py

[MM][Perf] Enable 2.7x faster for convolution computation with aclnn BatchMatMulV2 (#7017)

2026-03-06 14:26:37 +08:00

flashcomm2_oshard_manager.py

[Lint]Style: Convert vllm-ascend/ to ruff format(new Batch #8) (#6604)

2026-02-07 09:16:07 +08:00

layer_shard_linear.py

[Lint]Style: Convert vllm-ascend/ to ruff format(Batch #10) (#6173)

2026-02-06 15:35:06 +08:00

layernorm.py

Qwen3.5 MoE supports flashcomm v1 (#7644)

2026-03-25 23:09:33 +08:00

linear_op.py

Qwen3.5 MoE supports flashcomm v1 (#7644)

2026-03-25 23:09:33 +08:00

linear.py

[Bugfix][LoRA] Fix the bug when runs Qwen3-Reranker-0.6B with LoRA. (#7156)

2026-03-15 17:55:42 +08:00

mla.py

Main2main upgrade vllm commit to 03 19 17:00 (#7478)

2026-03-23 16:25:57 +08:00

mm_encoder_attention.py

[MM][Perf] Pre-compute seq_lens and put it on CPU before ViT vision blocks for better performance (#7104)

2026-03-23 15:24:26 +08:00

register_custom_ops.py

Qwen3.5 MoE supports flashcomm v1 (#7644)

2026-03-25 23:09:33 +08:00

rel_pos_attention.py

[Performance]Optimize DeepSeekOCR2 RelPosAttention and CustomQwen2Decoder (#7737)

2026-03-31 14:49:29 +08:00

rotary_embedding.py

[UT] Align input arguments with Ascend(Yarn)RotaryEmbedding with vLLM and add ut (#7358)

2026-03-24 16:02:56 +08:00

vocab_parallel_embedding.py

[Lint]Style: Convert vllm-ascend/ to ruff format(new Batch #8) (#6604)

2026-02-07 09:16:07 +08:00

weight_prefetch.py

[Misc] Drop Prefetch MLP Env (#7357)

2026-03-19 14:27:27 +08:00