[Performance]Optimize DeepSeekOCR2 RelPosAttention and CustomQwen2Decoder (#7737)

### What this PR does / why we need it? Optimize DeepSeekOCR2 RelPosAttention and CustomQwen2Decoder and add doc for DeepSeekOCR2.md ### Does this PR introduce _any_ user-facing change? ### How was this patch tested? - vllm 0.18.0 - vllm-ascend main 1. _create_custom_4d_mask during 141ms49us620ns --> _create_npu_optimized_mask during 1ms227us780ns 2. convd2d : 27ms --> matmul <1ms 3. relposattention：sdpa->prompt_flash_attention --------- Signed-off-by: Wangbei25 <wangbei41@huawie.com> Signed-off-by: Wangbei25 <wangbei41@huawei.com> Co-authored-by: Wangbei25 <wangbei41@huawie.com>
2026-03-31 14:49:29 +08:00
parent 2a0a588311
commit 4f259d4fd8
4 changed files with 432 additions and 2 deletions
--- a/vllm_ascend/patch/worker/init.py
+++ b/vllm_ascend/patch/worker/init.py
@@ -49,3 +49,4 @@ import vllm_ascend.patch.worker.patch_deepseek_mtp  # noqa
 import vllm_ascend.patch.worker.patch_v2.patch_input_batch  # noqa
 import vllm_ascend.patch.worker.patch_v2.patch_model_state  # noqa
 import vllm_ascend.patch.worker.patch_v2.patch_block_table  # noqa
+import vllm_ascend.patch.worker.patch_deepencoder2  # noqa