[Bugfix] Correctly handle the output shape in multimodal attention (#5443)

### What this PR does / why we need it?
Fix https://github.com/vllm-project/vllm-ascend/issues/5297: in the
`AscendMMEncoderAttention` forward pass, the output shape should stay
consistent with the input shape.
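The shape contract above can be illustrated with a minimal sketch. This is not the actual vllm-ascend code, just a hypothetical multi-head encoder attention in NumPy: the forward splits the hidden dimension into heads internally, so it must merge the heads back before returning, otherwise the caller receives a `(num_heads, seq_len, head_dim)` tensor instead of the `(seq_len, hidden)` it passed in.

```python
import numpy as np

def mm_encoder_attention_forward(q, k, v, num_heads):
    """Illustrative only: q, k, v are (seq_len, hidden), hidden = num_heads * head_dim."""
    seq_len, hidden = q.shape
    head_dim = hidden // num_heads
    # split heads: (seq_len, hidden) -> (num_heads, seq_len, head_dim)
    qh = q.reshape(seq_len, num_heads, head_dim).transpose(1, 0, 2)
    kh = k.reshape(seq_len, num_heads, head_dim).transpose(1, 0, 2)
    vh = v.reshape(seq_len, num_heads, head_dim).transpose(1, 0, 2)
    # scaled dot-product attention per head
    scores = qh @ kh.transpose(0, 2, 1) / np.sqrt(head_dim)
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights /= weights.sum(axis=-1, keepdims=True)
    out = weights @ vh  # (num_heads, seq_len, head_dim)
    # the point of the fix: merge heads back so the output
    # shape matches the input shape exactly
    return out.transpose(1, 0, 2).reshape(seq_len, hidden)
```

Callers (and downstream layers) can then rely on `out.shape == q.shape` without any model-specific reshaping.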

- vLLM version: release/v0.13.0
- vLLM main: 81786c8774

---------

Signed-off-by: wangli <wangli858794774@gmail.com>
Author: Li Wang
Date: 2025-12-27 18:42:46 +08:00
Committed by: GitHub
Parent: 1d81bfaed1
Commit: 58adf7c8ac
2 changed files with 14 additions and 3 deletions

@@ -781,6 +781,11 @@ PROMPT_CONFIGS = {
         "fps": 1,
     },
 },
+"hunyuan-vl": {
+    "model": "Tencent-Hunyuan/HunyuanOCR",
+    "prompt_fn": hunyuan_prompt,
+    "mm_processor_kwargs": {},
+},
 }