From 8bcc0ccd571a001bcf1f428aceb2445ba0375fac Mon Sep 17 00:00:00 2001
From: linfeng-yuan <1102311262@qq.com>
Date: Wed, 17 Sep 2025 20:01:47 +0800
Subject: [PATCH] [bugfix] fix shared expert dp with hybrid kvcache (#2964)

### What this PR does / why we need it?
https://github.com/vllm-project/vllm-ascend/pull/2849 moved the implementation of `shared_expert_dp` into the torchair DeepSeek modeling code. However, when `enforce_eager` is combined with `shared_expert_dp`, the call to `set_forward_context` falls back to the implementation in model_runner_v1.py, which sets the global `attn_metadata` as a dictionary. This leads to a RuntimeError when `attn_metadata` is retrieved from the forward context and used in torchair_deepseek_v2.py. This PR fixes the problem by transforming `attn_metadata` back to a single metadata object in this file.

Note that the current E2E test suite lacks a DeepSeek case with `shared_expert_dp`. We need to add an ST with `shared_expert_dp` to the testing workflow.

### Does this PR introduce _any_ user-facing change?
No.

### How was this patch tested?
E2E vLLM serving with `enable_shared_expert_dp: true` passed.
- vLLM version: v0.10.2
- vLLM main: https://github.com/vllm-project/vllm/commit/de3e53a75ba9f31f446926911b7c44561af3b2ee

Signed-off-by: linfeng-yuan <1102311262@qq.com>
---
 vllm_ascend/torchair/models/torchair_deepseek_v2.py | 2 ++
 1 file changed, 2 insertions(+)

diff --git a/vllm_ascend/torchair/models/torchair_deepseek_v2.py b/vllm_ascend/torchair/models/torchair_deepseek_v2.py
index 845793d..697776a 100644
--- a/vllm_ascend/torchair/models/torchair_deepseek_v2.py
+++ b/vllm_ascend/torchair/models/torchair_deepseek_v2.py
@@ -813,6 +813,8 @@ class TorchairDeepseekV2DecoderLayer(DeepseekV2DecoderLayer):
             residual = get_tp_group().all_gather(residual, 0)
 
         attn_metadata = get_forward_context().attn_metadata
+        if attn_metadata is not None and isinstance(attn_metadata, dict):
+            attn_metadata = attn_metadata['model.layers.0.self_attn.attn']
         if attn_metadata is not None:
             num_tokens = attn_metadata.num_actual_tokens
         else:
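The fix above can be sketched in isolation: with hybrid kvcache, the forward context may hold `attn_metadata` as a dict keyed by layer name rather than a single object, so the decoder layer unwraps one entry before reading `num_actual_tokens`. This is a minimal, self-contained sketch; `AttnMetadata` and `resolve_attn_metadata` are hypothetical stand-ins, not vLLM APIs, and the real metadata class carries many more fields.

```python
from dataclasses import dataclass


@dataclass
class AttnMetadata:
    """Hypothetical stand-in for the per-layer attention metadata object."""
    num_actual_tokens: int


def resolve_attn_metadata(attn_metadata,
                          layer_name="model.layers.0.self_attn.attn"):
    """Mirror the patched logic: if set_forward_context stored the metadata
    as a dict keyed by layer name (the hybrid-kvcache path), unwrap one
    entry; otherwise pass the object (or None) through unchanged."""
    if attn_metadata is not None and isinstance(attn_metadata, dict):
        attn_metadata = attn_metadata[layer_name]
    return attn_metadata


# Both the dict form and the plain form resolve to a usable metadata object,
# so the subsequent `attn_metadata.num_actual_tokens` access no longer fails.
md = AttnMetadata(num_actual_tokens=7)
assert resolve_attn_metadata({"model.layers.0.self_attn.attn": md}) is md
assert resolve_attn_metadata(md) is md
assert resolve_attn_metadata(None) is None
```

Without the `isinstance` check, the dict would reach `attn_metadata.num_actual_tokens` directly and raise, which matches the RuntimeError this PR describes.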