[bugfix] fix shared expert dp with hybrid kvcache (#2964)

### What this PR does / why we need it?

https://github.com/vllm-project/vllm-ascend/pull/2849 moved the implementation of `shared_expert_dp` to torchair deepseek_modeling. However, when `enforce_eager` and `shared_expert_dp` are both enabled, the call to `set_forward_context` falls back to the implementation in model_runner_v1.py, which sets the global attn_metadata to a dictionary. This leads to a RuntimeError when attn_metadata is read from the forward context and used in torchair_deepseek_v2.py. This PR fixes the problem by converting attn_metadata in that file: when it is a dictionary, the entry for `model.layers.0.self_attn.attn` is used.

Note that the current E2E tests do not cover DeepSeek with `shared_expert_dp`. We need to add an ST with `shared_expert_dp` to the testing workflow.

### Does this PR introduce _any_ user-facing change?

No.

### How was this patch tested?

An e2e vLLM serving run with `enable_shared_expert_dp: true` passed.

- vLLM version: v0.10.2
- vLLM main: de3e53a75b

Signed-off-by: linfeng-yuan <1102311262@qq.com>
2025-09-17 20:01:47 +08:00
parent 1f6465c399
commit 8bcc0ccd57
1 changed files with 2 additions and 0 deletions
--- a/vllm_ascend/torchair/models/torchair_deepseek_v2.py
+++ b/vllm_ascend/torchair/models/torchair_deepseek_v2.py
@@ -813,6 +813,8 @@ class TorchairDeepseekV2DecoderLayer(DeepseekV2DecoderLayer):
            residual = get_tp_group().all_gather(residual, 0)

            attn_metadata = get_forward_context().attn_metadata
+            if attn_metadata is not None and isinstance(attn_metadata, dict):
+                attn_metadata = attn_metadata['model.layers.0.self_attn.attn']
            if attn_metadata is not None:
                num_tokens = attn_metadata.num_actual_tokens
            else:
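
The normalization added in this hunk can be sketched in isolation as follows. This is a minimal illustration, not the code in vllm-ascend: `FakeAttnMetadata`, `resolve_attn_metadata`, and `num_tokens_or` are hypothetical names introduced here, and the `fallback` argument stands in for the `else:` branch, whose body is not shown in the diff. Only the dictionary key and the `num_actual_tokens` attribute are taken from the patch.

```python
from dataclasses import dataclass
from typing import Dict, Optional, Union


@dataclass
class FakeAttnMetadata:
    """Hypothetical stand-in for the real attention metadata object."""
    num_actual_tokens: int


# Key used by the patch to pick one layer's metadata out of the dict.
LAYER0_KEY = "model.layers.0.self_attn.attn"

MetadataOrDict = Union[FakeAttnMetadata, Dict[str, FakeAttnMetadata], None]


def resolve_attn_metadata(attn_metadata: MetadataOrDict) -> Optional[FakeAttnMetadata]:
    """Return a single metadata object whether the input is a dict,
    a plain metadata object, or None (mirrors the two added lines)."""
    if isinstance(attn_metadata, dict):
        # Raises KeyError if the expected layer name is missing.
        return attn_metadata[LAYER0_KEY]
    return attn_metadata


def num_tokens_or(attn_metadata: MetadataOrDict, fallback: int) -> int:
    """Read num_actual_tokens, or use `fallback` when no metadata is set.
    `fallback` is an assumption; the real else-branch is not shown above."""
    meta = resolve_attn_metadata(attn_metadata)
    return meta.num_actual_tokens if meta is not None else fallback


if __name__ == "__main__":
    per_layer = {LAYER0_KEY: FakeAttnMetadata(num_actual_tokens=7)}
    print(num_tokens_or(per_layer, fallback=0))
    print(num_tokens_or(FakeAttnMetadata(num_actual_tokens=3), fallback=0))
    print(num_tokens_or(None, fallback=5))
```

Because the lookup uses a fixed key, a dictionary without `model.layers.0.self_attn.attn` raises `KeyError` in this sketch, just as the indexing expression in the patch would.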