From 8bcc0ccd571a001bcf1f428aceb2445ba0375fac Mon Sep 17 00:00:00 2001
From: linfeng-yuan <1102311262@qq.com>
Date: Wed, 17 Sep 2025 20:01:47 +0800
Subject: [PATCH] [bugfix] fix shared expert dp with hybrid kvcache (#2964)

### What this PR does / why we need it?
https://github.com/vllm-project/vllm-ascend/pull/2849 moved the implementation of `shared_expert_dp` into the torchair DeepSeek modeling code. However, when `enforce_eager` is combined with `shared_expert_dp`, the call to `set_forward_context` falls back to the implementation in model_runner_v1.py, which sets the global `attn_metadata` as a dictionary. This leads to a RuntimeError when `attn_metadata` is retrieved from the forward context and used in torchair_deepseek_v2.py. This PR fixes the problem by transforming `attn_metadata` back to a single metadata object in this file.

Note that the current E2E test suite lacks a DeepSeek case with `shared_expert_dp`. We need to add an ST with `shared_expert_dp` to the testing workflow.

### Does this PR introduce _any_ user-facing change?
No.

### How was this patch tested?
E2E vLLM serving with `enable_shared_expert_dp: true` passed.
- vLLM version: v0.10.2
- vLLM main: https://github.com/vllm-project/vllm/commit/de3e53a75ba9f31f446926911b7c44561af3b2ee

Signed-off-by: linfeng-yuan <1102311262@qq.com>
---
 vllm_ascend/torchair/models/torchair_deepseek_v2.py | 2 ++
 1 file changed, 2 insertions(+)

diff --git a/vllm_ascend/torchair/models/torchair_deepseek_v2.py b/vllm_ascend/torchair/models/torchair_deepseek_v2.py
index 845793d..697776a 100644
--- a/vllm_ascend/torchair/models/torchair_deepseek_v2.py
+++ b/vllm_ascend/torchair/models/torchair_deepseek_v2.py
@@ -813,6 +813,8 @@ class TorchairDeepseekV2DecoderLayer(DeepseekV2DecoderLayer):
             residual = get_tp_group().all_gather(residual, 0)
 
         attn_metadata = get_forward_context().attn_metadata
+        if attn_metadata is not None and isinstance(attn_metadata, dict):
+            attn_metadata = attn_metadata['model.layers.0.self_attn.attn']
         if attn_metadata is not None:
             num_tokens = attn_metadata.num_actual_tokens
         else:
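The fix above can be sketched in isolation: with hybrid kvcache, the forward context may hold `attn_metadata` as a dict keyed by layer name rather than a single object, so the decoder layer unwraps one entry before reading `num_actual_tokens`. This is a minimal, self-contained sketch; `AttnMetadata` and `resolve_attn_metadata` are hypothetical stand-ins, not vLLM APIs, and the real metadata class carries many more fields.

```python
from dataclasses import dataclass


@dataclass
class AttnMetadata:
    """Hypothetical stand-in for the per-layer attention metadata object."""
    num_actual_tokens: int


def resolve_attn_metadata(attn_metadata,
                          layer_name="model.layers.0.self_attn.attn"):
    """Mirror the patched logic: if set_forward_context stored the metadata
    as a dict keyed by layer name (the hybrid-kvcache path), unwrap one
    entry; otherwise pass the object (or None) through unchanged."""
    if attn_metadata is not None and isinstance(attn_metadata, dict):
        attn_metadata = attn_metadata[layer_name]
    return attn_metadata


# Both the dict form and the plain form resolve to a usable metadata object,
# so the subsequent `attn_metadata.num_actual_tokens` access no longer fails.
md = AttnMetadata(num_actual_tokens=7)
assert resolve_attn_metadata({"model.layers.0.self_attn.attn": md}) is md
assert resolve_attn_metadata(md) is md
assert resolve_attn_metadata(None) is None
```

Without the `isinstance` check, the dict would reach `attn_metadata.num_actual_tokens` directly and raise, which matches the RuntimeError this PR describes.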