perf: optimize memory for DeepSeek MTP (#2713)

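### What this PR does / why we need it?
Delete the references to the temporary tensors `inputs_embeds` and `previous_hidden_states` once they have been concatenated and projected, to reduce memory usage for DeepSeek MTP in the torchair case.

- vLLM version: v0.11.0rc3
- vLLM main: https://github.com/vllm-project/vllm/commit/v0.11.0

Signed-off-by: boying <897013703@qq.com>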
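As a rough illustration of why the added `del` can help, here is a minimal pure-Python sketch. It is not the vllm-ascend code: `FakeTensor` and `forward_sketch` are hypothetical names, and a `weakref` probe stands in for observing whether the buffer is still alive. It assumes CPython reference counting, where `del` removes only the local name and the object is released only if nothing else references it.

```python
import weakref


class FakeTensor:
    """Hypothetical stand-in for a large tensor; not a torch API."""

    def __init__(self, name):
        self.name = name


def forward_sketch(make_inputs):
    """Mimic the shape of the patched forward pass (illustrative only)."""
    inputs_embeds, previous_hidden_states = make_inputs()
    probe = weakref.ref(inputs_embeds)  # observes liveness without owning it

    # Stand-in for eh_proj(torch.cat([...], dim=-1)): the result is a new
    # object that does not keep the two inputs alive.
    hidden_states = FakeTensor(
        inputs_embeds.name + "+" + previous_hidden_states.name)
    alive_before = probe() is not None

    # Same statement as the patch: drop the local names so the inputs can
    # be released before the rest of the layer runs.
    del inputs_embeds, previous_hidden_states
    alive_after = probe() is not None

    return hidden_states, alive_before, alive_after


# Case 1: the function holds the only references, so `del` frees the input.
_, before, after = forward_sketch(
    lambda: (FakeTensor("embeds"), FakeTensor("hidden")))

# Case 2: the caller keeps its own references, so `del` frees nothing yet.
kept = (FakeTensor("embeds"), FakeTensor("hidden"))
_, kept_before, kept_after = forward_sketch(lambda: kept)
```

In the first case the input is alive before the `del` and gone after it; in the second it stays alive because the caller still holds it. The actual saving in the torchair path therefore depends on whether any other reference to these tensors exists at that point, which this sketch does not establish.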
2025-10-23 15:52:17 +08:00
parent 2584f97217
commit 807686dec9
1 changed file with 1 addition and 0 deletions
--- a/vllm_ascend/torchair/models/torchair_deepseek_mtp.py
+++ b/vllm_ascend/torchair/models/torchair_deepseek_mtp.py
@@ -102,6 +102,7 @@ class TorchairDeepSeekMultiTokenPredictorLayer(DeepSeekMultiTokenPredictorLayer
        hidden_states = self.eh_proj(
            torch.cat([inputs_embeds, previous_hidden_states], dim=-1))

+        del inputs_embeds, previous_hidden_states
        replace_allreduce = hidden_states.shape[0] % self.tp_size == 0

        hidden_states, residual = self.mtp_block(