From 97999347c8e89e4d94c5ae2ce9cc803327e3d163 Mon Sep 17 00:00:00 2001 From: Yizhou <136800916+yiz-liu@users.noreply.github.com> Date: Mon, 24 Nov 2025 14:07:10 +0800 Subject: [PATCH] [Fix] Remove unnecessary NPU synchronization in MTP proposer (#4325) ### What this PR does / why we need it? Remove unnecessary NPU synchronization in MTP proposer to improve performance. Removing this synchronization point improves pipeline efficiency by allowing for better overlap between CPU and NPU operations. A more appropriate one is already implemented in #4233 ### Does this PR introduce _any_ user-facing change? None. ### How was this patch tested? None. - vLLM version: v0.11.0 - vLLM main: https://github.com/vllm-project/vllm/commit/2918c1b49c88c29783c86f78d2c4221cb9622379 Signed-off-by: Yizhou Liu --- vllm_ascend/spec_decode/mtp_proposer.py | 1 - 1 file changed, 1 deletion(-) diff --git a/vllm_ascend/spec_decode/mtp_proposer.py b/vllm_ascend/spec_decode/mtp_proposer.py index df446537..ea2889ec 100644 --- a/vllm_ascend/spec_decode/mtp_proposer.py +++ b/vllm_ascend/spec_decode/mtp_proposer.py @@ -886,7 +886,6 @@ class MtpProposer(Proposer): attn_metadata_i.decode.max_seq_lens = min( attn_metadata_i.decode.max_seq_lens, self.runner.model_config.max_model_len) - torch.npu.synchronize() # mtp>1: [batch_size, k] draft_token_ids = torch.stack(draft_token_ids_list, dim=1)