[v0.18.0][Bugfix][EAGLE] Fix FIA pad bug under max concurrency (#7754)
Cherry-picked from https://github.com/vllm-project/vllm-ascend/pull/7740
Fixes the padding problem of the FIA op under max concurrency.
- vLLM version: v0.18.0
- vLLM main: 35141a7eed
Signed-off-by: Wangbingjie <wangbj1207@126.com>
@@ -168,6 +168,8 @@ class SpecDecodeBaseProposer(EagleProposer):
        # RoPE need (max_num_tokens,)
        self.positions = torch.zeros(self.max_num_tokens, dtype=torch.int32, device=device)
        self.token_arange_np = np.arange(self.max_num_tokens + 1)

    def _get_model(self) -> nn.Module:
        """
        Default method to call get_model(). Can be overridden by subclasses which
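Below is a minimal, hypothetical sketch of the pattern the diff relies on: buffers are pre-allocated once at the worst-case size (max_num_tokens) and then sliced per step, so padded batches under max concurrency never index past the allocation. TinyProposer, prepare(), and num_scheduled_tokens are illustrative names only, not the vllm-ascend API.

import numpy as np
import torch

class TinyProposer:
    """Toy stand-in for the proposer; shows the pre-allocation pattern only."""

    def __init__(self, max_num_tokens: int, device: str = "cpu"):
        self.max_num_tokens = max_num_tokens
        # Flat (max_num_tokens,) position buffer consumed by RoPE.
        self.positions = torch.zeros(max_num_tokens, dtype=torch.int32, device=device)
        # Host-side arange reused each step to build token indices; the +1
        # leaves room for an inclusive end offset.
        self.token_arange_np = np.arange(max_num_tokens + 1)

    def prepare(self, num_scheduled_tokens: int) -> torch.Tensor:
        # Slice a view for the current (possibly padded) batch; the unused
        # tail stays zero instead of being read out of bounds.
        assert num_scheduled_tokens <= self.max_num_tokens
        return self.positions[:num_scheduled_tokens]

proposer = TinyProposer(max_num_tokens=8)
print(proposer.prepare(5).shape)  # torch.Size([5])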