[v0.18.0][Bugfix][EAGLE] Fix FIA pad bug under max concurrency (#7754)

Cherry-picked from https://github.com/vllm-project/vllm-ascend/pull/7740
Fixes padding problems of the FIA op under max concurrency.
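The diff below preallocates position buffers at `max_num_tokens` so the attention op always sees consistently padded inputs, even when the batch reaches max concurrency. The following is a minimal sketch of that preallocation pattern, not the vllm-ascend implementation; the buffer size, the `fill_positions` helper, and the use of NumPy in place of torch are all illustrative assumptions.

```python
import numpy as np

# Assumed capacity; in the real proposer this is self.max_num_tokens.
MAX_NUM_TOKENS = 8

# Buffers sized for the worst case, allocated once at init so the
# padded tail is always well-defined (zeros) regardless of batch size.
positions = np.zeros(MAX_NUM_TOKENS, dtype=np.int32)
token_arange = np.arange(MAX_NUM_TOKENS + 1)

def fill_positions(num_tokens: int) -> np.ndarray:
    """Hypothetical helper: write real positions for the active tokens;
    the tail beyond num_tokens stays zero-padded."""
    positions[:num_tokens] = token_arange[:num_tokens]
    return positions

padded = fill_positions(5)
# active prefix holds 0..4; entries 5..7 remain zero padding
```

Because the buffer is created once with the maximum shape, no per-step resize or re-pad is needed at full concurrency, which is where the original bug surfaced.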

- vLLM version: v0.18.0
- vLLM main: 35141a7eed

Signed-off-by: Wangbingjie <wangbj1207@126.com>
Committed by wangbj127 on 2026-03-29 12:23:44 +08:00 (via GitHub)
Parent: 5df2ddd8db
Commit: 9cc41c9457
2 changed files with 31 additions and 0 deletions


@@ -168,6 +168,8 @@ class SpecDecodeBaseProposer(EagleProposer):
# RoPE need (max_num_tokens,)
self.positions = torch.zeros(self.max_num_tokens, dtype=torch.int32, device=device)
self.token_arange_np = np.arange(self.max_num_tokens + 1)
def _get_model(self) -> nn.Module:
"""
Default method to call get_model(). Can be overridden by subclasses which