[v0.18.0][Bugfix][EAGLE] Fix FIA pad bug under max concurrency (#7754)
Cherry-picked from https://github.com/vllm-project/vllm-ascend/pull/7740
Fixes the padding problem of the FIA op under max concurrency.
- vLLM version: v0.18.0
- vLLM main: 35141a7eed
Signed-off-by: Wangbingjie <wangbj1207@126.com>
@@ -168,6 +168,8 @@ class SpecDecodeBaseProposer(EagleProposer):
        # RoPE need (max_num_tokens,)
        self.positions = torch.zeros(self.max_num_tokens, dtype=torch.int32, device=device)
        self.token_arange_np = np.arange(self.max_num_tokens + 1)

    def _get_model(self) -> nn.Module:
        """
        Default method to call get_model(). Can be overridden by subclasses which
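Below is a minimal, hypothetical sketch of the pattern the diff relies on: buffers are pre-allocated once at the worst-case size (max_num_tokens) and then sliced per step, so padded batches under max concurrency never index past the allocation. TinyProposer, prepare(), and num_scheduled_tokens are illustrative names only, not the vllm-ascend API.

import numpy as np
import torch

class TinyProposer:
    """Toy stand-in for the proposer; shows the pre-allocation pattern only."""

    def __init__(self, max_num_tokens: int, device: str = "cpu"):
        self.max_num_tokens = max_num_tokens
        # Flat (max_num_tokens,) position buffer consumed by RoPE.
        self.positions = torch.zeros(max_num_tokens, dtype=torch.int32, device=device)
        # Host-side arange reused each step to build token indices; the +1
        # leaves room for an inclusive end offset.
        self.token_arange_np = np.arange(max_num_tokens + 1)

    def prepare(self, num_scheduled_tokens: int) -> torch.Tensor:
        # Slice a view for the current (possibly padded) batch; the unused
        # tail stays zero instead of being read out of bounds.
        assert num_scheduled_tokens <= self.max_num_tokens
        return self.positions[:num_scheduled_tokens]

proposer = TinyProposer(max_num_tokens=8)
print(proposer.prepare(5).shape)  # torch.Size([5])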