[main][bugfix] bugfix for minicpm models (#3527)
### What this PR does / why we need it?

Bugfix for minicpm-2b and minicpm3-4b.

- vLLM version: v0.11.0rc3
- vLLM main: https://github.com/vllm-project/vllm/commit/v0.11.0

Signed-off-by: Wang Kunpeng <1289706727@qq.com>
```diff
@@ -1346,6 +1346,8 @@ class NPUModelRunner(LoRAModelRunnerMixin):
         positions_cpu = self.positions_cpu[:num_input_tokens]
         positions = self.positions[:num_input_tokens]
         seq_lens_cpu = self.seq_lens_cpu[:num_reqs]
+        attn_state = self._build_attn_state(num_reqs, num_scheduled_tokens,
+                                            num_valid_tokens)
         self.attn_mask = self._make_attention_mask(seq_lens=seq_lens_cpu,
                                                    position=positions_cpu,
                                                    attn_state=attn_state)
```
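For context, the context lines in the hunk above follow the model runner's persistent-buffer pattern: tensors such as `positions` and `seq_lens` are allocated once at the maximum size and sliced down to the current step's token count, so no per-step allocation occurs. A minimal standalone sketch of that pattern (NumPy stands in for torch; the class and method names here are illustrative, not the real `NPUModelRunner` API):

```python
import numpy as np

class TinyRunner:
    """Illustrative sketch of the persistent-buffer slicing pattern."""

    def __init__(self, max_num_tokens: int = 8):
        # Allocated once at startup and reused every step, never reallocated.
        self.positions = np.zeros(max_num_tokens, dtype=np.int64)

    def prepare(self, num_input_tokens: int) -> np.ndarray:
        # Each step works on a view of the first num_input_tokens slots,
        # mirroring `self.positions[:num_input_tokens]` in the hunk above.
        view = self.positions[:num_input_tokens]
        view[:] = np.arange(num_input_tokens)
        return view

runner = TinyRunner()
out = runner.prepare(3)
print(out.tolist())                   # -> [0, 1, 2]
print(out.base is runner.positions)   # -> True: a view, not a copy
```

The point of slicing rather than copying is that writes through the view land in the persistent buffer, which keeps the hot path allocation-free.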