[Bugfix] Fix out-of-bounds access to token_id due to uninitialized logprobs (#4248)
### What this PR does / why we need it?
The logprobs_tensors were not initialized before their token_id member was
accessed, which caused a crash when tokenizer.decode() was called with a
negative token_id.
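To illustrate the failure mode (a hedged sketch, not code from this PR): an
uninitialized logprobs slot can still hold a placeholder token_id such as -1,
and decoding it fails. The tokenizer name below is a placeholder.

```python
from transformers import AutoTokenizer

# Hypothetical illustration: decoding a negative placeholder token_id fails,
# mirroring the crash this PR guards against.
tok = AutoTokenizer.from_pretrained("gpt2")  # placeholder tokenizer
try:
    tok.decode([-1])
except Exception as exc:  # fast tokenizers typically raise OverflowError here
    print(type(exc).__name__, exc)
```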
### How was this patch tested?
Constructed an inference request with two prompts and set
SamplingParams(prompt_logprobs=<non-None value>), e.g. prompt_logprobs=1.
After applying the fix (publishing logprobs_tensors only when num_logits > 0),
the same request completed successfully without errors, and the returned
logprobs matched the expected values. A minimal sketch of this test, assuming
vLLM's offline LLM API, is shown below; the model and prompt strings are
placeholders, not from the original test.
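```python
from vllm import LLM, SamplingParams

llm = LLM(model="Qwen/Qwen2.5-0.5B-Instruct")  # placeholder model
params = SamplingParams(max_tokens=8, prompt_logprobs=1)

# Two prompts in one request, as in the test description.
outputs = llm.generate(["Hello, world!", "The capital of France is"], params)
for out in outputs:
    # Before the fix, this path could crash while decoding a negative
    # token_id from an uninitialized logprobs entry.
    print(out.prompt_logprobs)
```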
- vLLM version: v0.12.0
- vLLM main: ad32e3e19c
Signed-off-by: jiangweixiang <jwx02384838@antgroup.com>
Co-authored-by: jiangweixiang <jwx02384838@antgroup.com>
Co-authored-by: Mengqing Cao <cmq0113@163.com>
```diff
@@ -4276,8 +4276,9 @@ class NPUModelRunner(LoRAModelRunnerMixin, ECConnectorModelRunnerMixin):
             else:
                 # This is the last chunk of prompt tokens to return.
                 num_logits = num_remaining_tokens
-                completed_prefill_reqs.append(req_id)
-                prompt_logprobs_dict[req_id] = logprobs_tensors
+                if num_logits > 0:
+                    completed_prefill_reqs.append(req_id)
+                    prompt_logprobs_dict[req_id] = logprobs_tensors
 
             if num_logits <= 0:
                 # This can happen for the final chunk if we prefilled exactly
```
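In isolation, the fixed path reads as follows (a simplified sketch; the
function wrapper and argument list are mine, only the guarded body mirrors the
diff):

```python
def publish_final_prompt_logprobs(req_id, num_remaining_tokens, logprobs_tensors,
                                  completed_prefill_reqs, prompt_logprobs_dict):
    # Last chunk of prompt tokens for this request.
    num_logits = num_remaining_tokens
    if num_logits > 0:
        # Publish only tensors that were actually filled; an empty final
        # chunk would otherwise expose uninitialized token_ids (e.g., -1).
        completed_prefill_reqs.append(req_id)
        prompt_logprobs_dict[req_id] = logprobs_tensors
    return num_logits
```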