[Feature] Add batch invariance docs and patch some extra operators (#6910)
### What this PR does / why we need it?
This PR adds docs for batch invariance and patches some extra operators
according to the validation results.
Please see https://github.com/vllm-project/vllm-ascend/issues/5487 to
track progress.
### Does this PR introduce _any_ user-facing change?
No
### How was this patch tested?
- vLLM version: v0.16.0
- vLLM main: 15d76f74e2
---------
Signed-off-by: Ronald1995 <ronaldautomobile@163.com>
@@ -1,4 +1,5 @@
 import torch
+from vllm.model_executor.layers.batch_invariant import vllm_is_batch_invariant
 from vllm.v1.sample.ops.topk_topp_sampler import TopKTopPSampler
 from vllm.v1.sample.sampler import Sampler

@@ -73,6 +74,10 @@ class AscendTopKTopPSampler(TopKTopPSampler):

     def forward_native(self, logits, generators, k, p):
         """Override pytorch native implementation to torch_npu"""
+        # when batch_invariant mode is enabled, we should use vllm's implementation.
+        # or it will make batch_invariant mode not working.
+        if vllm_is_batch_invariant():
+            return super().forward_native(logits, generators, k, p)
         logits = self.apply_top_k_top_p(logits, k, p)
         logits_to_return = None
         if self.logprobs_mode == "processed_logits":
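The guard added in this hunk follows a common pattern: a device-specific subclass defers to its parent's reference implementation whenever a global mode flag is set. The sketch below illustrates that pattern in isolation, without vLLM or torch. Every name in it (`is_batch_invariant`, the `DEMO_BATCH_INVARIANT` environment variable, `BaseSampler`, `DeviceSampler`) is hypothetical and only stands in for `vllm_is_batch_invariant`, `TopKTopPSampler`, and `AscendTopKTopPSampler`; it is not the actual implementation.

```python
import os


def is_batch_invariant() -> bool:
    """Hypothetical stand-in for vllm_is_batch_invariant().

    Reads a made-up environment variable on every call.
    """
    return os.environ.get("DEMO_BATCH_INVARIANT", "0") == "1"


class BaseSampler:
    """Stand-in for the upstream sampler with the reference code path."""

    def forward_native(self, logits):
        # Reference path: index of the largest logit.
        best = max(range(len(logits)), key=logits.__getitem__)
        return ("reference", best)


class DeviceSampler(BaseSampler):
    """Stand-in for a device-specific subclass that overrides the method."""

    def forward_native(self, logits):
        # In batch-invariant mode, defer to the parent implementation so
        # that the device-specific path cannot change the result.
        if is_batch_invariant():
            return super().forward_native(logits)
        return self._device_path(logits)

    def _device_path(self, logits):
        # Placeholder for an optimized kernel; here it computes the same
        # answer so the two paths can be compared directly.
        best = max(range(len(logits)), key=logits.__getitem__)
        return ("device", best)


if __name__ == "__main__":
    os.environ["DEMO_BATCH_INVARIANT"] = "1"
    print(DeviceSampler().forward_native([0.1, 0.9, 0.3]))
    os.environ["DEMO_BATCH_INVARIANT"] = "0"
    print(DeviceSampler().forward_native([0.1, 0.9, 0.3]))
```

Placing the check at the top of the override keeps the device-specific path untouched in the default mode, which is consistent with the commit message's statement that there is no user-facing change.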