### What this PR does / why we need it?
The `test_rejection_sampler` unit test contains inconsistent input data:
> def test_sample_recovered_tokens_pytorch_autoregressive(self):
> output_token_ids = torch.empty(2, dtype=torch.int32)
> cu_num_draft_tokens = torch.tensor([1, 1])
> draft_token_ids = torch.tensor([0, 1])
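Since `len(draft_token_ids) == 2` and `cu_num_draft_tokens` holds cumulative
counts, its last entry must equal 2, so it should be `torch.tensor([1, 2])`
or `torch.tensor([2, 2])`, not `torch.tensor([1, 1])`.
This PR sets `cu_num_draft_tokens = torch.tensor([1, 2])`. With this value the
test passes for both the pre-optimization and post-optimization methods.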
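
The consistency condition can be sketched in plain Python. This is only an illustration: the helper names `cumulative_draft_counts` and `is_consistent` are hypothetical and not part of the test file, and lists stand in for tensors (`torch.cumsum` would be the tensor equivalent of the running sum).

```python
from itertools import accumulate


def cumulative_draft_counts(num_draft_tokens):
    """Running total of per-request draft token counts (hypothetical helper)."""
    return list(accumulate(num_draft_tokens))


def is_consistent(cu_num_draft_tokens, draft_token_ids):
    """True if the offsets never decrease and end at len(draft_token_ids)."""
    if not cu_num_draft_tokens:
        return len(draft_token_ids) == 0
    non_decreasing = all(
        a <= b for a, b in zip(cu_num_draft_tokens, cu_num_draft_tokens[1:])
    )
    return non_decreasing and cu_num_draft_tokens[-1] == len(draft_token_ids)


draft_token_ids = [0, 1]
print(is_consistent([1, 1], draft_token_ids))  # False: the original test value
print(is_consistent([1, 2], draft_token_ids))  # True: the value used in this fix
print(is_consistent([2, 2], draft_token_ids))  # True: first request owns both tokens
print(cumulative_draft_counts([1, 1]))         # [1, 2]
```

With `[1, 2]` each of the two requests owns one draft token, which matches the autoregressive case the test name describes.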
### Does this PR introduce _any_ user-facing change?
No
### How was this patch tested?
NA
- vLLM version: v0.11.0rc3
- vLLM main: https://github.com/vllm-project/vllm/commit/v0.11.0
---------
Signed-off-by: lio <1983142975@qq.com>
This PR ports the optimization from PR #2002 to main and cleans it up.
- vLLM version: v0.10.0
- vLLM main:
afa5b7ca0b
---------
Signed-off-by: whx-sjtu <2952154980@qq.com>
### What this PR does / why we need it?
Add unit tests for the rejection sampler.
### Does this PR introduce _any_ user-facing change?
No
### How was this patch tested?
UT passed
- vLLM version: v0.10.0
- vLLM main:
586f286789
Signed-off-by: wangxiaoxin-sherie <wangxiaoxin7@huawei.com>
Refactor the Sampler implementation: instead of patching vLLM, it now
inherits from the vLLM Sampler interface.
Next step: make the `TopKTopPSampler` op in vLLM support the custom-op
registration mechanism.
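
The difference between patching and inheriting can be sketched as follows. This is an illustration only: `UpstreamSampler`, `PlatformSampler`, and `_transform_logits` are hypothetical stand-ins, not the actual vLLM or plugin classes.

```python
class UpstreamSampler:
    """Hypothetical stand-in for a sampler class provided upstream."""

    def _transform_logits(self, logits):
        # Default behaviour: leave the logits unchanged.
        return list(logits)

    def sample(self, logits):
        logits = self._transform_logits(logits)
        # Greedy pick: index of the largest logit.
        return max(range(len(logits)), key=logits.__getitem__)


class PlatformSampler(UpstreamSampler):
    """Inheritance style: override one hook, leave the upstream class untouched."""

    def _transform_logits(self, logits):
        # Placeholder for a platform-specific step; here it penalizes token 0.
        return [x - 1.0 if i == 0 else x for i, x in enumerate(logits)]


logits = [1.0, 0.5]
print(UpstreamSampler().sample(logits))  # 0: upstream behaviour is unchanged
print(PlatformSampler().sample(logits))  # 1: only the subclass is affected
```

A patch-based approach would instead assign a new function to `UpstreamSampler._transform_logits` at import time, which changes behaviour for every user of the upstream class; subclassing keeps the change local to the derived sampler.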
- vLLM version: v0.10.0
- vLLM main:
61a6905ab0
Signed-off-by: wangxiyuan <wangxiyuan1007@gmail.com>