[main][feature] Support quarot for eagle3 without embedding (#7038)
### What this PR does / why we need it?
If some `eagle3` model without embed_tokens works with `quarot` target
model, the acceptence rate will drop.
We solve it in this PR.
The relative vllm pr is https://github.com/vllm-project/vllm/pull/36225.
- vLLM main:
4034c3d32e
Signed-off-by: drslark <slarksblood@qq.com>
This commit is contained in:
@@ -103,8 +103,8 @@ from vllm_ascend.eplb.core.eplb_worker import EplbProcess
|
||||
from vllm_ascend.eplb.eplb_updator import EplbUpdator
|
||||
from vllm_ascend.eplb.utils import model_register
|
||||
from vllm_ascend.ops.rotary_embedding import set_cos_and_sin, update_cos_sin
|
||||
from vllm_ascend.patch.worker.patch_draft_quarot import patch_load_weights
|
||||
from vllm_ascend.patch.worker.patch_module import patch_torch_npu_argsort
|
||||
from vllm_ascend.patch.worker.patch_qwen3_quarot import patch_load_weights
|
||||
from vllm_ascend.sample.sampler import AscendSampler
|
||||
from vllm_ascend.spec_decode import get_spec_decode_method
|
||||
from vllm_ascend.spec_decode.eagle_proposer import AscendEagleProposer
|
||||
|
||||
Reference in New Issue
Block a user