[main][feature] Support quarot for eagle3 without embedding (#7038)

### What this PR does / why we need it?
When an `eagle3` model without `embed_tokens` is paired with a `quarot`
target model, the acceptance rate drops.
This PR fixes that.
The related vLLM PR is https://github.com/vllm-project/vllm/pull/36225.
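
A toy sketch of one plausible cause, which this description does not confirm: QuaRot-style quantization folds an orthogonal rotation `Q` into the target's weights, so a draft model that reuses the target's `embed_tokens` sees rotated embeddings it was not trained on. All names and sizes below are hypothetical, and the code only illustrates the linear algebra; it is not the patch in this PR.

```python
import math


def matmul(a, b):
    """Multiply two matrices given as lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]


def transpose(m):
    """Return the transpose of a matrix given as a list of rows."""
    return [list(col) for col in zip(*m)]


def max_abs_diff(a, b):
    """Largest element-wise absolute difference between two matrices."""
    return max(abs(x - y) for ra, rb in zip(a, b) for x, y in zip(ra, rb))


# Hypothetical toy sizes: 3 tokens, hidden size 2.
theta = 0.7
Q = [[math.cos(theta), -math.sin(theta)],
     [math.sin(theta), math.cos(theta)]]      # orthogonal rotation: Q @ Q^T = I

E = [[1.0, 2.0], [0.5, -1.0], [3.0, 0.25]]    # unrotated embed_tokens (tokens x hidden)
W = [[0.2, -0.4], [1.5, 0.3]]                 # assumed draft input projection (hidden x out)

E_rot = matmul(E, Q)                          # embeddings as stored in a rotated target
expected = matmul(E, W)                       # activations the draft was trained on
mismatched = matmul(E_rot, W)                 # draft reuses rotated embeddings as-is
W_fixed = matmul(transpose(Q), W)             # fold Q^T into the draft weight
restored = matmul(E_rot, W_fixed)             # (E Q)(Q^T W) = E W

print("error without compensation:", max_abs_diff(expected, mismatched))
print("error with compensation:   ", max_abs_diff(expected, restored))
```

Under these assumptions the uncompensated path deviates from the expected activations, while folding `Q^T` into the draft weight recovers them up to floating-point error.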

- vLLM main: 4034c3d32e

Signed-off-by: drslark <slarksblood@qq.com>
drslark
2026-03-09 10:43:06 +08:00
committed by GitHub
parent 737dfcf638
commit 6a7115fa0d
5 changed files with 148 additions and 82 deletions


@@ -103,8 +103,8 @@ from vllm_ascend.eplb.core.eplb_worker import EplbProcess
from vllm_ascend.eplb.eplb_updator import EplbUpdator
from vllm_ascend.eplb.utils import model_register
from vllm_ascend.ops.rotary_embedding import set_cos_and_sin, update_cos_sin
from vllm_ascend.patch.worker.patch_draft_quarot import patch_load_weights
from vllm_ascend.patch.worker.patch_module import patch_torch_npu_argsort
from vllm_ascend.patch.worker.patch_qwen3_quarot import patch_load_weights
from vllm_ascend.sample.sampler import AscendSampler
from vllm_ascend.spec_decode import get_spec_decode_method
from vllm_ascend.spec_decode.eagle_proposer import AscendEagleProposer