[Bugfix] Fix the acceptance rate drop when applying eagle3 to QuaRot model (#6914)

### What this PR does / why we need it?
When the target model has undergone rotational quantization (QuaRot),
the eagle3 acceptance rate drops because the fc weight of the draft
model has not undergone the same rotational quantization (issue: #6445).
This PR fixes the issue by applying rotational quantization to the fc
weight of the draft model, in the same way as for the main model, when
the draft model is loaded.
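
The mismatch can be illustrated with a small NumPy sketch. It is not the code from this PR: it assumes the rotation is an orthogonal matrix `q` applied to the target model's hidden states, and that fc is a plain linear layer whose input is a concatenation of `k` such hidden states. The helper names `random_orthogonal` and `rotate_fc_weight` are hypothetical.

```python
import numpy as np


def random_orthogonal(n, seed=0):
    """Random n x n orthogonal matrix (stand-in for the real rotation)."""
    rng = np.random.default_rng(seed)
    q, _ = np.linalg.qr(rng.standard_normal((n, n)))
    return q


def rotate_fc_weight(weight, q):
    """Rotate the input side of a linear weight of shape (out, k * hidden).

    Hypothetical helper: every hidden-sized input block W_b becomes W_b @ q,
    so the layer produces the same output when fed hidden states that were
    rotated by q.
    """
    hidden = q.shape[0]
    out_features, in_features = weight.shape
    assert in_features % hidden == 0, "input dim must be a multiple of hidden"
    blocks = weight.reshape(out_features, in_features // hidden, hidden)
    return (blocks @ q).reshape(out_features, in_features)


hidden, k = 8, 3  # assumed sizes, for illustration only
rng = np.random.default_rng(1)
q = random_orthogonal(hidden)
fc_weight = rng.standard_normal((hidden, k * hidden))
states = rng.standard_normal((k, hidden))  # hidden states from the target model

# Output of fc on unrotated hidden states (the behaviour the draft was trained on).
original = fc_weight @ states.reshape(-1)

# A rotated target model hands the draft model rotated hidden states.
rotated_states = states @ q

# Unrotated fc weight on rotated inputs: the output no longer matches.
mismatch = fc_weight @ rotated_states.reshape(-1)

# Rotated fc weight on rotated inputs: the original output is recovered.
fixed = rotate_fc_weight(fc_weight, q) @ rotated_states.reshape(-1)
```

Because `q @ q.T` is the identity, the rotation on the weight cancels the rotation on the inputs. Without it, the draft model sees inputs in a different basis than it was trained on, which is consistent with the lower acceptance rate described above.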

- vLLM version: v0.16.0
- vLLM main: 15d76f74e2

Signed-off-by: zhaomingyu <zhaomingyu13@h-partners.com>
zhaomingyu13
2026-03-04 11:29:49 +08:00
committed by GitHub
parent d431d7d526
commit 52d9086f64
4 changed files with 94 additions and 0 deletions

@@ -35,3 +35,4 @@ import vllm_ascend.patch.worker.patch_huanyuan_vl # noqa
import vllm_ascend.patch.worker.patch_routed_experts_capturer # noqa
import vllm_ascend.patch.worker.patch_npugraph_ex_triton # noqa
import vllm_ascend.patch.worker.patch_kimi_k25 # noqa
import vllm_ascend.patch.worker.patch_qwen3_quarot # noqa