[main][feature] Support quarot for eagle3 without embedding (#7038)
### What this PR does / why we need it?
When an `eagle3` draft model without `embed_tokens` is used with a `quarot` target
model, the acceptance rate drops.
This PR fixes that issue.
The related vLLM PR is https://github.com/vllm-project/vllm/pull/36225.
- vLLM main:
4034c3d32e
Signed-off-by: drslark <slarksblood@qq.com>
```diff
@@ -319,7 +319,7 @@
 # https://github.com/vllm-project/vllm/pull/34336
 # Future Plan:
 # Remove this patch when vLLM merges the PR.
-# ** 16. File: worker/patch_qwen3_quarot.py**
+# ** 16. File: worker/patch_draft_quarot.py**
 # ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
 # 1. `vllm.model_executor.models.llama_eagle3.Eagle3LlamaForCausalLM.load_weights`
 # Why:
@@ -328,5 +328,7 @@
 # How:
 # Dynamically replace the `load_weights` function at runtime,
 # and fix `target_config` into the new implementation with a closure.
+# Related PR (if no, explain why):
+# https://github.com/vllm-project/vllm/pull/36225
 # Future Plan:
 # Remove this patch when vLLM merges the PR.
```
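The patch mechanism named in the comments above (replace `load_weights` at runtime and bind `target_config` into the new implementation with a closure) can be sketched as follows. This is a minimal illustration, not the actual vLLM-Ascend patch: `DraftModel`, `make_patched_load_weights`, and the `weight_prefix` fix-up are hypothetical stand-ins for `Eagle3LlamaForCausalLM` and its real weight-loading logic.

```python
# Sketch of runtime method patching with a closure-captured config.
# All names here are illustrative, not vLLM's real API.

class DraftModel:
    """Hypothetical stand-in for the eagle3 draft model class."""

    def load_weights(self, weights):
        # Original implementation: knows nothing about the target model.
        return {name: tensor for name, tensor in weights}


def make_patched_load_weights(original, target_config):
    # `target_config` is captured in the closure, so the patched method
    # can use it without changing the method's call signature.
    def load_weights(self, weights):
        loaded = original(self, weights)
        # Hypothetical fix-up driven by the target model's config.
        prefix = target_config["weight_prefix"]
        return {f"{prefix}.{name}": t for name, t in loaded.items()}

    return load_weights


# Apply the patch once at import/startup time, as the commit does.
target_config = {"weight_prefix": "model"}
DraftModel.load_weights = make_patched_load_weights(
    DraftModel.load_weights, target_config
)

model = DraftModel()
print(model.load_weights([("embed_tokens", 0)]))
# -> {'model.embed_tokens': 0}
```

Because the replacement happens on the class object, every instance picks up the patched behavior, and the patch can be deleted wholesale once the upstream vLLM PR is merged.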