[MTP][Bugfix] Fix GLM5-W8A8 precision issues caused by rotary quant MTP weights (#7139)
### What this PR does / why we need it?
When the GLM5 target model uses rotary quant, the final hidden states passed
to MTP need an extra rotation applied.
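A minimal sketch of why the extra rotation matters, under stated assumptions: the PR does not say which rotation matrix rotary quant uses, so the 2x2 normalized Hadamard matrix `ROT`, the weight `W`, and the helper names below are purely illustrative. If a layer's weights were rotated offline, the layer only reproduces the original output when its input is rotated as well.

```python
import math

def matvec(m, v):
    """Multiply matrix m (list of rows) by vector v."""
    return [sum(a * b for a, b in zip(row, v)) for row in m]

def matmul(a, b):
    """Multiply two matrices given as lists of rows."""
    cols = list(zip(*b))
    return [[sum(x * y for x, y in zip(row, col)) for col in cols] for row in a]

def transpose(m):
    return [list(r) for r in zip(*m)]

# Assumption: an orthogonal rotation (ROT @ ROT^T = I). The real matrix is not given in the PR.
s = 1 / math.sqrt(2)
ROT = [[s, s], [s, -s]]

W = [[1.0, 2.0], [3.0, 4.0]]        # hypothetical original layer weight
W_rot = matmul(W, transpose(ROT))   # weight stored after offline rotation: W @ ROT^T

hidden = [0.5, -1.5]                # stand-in for the target model's final hidden states

reference = matvec(W, hidden)                        # output of the unrotated layer
without_rotation = matvec(W_rot, hidden)             # rotated weight, raw input: mismatch
with_rotation = matvec(W_rot, matvec(ROT, hidden))   # rotated weight, rotated input: matches
```

Here `with_rotation` agrees with `reference` up to floating-point error, while `without_rotation` does not, which is the kind of mismatch that would surface as a precision problem.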
- vLLM version: v0.16.0
- vLLM main: 4034c3d32e
---------
Signed-off-by: Wangbingjie <wangbj1207@126.com>
Signed-off-by: wangbj127 <256472688+wangbj127@users.noreply.github.com>
@@ -452,3 +452,35 @@
# https://github.com/vllm-project/vllm/pull/34880
# Future Plan:
# Remove this patch when vLLM merges the PR.
#
# ** 21. File: worker/patch_deepseek_mtp.py**
# ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
# 1. `vllm.model_executor.models.deepseek_v2.get_spec_layer_idx_from_weight_name` and
# `vllm.model_executor.models.deepseek_mtp.get_spec_layer_idx_from_weight_name`
# Why:
# When GLM5 uses rotary quant in vllm-ascend, the MTP layer needs to load an extra weight
# named `rot.weight`.
# How:
# If weight name starts with `rot`, return `layer_id + i` like other tensors in MTP layer.
# Related PR (if no, explain why):
# Rotary quant is a unique feature of vllm-ascend.
# Future Plan:
# Remove this patch when vllm supports rotary quant or pluggable `MultiTokenPredictorLayer`.
# 2. `vllm.model_executor.models.deepseek_mtp.DeepSeekMultiTokenPredictorLayer`
# Why:
# When GLM5 uses rotary quant in vllm-ascend, the `previous_hidden_states` passed to the MTP layer has not had the extra rotation applied.
# How:
# If the target model uses rotary quant, a new linear operation is added before `ehnorm`.
# Related PR (if no, explain why):
# Rotary quant is a unique feature of vllm-ascend.
# Future Plan:
# Remove this patch when vllm supports rotary quant or pluggable `MultiTokenPredictorLayer`.
# 3. `vllm.model_executor.models.deepseek_mtp.DeepSeekMTP._rewrite_spec_layer_name`
# Why:
# Rename `rot.weight` to match the format of weights in `DeepSeekMTP`.
# How:
# If the weight name is `rot`, rename it to `model.layers.{spec_layer}.rot.weight`.
# Related PR (if no, explain why):
# Rotary quant is a unique feature of vllm-ascend.
# Future Plan:
# Remove this patch when vllm supports rotary quant or pluggable `MultiTokenPredictorLayer`.
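
The weight-name handling described in items 1 and 3 can be sketched roughly as follows. This is not the patched vLLM code: the function names, the plain-integer parameters, the assumption that the checkpoint stores the rotation under the top-level name `rot.weight`, and the layer count of 78 in the usage lines are all hypothetical, chosen only to make the example self-contained.

```python
from typing import Optional

# Assumption: the rotation weight is stored at the top level of the checkpoint as `rot.weight`.
ROT_PREFIX = "rot"

def spec_layer_idx(weight_name: str, num_hidden_layers: int, num_mtp_layers: int) -> Optional[int]:
    """Return the MTP layer index a weight belongs to, or None for non-MTP weights.

    MTP layers are assumed to sit directly after the regular decoder layers,
    so their indices start at `num_hidden_layers`.
    """
    for i in range(num_mtp_layers):
        layer_idx = num_hidden_layers + i
        belongs_to_layer = weight_name.startswith(f"model.layers.{layer_idx}.")
        is_rot_weight = weight_name.startswith(ROT_PREFIX)
        if belongs_to_layer or is_rot_weight:
            return layer_idx
    return None

def rewrite_rot_name(spec_layer: int, weight_name: str) -> str:
    """Rename the rotation weight so it follows the per-layer naming of the other MTP weights."""
    if weight_name == "rot.weight":
        return f"model.layers.{spec_layer}.rot.weight"
    return weight_name

# Usage with a hypothetical model that has 78 decoder layers and 1 MTP layer.
idx = spec_layer_idx("rot.weight", num_hidden_layers=78, num_mtp_layers=1)
renamed = rewrite_rot_name(idx, "rot.weight")
```

With these assumptions, `rot.weight` is attributed to the first MTP layer and renamed to sit alongside that layer's other weights, so a loader keyed on `model.layers.{spec_layer}.` prefixes can pick it up.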