[Bugfix] Use hf_text_config instead of hf_config to support multimodal PD-Disaggregated (#5205)
### What this PR does / why we need it?
In code files such as `mooncake_connector.py`,
`vllm_config.model_config.hf_config` is used to read the LLM
configuration. This works for plain LLMs, but not for multi-modal
models, whose LLM settings are nested inside a text sub-config. For
multi-modal models, `vllm_config.model_config.hf_text_config` must be
used instead to get the LLM configs.
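The distinction can be sketched with plain stand-in classes (the `TextConfig`/`MultiModalConfig` classes and the `hf_text_config` helper below are hypothetical illustrations, not vLLM's actual implementation): a multimodal HF config nests the language-model fields under a `text_config` attribute, so reading them off the top-level config fails.

```python
# Hypothetical stand-ins illustrating the config layouts.
class TextConfig:
    """Plain LLM config: fields live at the top level."""
    def __init__(self, num_key_value_heads):
        self.num_key_value_heads = num_key_value_heads

class MultiModalConfig:
    """Multimodal config: LLM fields are nested under .text_config."""
    def __init__(self, text_config):
        self.text_config = text_config

def hf_text_config(hf_config):
    # Sketch of the fallback behaviour: prefer the nested text config
    # when present, otherwise use the config itself (plain LLM case).
    return getattr(hf_config, "text_config", hf_config)

llm_cfg = TextConfig(num_key_value_heads=8)
mm_cfg = MultiModalConfig(TextConfig(num_key_value_heads=4))

# hf_text_config works for both layouts...
assert hf_text_config(llm_cfg).num_key_value_heads == 8
assert hf_text_config(mm_cfg).num_key_value_heads == 4
# ...whereas reading the top-level multimodal config directly fails:
assert not hasattr(mm_cfg, "num_key_value_heads")
```

This is why switching the connector to `hf_text_config` keeps LLM-only behaviour unchanged while fixing the multimodal case.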
### Does this PR introduce _any_ user-facing change?
No.
### How was this patch tested?
Existing unit tests.
- vLLM version: v0.12.0
- vLLM main: ad32e3e19c
---------
Signed-off-by: ApsarasX <apsarax@outlook.com>
```diff
@@ -1195,7 +1195,7 @@ class TestMooncakeConnectorWorker(unittest.TestCase):
                 "prefill": {"tp_size": prefill_tp_size, "dp_size": 1, "pp_size": prefill_pp_size},
                 "decode": {"tp_size": decode_tp_size, "dp_size": 1, "pp_size": 1}
             }.get(k, d)):
-            self.vllm_config.model_config.hf_config.num_key_value_heads = num_kv_heads
+            self.vllm_config.model_config.hf_text_config.num_key_value_heads = num_kv_heads
            self.vllm_config.model_config.is_deepseek_mla = is_deepseek_mla
            worker = MooncakeConnectorWorker(self.vllm_config,
                                             self.engine_id)
```