[1/N][Refactor] Refactor code to adapt with vllm main (#3612)
### What this PR does / why we need it?

This is step 1 of refactoring the code to adapt to vllm main. This PR is aligned with 17c540a993.

1. Refactor deepseek to the latest code arch as of 17c540a993.
2. Bunches of fixes due to vllm changes:
   - Fix `AscendScheduler` `__post_init__`, caused by https://github.com/vllm-project/vllm/pull/25075
   - Fix `AscendScheduler` init receiving an unexpected arg `block_size`, caused by https://github.com/vllm-project/vllm/pull/26296
   - Fix `KVCacheManager` `get_num_common_prefix_blocks` arg, caused by https://github.com/vllm-project/vllm/pull/23485
   - Fix `MLAAttention` import, caused by https://github.com/vllm-project/vllm/pull/25103
   - Fix `SharedFusedMoE` import, caused by https://github.com/vllm-project/vllm/pull/26145
   - Fix `LazyLoader` import, caused by https://github.com/vllm-project/vllm/pull/27022
   - Fix `vllm.utils.swap_dict_values` import, caused by https://github.com/vllm-project/vllm/pull/26990
   - Fix `Backend` enum import, caused by https://github.com/vllm-project/vllm/pull/25893
   - Fix the `CompilationLevel` to `CompilationMode` renaming issue introduced by https://github.com/vllm-project/vllm/pull/26355
   - Fix fused_moe ops, caused by https://github.com/vllm-project/vllm/pull/24097
   - Fix the bert model's handling of `inputs_embeds`, caused by https://github.com/vllm-project/vllm/pull/25922
   - Fix MRope after the rename of `get_input_positions_tensor` to `get_mrope_input_positions`, caused by https://github.com/vllm-project/vllm/pull/24172
   - Fix `splitting_ops` changes introduced by https://github.com/vllm-project/vllm/pull/25845
   - Fix multi-modality changes introduced by https://github.com/vllm-project/vllm/issues/16229
   - Fix lora bias dropping issue introduced by https://github.com/vllm-project/vllm/pull/25807
   - Fix structured output breakage introduced by https://github.com/vllm-project/vllm/issues/26737

### Does this PR introduce _any_ user-facing change?

### How was this patch tested?

CI passed with existing tests.
- vLLM version: v0.11.0rc3
- vLLM main: https://github.com/vllm-project/vllm/commit/v0.11.0

---------

Signed-off-by: MengqingCao <cmq0113@163.com>
Signed-off-by: Icey <1790571317@qq.com>
Co-authored-by: Icey <1790571317@qq.com>
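Many of the fixes listed above are plain import relocations between vllm versions. A minimal, hypothetical sketch of the try-new-location-then-fall-back pattern such fixes typically follow; `resolve_attr` is an illustrative helper, not part of either codebase, and stdlib names stand in for the vllm symbols:

```python
import importlib


def resolve_attr(candidates):
    """Return the first attribute found among (module, name) pairs.

    Hypothetical helper: try the symbol's new home first, then fall
    back to its old one, mirroring fixes like the LazyLoader and
    swap_dict_values import moves listed in the PR description.
    """
    for module_name, attr in candidates:
        try:
            module = importlib.import_module(module_name)
            return getattr(module, attr)
        except (ImportError, AttributeError):
            continue
    raise ImportError(f"none of {candidates} could be resolved")


# Stdlib stand-in: "Mapping" lives in collections.abc on current
# Pythons, but was historically importable from collections.
Mapping = resolve_attr([("collections.abc", "Mapping"),
                        ("collections", "Mapping")])
```

A single shim like this lets the plugin track vllm main while still importing cleanly against the last released version.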
```diff
@@ -37,8 +37,11 @@ class TestTorchairDeepSeekMultiTokenPredictorLayer(PytestBase):
         mocker.patch(
             "vllm_ascend.ops.vocab_parallel_embedding.AscendVocabParallelEmbedding.__init__",
             return_value=None)
+        ascend_config = mocker.MagicMock()
+        ascend_config.max_num_batched_tokens = 2048
+        ascend_config.max_model_len = 1024
         mocker.patch("vllm_ascend.utils.get_ascend_config",
-                     return_value=mocker.Mock())
+                     return_value=ascend_config)
 
         mtp_layer = TorchairDeepSeekMultiTokenPredictorLayer(config, "", None)
         mocker_deepseek_v2_decode_layer.assert_called_once()
```
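The pattern in the hunk above recurs throughout these tests: instead of returning a bare `mocker.Mock()`, the patched `get_ascend_config` now hands back a `MagicMock` pre-loaded with the attributes the model code reads. A self-contained sketch with `unittest.mock`; the `build_ascend_config_mock` helper name is illustrative, only the attribute names come from the diff:

```python
from unittest.mock import MagicMock


def build_ascend_config_mock():
    """Build a stand-in ascend config with the attributes the diff sets."""
    cfg = MagicMock()
    cfg.max_num_batched_tokens = 2048
    cfg.max_model_len = 1024
    return cfg


cfg = build_ascend_config_mock()
# Explicitly set attributes return real values...
assert cfg.max_num_batched_tokens == 2048
# ...while any unset attribute access still yields an auto-created mock.
assert isinstance(cfg.some_unset_attribute, MagicMock)
```

This matters because code under test that reads `max_model_len` from a bare `Mock()` would receive another mock instead of an integer, which breaks any arithmetic or comparison done with it.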
```diff
@@ -96,8 +99,11 @@ class TestTorchairDeepSeekMultiTokenPredictor(PytestBase):
         mocker.patch(
             "vllm_ascend.ops.vocab_parallel_embedding.AscendVocabParallelEmbedding.__init__",
             return_value=None)
+        ascend_config = mocker.MagicMock()
+        ascend_config.max_num_batched_tokens = 2048
+        ascend_config.max_model_len = 1024
         mocker.patch("vllm_ascend.utils.get_ascend_config",
-                     return_value=mocker.Mock())
+                     return_value=ascend_config)
 
         predictor = TorchairDeepSeekMultiTokenPredictor(
             vllm_config=mock_vllm_config)
```
```diff
@@ -172,8 +178,11 @@ class TestTorchairDeepSeekMTP(PytestBase):
         mocker.patch(
             "vllm_ascend.ops.vocab_parallel_embedding.AscendVocabParallelEmbedding.__init__",
             return_value=None)
+        ascend_config = mocker.MagicMock()
+        ascend_config.max_num_batched_tokens = 2048
+        ascend_config.max_model_len = 1024
         mocker.patch("vllm_ascend.utils.get_ascend_config",
                      return_value=mocker.Mock())
+                     return_value=ascend_config)
 
         mtp = TorchairDeepSeekMTP(vllm_config=vllm_config)
         return mtp
```
```diff
@@ -235,7 +235,8 @@ def test_torchair_deepseek_v2_mlp(mock_distributed, base_config):
                                    hidden_act="silu",
                                    quant_config=None)
     assert isinstance(mlp.act_fn, TorchairDeepseekV2SiluAndMul)
-
+    ascend_config = MagicMock()
+    ascend_config._ASCEND_CONFIG.ascend_scheduler_config.enabled = False
     with patch(
             "vllm_ascend.torchair.models.torchair_deepseek_v2.QuantizationConfig"
     ) as mock_quant_config:
```
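The `with patch(...)` block in the hunk above swaps out `QuantizationConfig` only for the duration of the test body. A runnable sketch of the same idiom against a stdlib target, with `os.getcwd` chosen purely for illustration in place of the vllm_ascend module attribute:

```python
import os
from unittest.mock import patch

# Patch os.getcwd only inside the with-block, mirroring how the test
# above temporarily replaces QuantizationConfig in the module under test.
with patch("os.getcwd", return_value="/tmp/fake") as mock_getcwd:
    inside = os.getcwd()          # returns the mocked value

mock_getcwd.assert_called_once()  # the patch object recorded the call
outside = os.getcwd()             # original function restored on exit
```

The context manager guarantees the original attribute is restored even if the test raises, which is why the tests prefer it over assigning to the module attribute directly.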