【Feature】refactor npu_modelrunner for profile_run (#4993)

### What this PR does / why we need it?
(1) Refactor `npu_model_runner` for `profile_run`.
(2) Move `_select_moe_comm_method` to `ascend_forward_context`.
(3) Delete `_init_model_kwargs` in `npu_model_runner`.

### Does this PR introduce _any_ user-facing change?
No

### How was this patch tested?
N/A

- vLLM version: v0.12.0
- vLLM main: ad32e3e19c

---------

Signed-off-by: zhenwenqi2024 <zhenwenqi_2022@qq.com>
Signed-off-by: zhenwenqi2024 <155598497+zhenwenqi2024@users.noreply.github.com>
zhenwenqi2024
2025-12-16 17:44:04 +08:00
committed by GitHub
parent af64087732
commit 4ed2951400
6 changed files with 127 additions and 205 deletions


@@ -165,7 +165,6 @@ class TestEagleProposerDummyRun(TestBase):
        self.vllm_config.speculative_config = MagicMock()
        self.device = torch.device("cpu")
        self.runner = MagicMock()
        self.runner._select_moe_comm_method.return_value = "alltoall"
        self.vllm_config.cache_config.block_size = 16
        self.vllm_config.scheduler_config.max_num_batched_tokens = 1024
@@ -192,8 +191,6 @@ class TestEagleProposerDummyRun(TestBase):
    def test_dummy_run_with_prefill(self, mock_context):
        mock_context.return_value.__enter__.return_value = None
        self.proposer.dummy_run(num_tokens=64, with_prefill=True, num_reqs=4)
        self.runner._select_moe_comm_method.assert_called_with(64)
        self.proposer.model.assert_called_once()
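
The test above leans on three `unittest.mock.MagicMock` conventions: stubbing a method via `return_value`, stubbing a context manager via `return_value.__enter__.return_value`, and checking calls with `assert_called_with` / `assert_called_once`. A minimal, self-contained sketch of the same pattern follows. `ToyProposer`, `select_comm_method`, and `context_factory` are hypothetical names used only for illustration; they are not part of vllm-ascend.

```python
from unittest.mock import MagicMock


class ToyProposer:
    """Hypothetical stand-in for a proposer; not the vllm-ascend class."""

    def __init__(self, runner, model, context_factory):
        self.runner = runner
        self.model = model
        self.context_factory = context_factory

    def dummy_run(self, num_tokens):
        # Ask the runner for a communication method, then call the model
        # exactly once inside the (mocked) forward context.
        method = self.runner.select_comm_method(num_tokens)
        with self.context_factory(method) as ctx:
            self.model(num_tokens)
        return method, ctx


def run_demo():
    runner = MagicMock()
    # Any call to runner.select_comm_method(...) now returns "alltoall".
    runner.select_comm_method.return_value = "alltoall"

    model = MagicMock()

    # MagicMock supports the context-manager protocol; this makes
    # `with context_factory(...) as ctx` bind ctx to None.
    context_factory = MagicMock()
    context_factory.return_value.__enter__.return_value = None

    proposer = ToyProposer(runner, model, context_factory)
    method, ctx = proposer.dummy_run(num_tokens=64)

    # Both assertions raise AssertionError if the calls did not happen as stated.
    runner.select_comm_method.assert_called_with(64)
    model.assert_called_once()
    return method, ctx


if __name__ == "__main__":
    print(run_demo())
```

Because a `MagicMock` creates attributes on access, an assertion on a method that the code under test no longer calls fails rather than passing silently, so such stubs and assertions have to be dropped when the call moves elsewhere.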