[MOE] move weight transpose to wakeup for RL scenarios (#4626)
### What this PR does / why we need it?
In reinforcement learning scenarios, the current inference path applies a
transpose operation to the weights. For a cleaner architecture, this PR moves
the weight transpose step into `wake_up`.
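A minimal sketch of the idea, deferring a one-time weight transpose out of the inference path and into a `wake_up()` hook. This is not the actual vllm-ascend implementation; the `TinyLinear` and `TinyWorker` names, and the list-of-lists weight representation, are purely illustrative assumptions.

```python
class TinyLinear:
    """Toy layer holding a 2-D weight as a list of rows (illustrative only)."""

    def __init__(self, weight):
        self.weight = weight          # stored in the checkpoint layout
        self.transposed = False       # becomes True after wake_up()

    def transpose_weight(self):
        # Swap rows and columns once, instead of on every forward pass.
        self.weight = [list(col) for col in zip(*self.weight)]
        self.transposed = True


class TinyWorker:
    """Toy worker: wake_up() prepares weights before serving resumes."""

    def __init__(self, layers):
        self.layers = layers

    def wake_up(self, tags=None):
        # In an RL loop the worker sleeps during training and wakes up for
        # inference; doing the transpose here keeps the forward path clean.
        for layer in self.layers:
            if not layer.transposed:
                layer.transpose_weight()


layer = TinyLinear([[1, 2, 3], [4, 5, 6]])
worker = TinyWorker([layer])
worker.wake_up(tags=["weights"])
print(layer.weight)  # rows and columns swapped exactly once
```

The point of the design is that the transpose runs exactly once per wake-up, so the per-request forward path no longer needs to check or reshape weights.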
### Does this PR introduce _any_ user-facing change?
### How was this patch tested?
- vLLM version: v0.12.0
- vLLM main:
ad32e3e19c
Signed-off-by: lhp-deep <liuhaopeng1@huawei.com>
Co-authored-by: weijinqian0 <1184188277@qq.com>
```
@@ -281,9 +281,22 @@ class TestNPUWorker(TestBase):
        mock_allocator = MagicMock()
        mock_allocator_class.get_instance.return_value = mock_allocator

        mock_hidden_size = MagicMock()
        mock_hf_config = MagicMock()
        mock_hf_config.hidden_size = mock_hidden_size
        mock_model_config = MagicMock()
        mock_model_config.hf_config = mock_hf_config
        mock_vllm_config = MagicMock()
        mock_vllm_config.model_config = mock_model_config

        mock_model_runner = MagicMock()
        mock_model_runner.model = MagicMock()

        # Create worker mock
        with patch.object(NPUWorker, "__init__", lambda x, **kwargs: None):
            worker = NPUWorker()
            worker.model_runner = mock_model_runner
            worker.vllm_config = mock_vllm_config
            worker._sleep_saved_buffers = {}
            # Test wake_up method
            worker.wake_up(tags=["test_tag"])
```