[MOE] move weight transpose to wakeup for RL scenarios (#4626)

### What this PR does / why we need it?
In reinforcement learning (RL) scenarios, inference currently applies a
transpose operation to the weights. For a cleaner architecture, this PR
moves the weight transpose into the worker's wake-up step.
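
The idea can be illustrated with a toy sketch. Everything here is an assumption made for illustration: `ToyExpertLayer`, `ToyWorker`, the `transposed` flag, and the plain-list weights are hypothetical and are not the classes or tensors touched by this PR. It only shows the layout change happening once at wake-up instead of inside the forward path.

```python
class ToyExpertLayer:
    """Hypothetical stand-in for a layer holding one weight matrix."""

    def __init__(self, weight):
        self.weight = weight      # list of rows, layout [out, in]
        self.transposed = False   # assumption: a flag tracks the layout

    def forward(self, x):
        # The forward path assumes the weight is already in [in, out] layout
        # and performs no transpose of its own.
        assert self.transposed, "wake_up() must run before forward()"
        n_out = len(self.weight[0])
        return [sum(xi * row[j] for xi, row in zip(x, self.weight))
                for j in range(n_out)]


class ToyWorker:
    """Hypothetical worker that owns the layer."""

    def __init__(self, layer):
        self.layer = layer

    def wake_up(self):
        # One-time layout change when the worker wakes up.
        if not self.layer.transposed:
            self.layer.weight = [list(col) for col in zip(*self.layer.weight)]
            self.layer.transposed = True


layer = ToyExpertLayer([[1, 2, 3], [4, 5, 6]])
worker = ToyWorker(layer)
worker.wake_up()
output = layer.forward([1, 0, 1])
```

Keeping the transpose out of `forward` means the inference path only ever sees one weight layout, which is the architectural cleanup the description refers to.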

### Does this PR introduce _any_ user-facing change?

### How was this patch tested?

- vLLM version: v0.12.0
- vLLM main: ad32e3e19c

Signed-off-by: lhp-deep <liuhaopeng1@huawei.com>
Co-authored-by: weijinqian0 <1184188277@qq.com>
lhp-deep
2025-12-08 20:34:52 +08:00
committed by GitHub
parent 58db21f56a
commit b230e7e987
7 changed files with 132 additions and 120 deletions

@@ -281,9 +281,22 @@ class TestNPUWorker(TestBase):
mock_allocator = MagicMock()
mock_allocator_class.get_instance.return_value = mock_allocator
mock_hidden_size = MagicMock()
mock_hf_config = MagicMock()
mock_hf_config.hidden_size = mock_hidden_size
mock_model_config = MagicMock()
mock_model_config.hf_config = mock_hf_config
mock_vllm_config = MagicMock()
mock_vllm_config.model_config = mock_model_config
mock_model_runner = MagicMock()
mock_model_runner.model = MagicMock()
# Create worker mock
with patch.object(NPUWorker, "__init__", lambda x, **kwargs: None):
    worker = NPUWorker()
    worker.model_runner = mock_model_runner
    worker.vllm_config = mock_vllm_config
    worker._sleep_saved_buffers = {}
    # Test wake_up method
    worker.wake_up(tags=["test_tag"])
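
The mocking pattern in this hunk (skipping `__init__` with `patch.object`, then attaching mocked config and runner attributes) can be tried in isolation with a minimal sketch like the one below. The `Worker` class, its `prepare` call, and the value `4096` are hypothetical and exist only for illustration; they are not the real `NPUWorker` API.

```python
from unittest.mock import MagicMock, patch


class Worker:
    """Hypothetical worker; not the real NPUWorker."""

    def __init__(self, vllm_config):
        raise RuntimeError("expensive setup we want to skip in unit tests")

    def wake_up(self, tags=None):
        # Reads a nested config attribute and hands it to the model.
        hidden_size = self.vllm_config.model_config.hf_config.hidden_size
        self.model_runner.model.prepare(hidden_size, tags)  # hypothetical call
        return hidden_size


# MagicMock creates nested attributes on access, so the chain can be set
# in a single assignment instead of one mock per level.
mock_vllm_config = MagicMock()
mock_vllm_config.model_config.hf_config.hidden_size = 4096
mock_model_runner = MagicMock()

# Replace __init__ so the object can be built without real setup.
with patch.object(Worker, "__init__", lambda self, **kwargs: None):
    worker = Worker()
    worker.vllm_config = mock_vllm_config
    worker.model_runner = mock_model_runner
    result = worker.wake_up(tags=["test_tag"])

mock_model_runner.model.prepare.assert_called_once_with(4096, ["test_tag"])
```

Because `__init__` is bypassed, every attribute that `wake_up` reads has to be attached by hand, which is why the test now sets `model_runner` and `vllm_config` on the worker before calling `wake_up`.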