[MOE] move weight transpose to wakeup for RL scenarios (#4626)
### What this PR does / why we need it?
In reinforcement learning scenarios, the current inference path applies a
transpose operation to the weights. For a cleaner architecture, this PR moves
the weight transpose step into `wake_up`.
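A minimal sketch of the idea, deferring a one-time weight transpose out of the inference path and into a `wake_up()` hook. This is not the actual vllm-ascend implementation; the `TinyLinear` and `TinyWorker` names, and the list-of-lists weight representation, are purely illustrative assumptions.

```python
class TinyLinear:
    """Toy layer holding a 2-D weight as a list of rows (illustrative only)."""

    def __init__(self, weight):
        self.weight = weight          # stored in the checkpoint layout
        self.transposed = False       # becomes True after wake_up()

    def transpose_weight(self):
        # Swap rows and columns once, instead of on every forward pass.
        self.weight = [list(col) for col in zip(*self.weight)]
        self.transposed = True


class TinyWorker:
    """Toy worker: wake_up() prepares weights before serving resumes."""

    def __init__(self, layers):
        self.layers = layers

    def wake_up(self, tags=None):
        # In an RL loop the worker sleeps during training and wakes up for
        # inference; doing the transpose here keeps the forward path clean.
        for layer in self.layers:
            if not layer.transposed:
                layer.transpose_weight()


layer = TinyLinear([[1, 2, 3], [4, 5, 6]])
worker = TinyWorker([layer])
worker.wake_up(tags=["weights"])
print(layer.weight)  # rows and columns swapped exactly once
```

The point of the design is that the transpose runs exactly once per wake-up, so the per-request forward path no longer needs to check or reshape weights.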
### Does this PR introduce _any_ user-facing change?
### How was this patch tested?
- vLLM version: v0.12.0
- vLLM main:
ad32e3e19c
Signed-off-by: lhp-deep <liuhaopeng1@huawei.com>
Co-authored-by: weijinqian0 <1184188277@qq.com>
```
@@ -281,9 +281,22 @@ class TestNPUWorker(TestBase):
        mock_allocator = MagicMock()
        mock_allocator_class.get_instance.return_value = mock_allocator

        mock_hidden_size = MagicMock()
        mock_hf_config = MagicMock()
        mock_hf_config.hidden_size = mock_hidden_size
        mock_model_config = MagicMock()
        mock_model_config.hf_config = mock_hf_config
        mock_vllm_config = MagicMock()
        mock_vllm_config.model_config = mock_model_config

        mock_model_runner = MagicMock()
        mock_model_runner.model = MagicMock()

        # Create worker mock
        with patch.object(NPUWorker, "__init__", lambda x, **kwargs: None):
            worker = NPUWorker()
            worker.model_runner = mock_model_runner
            worker.vllm_config = mock_vllm_config
            worker._sleep_saved_buffers = {}
            # Test wake_up method
            worker.wake_up(tags=["test_tag"])
```