[MOE] move weight transpose to wake_up for RL scenarios (#4626)
### What this PR does / why we need it?
In reinforcement learning scenarios, the current inference path applies a
transpose operation to the MoE weights. For a cleaner architecture, this PR
moves the weight-transpose step into the worker's `wake_up` path.
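For context, here is a minimal sketch of the RL rollout loop this change targets, using vLLM's sleep-mode API (`enable_sleep_mode`, `LLM.sleep`, `LLM.wake_up`). The checkpoint path, prompts, and iteration count are placeholders, and the trainer-side weight update is only indicated by a comment; the actual integration depends on the RL framework.

```python
from vllm import LLM, SamplingParams

# Placeholder checkpoint and prompts; any MoE model served by vllm-ascend would do.
llm = LLM(model="path/to/moe-checkpoint", enable_sleep_mode=True)
sampling = SamplingParams(max_tokens=64)
prompts = ["example prompt"]

for _ in range(3):  # a few RL iterations, for illustration only
    llm.wake_up()                              # NPUWorker.wake_up(): restore memory, re-transpose MoE weights
    outputs = llm.generate(prompts, sampling)  # rollout / inference phase
    llm.sleep(level=1)                         # release device memory for the training phase
    # ... RL update runs here; refreshed weights are re-transposed on the next wake_up ...
```

The diff below is the body of that `wake_up` step.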
### Does this PR introduce _any_ user-facing change?
### How was this patch tested?
- vLLM version: v0.12.0
- vLLM main: ad32e3e19c
Signed-off-by: lhp-deep <liuhaopeng1@huawei.com>
Co-authored-by: weijinqian0 <1184188277@qq.com>
@@ -176,9 +176,28 @@ class NPUWorker(WorkerBase):
```python
        allocator = CaMemAllocator.get_instance()
        allocator.wake_up(tags=tags)

        hidden_size = self.vllm_config.model_config.hf_config.hidden_size
        model = self.model_runner.model
        for name, param in model.named_parameters():
            if 'w2_weight' in name and param.shape[2] == hidden_size:
                parts = name.split('.')
                param_name = parts[-1]
                parent_module = model.get_submodule(".".join(parts[:-1]))

                w2_data = param.transpose(1, 2)
                w2_data = torch.nn.Parameter(w2_data, requires_grad=False)
                setattr(parent_module, param_name, w2_data)
            elif 'w13_weight' in name and param.shape[1] == hidden_size:
                parts = name.split('.')
                param_name = parts[-1]
                parent_module = model.get_submodule(".".join(parts[:-1]))

                w13_data = param.transpose(1, 2)
                w13_data = torch.nn.Parameter(w13_data, requires_grad=False)
                setattr(parent_module, param_name, w13_data)

        # Restore the buffers after level 2 sleep
        if len(self._sleep_saved_buffers):
            model = self.model_runner.model
            for name, buffer in model.named_buffers():
                if name in self._sleep_saved_buffers:
                    buffer.data.copy_(self._sleep_saved_buffers[name].data)
```
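For illustration, the shape check and `transpose(1, 2)` above behave as in the sketch below. The `num_experts`, `hidden_size`, and `intermediate_size` values and the pre-transpose layout `(num_experts, intermediate_size, hidden_size)` are assumptions chosen only to make the shapes visible; the real layouts come from the model's fused-MoE parameters.

```python
import torch

# Assumed, illustrative dimensions; the tensor values are irrelevant, only the shapes matter.
num_experts, hidden_size, intermediate_size = 8, 4096, 1408
w2 = torch.nn.Parameter(
    torch.empty(num_experts, intermediate_size, hidden_size),
    requires_grad=False)

# Same condition as in wake_up: transpose only when the last dim equals hidden_size.
if w2.shape[2] == hidden_size:
    w2 = torch.nn.Parameter(w2.transpose(1, 2), requires_grad=False)

print(w2.shape)  # torch.Size([8, 4096, 1408]): last two dims swapped
```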