[MOE] Move weight transpose to wakeup for RL scenarios (#4626)

### What this PR does / why we need it?
In reinforcement learning scenarios, the inference path currently applies a
transpose operation to the weights. For a cleaner architecture, this PR moves
the weight transpose step into wakeup.
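
The idea can be pictured with a minimal, framework-free sketch. Everything in it is an assumption made for illustration: `ToyExpertWeights`, `load_weights`, `wake_up`, and `forward` are hypothetical names, not the classes or methods touched by this PR, and plain Python lists stand in for device tensors. It only shows the layout conversion happening once at wake-up instead of inside the inference call.

```python
from typing import List

Matrix = List[List[float]]


def transpose(m: Matrix) -> Matrix:
    """Return the transpose of a row-major matrix."""
    return [list(col) for col in zip(*m)]


class ToyExpertWeights:
    """Hypothetical stand-in for an MoE expert weight holder (illustration only)."""

    def __init__(self, weight: Matrix) -> None:
        self.load_weights(weight)
        self.transpose_calls = 0

    def load_weights(self, weight: Matrix) -> None:
        # Assumed RL flow: the trainer pushes fresh weights in [out, in] layout.
        self.weight = weight
        self.transposed = False

    def wake_up(self) -> None:
        # Convert to the [in, out] layout once per weight load, not per forward.
        if not self.transposed:
            self.weight = transpose(self.weight)
            self.transposed = True
            self.transpose_calls += 1

    def forward(self, x: List[float]) -> List[float]:
        # The inference path no longer transposes; it expects [in, out] layout.
        if not self.transposed:
            raise RuntimeError("call wake_up() before forward()")
        return [sum(xi * w for xi, w in zip(x, col)) for col in zip(*self.weight)]


layer = ToyExpertWeights([[1.0, 2.0], [3.0, 4.0], [5.0, 6.0]])
layer.wake_up()
layer.wake_up()  # second call is a no-op: weights are already converted
result = layer.forward([1.0, 1.0])
```

In this sketch, reloading weights through `load_weights` resets the flag, so the next `wake_up` converts the new weights exactly once.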

### Does this PR introduce _any_ user-facing change?

### How was this patch tested?

- vLLM version: v0.12.0
- vLLM main: ad32e3e19c

Signed-off-by: lhp-deep <liuhaopeng1@huawei.com>
Co-authored-by: weijinqian0 <1184188277@qq.com>
lhp-deep
2025-12-08 20:34:52 +08:00
committed by GitHub
parent 58db21f56a
commit b230e7e987
7 changed files with 132 additions and 120 deletions

@@ -205,6 +205,7 @@ jobs:
pytest -sv tests/e2e/multicard/test_pipeline_parallel.py
pytest -sv tests/e2e/multicard/test_prefix_caching.py
pytest -sv tests/e2e/multicard/test_qwen3_moe.py
pytest -sv tests/e2e/multicard/test_offline_weight_load.py
e2e-4-cards:
name: multicard-4