[MOE] move weight transpose to wakeup for RL scenarios (#4626)

### What this PR does / why we need it?

In reinforcement learning (RL) scenarios, the inference path currently applies a transpose operation to the weights. For a cleaner architecture, this PR moves the weight transpose to wakeup.

### Does this PR introduce _any_ user-facing change?

### How was this patch tested?

- vLLM version: v0.12.0
- vLLM main: ad32e3e19c

Signed-off-by: lhp-deep <liuhaopeng1@huawei.com>
Co-authored-by: weijinqian0 <1184188277@qq.com>
2025-12-08 20:34:52 +08:00
parent 58db21f56a
commit b230e7e987
7 changed files with 132 additions and 120 deletions
--- a/.github/workflows/_e2e_test.yaml
+++ b/.github/workflows/_e2e_test.yaml
@@ -205,6 +205,7 @@ jobs:
          pytest -sv tests/e2e/multicard/test_pipeline_parallel.py
          pytest -sv tests/e2e/multicard/test_prefix_caching.py
          pytest -sv tests/e2e/multicard/test_qwen3_moe.py
+          pytest -sv tests/e2e/multicard/test_offline_weight_load.py

  e2e-4-cards:
    name: multicard-4