### What this PR does / why we need it?
This patch fix the nightly failure
1. Each case uses a copy of the global kwargs instead of a reference to
prevent parameter pollution between use cases.
2. Add weight initialization in the scenario of `eplb` + `w8a8_dynamic`
### Does this PR introduce _any_ user-facing change?
### How was this patch tested?
```python
pytest -sv tests/e2e/nightly/single_node/ops/multicard_ops_a3/test_dispatch_gmm_combine_decode.py
```
```shell
===================================================================== 3 passed, 4 warnings in 194.86s (0:03:14) ======================================================================
```
- vLLM version: v0.16.0
- vLLM main:
4034c3d32e
Signed-off-by: wangli <wangli858794774@gmail.com>