### What this PR does / why we need it?
Fix the in-place copy error in the DP + EP + TP configuration that occurs when SP has chunked `hidden_states`.
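
For reviewers unfamiliar with this class of bug, here is a minimal sketch of how a strict in-place copy can fail once the source has been chunked. It is not the code touched by this PR: plain Python lists stand in for tensors, and `copy_inplace`, `chunk_rows`, `buffer`, and the sizes are all hypothetical names and values chosen for illustration. The real failure and fix may differ in detail.

```python
# Illustrative only: plain lists stand in for tensors; every name here is hypothetical.

def copy_inplace(dst, src):
    """Mimic a strict in-place copy: the row counts must match."""
    if len(dst) != len(src):
        raise ValueError(
            f"shape mismatch: dst has {len(dst)} rows, src has {len(src)}"
        )
    for i, row in enumerate(src):
        dst[i] = list(row)


def chunk_rows(rows, num_chunks, rank):
    """Return the rows owned by `rank` when `rows` is split evenly into `num_chunks`."""
    per_rank = len(rows) // num_chunks
    return rows[rank * per_rank:(rank + 1) * per_rank]


# Assumed sizes for the sketch.
num_tokens, hidden_size, tp_size = 4, 2, 2

hidden_states = [[float(t)] * hidden_size for t in range(num_tokens)]
buffer = [[0.0] * hidden_size for _ in range(num_tokens)]  # sized for the full batch

# After chunking, each rank holds only num_tokens / tp_size rows.
local = chunk_rows(hidden_states, tp_size, rank=1)

try:
    copy_inplace(buffer, local)  # full-size destination, chunked source
    mismatch_detected = False
except ValueError:
    mismatch_detected = True

# One possible remedy: size the destination to match the chunk before copying.
local_buffer = [[0.0] * hidden_size for _ in range(len(local))]
copy_inplace(local_buffer, local)
```

The point of the sketch is only that the destination and the chunked source must agree on the number of rows before an in-place copy.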
### How was this patch tested?
Tested locally with the following script:
```bash
python examples/offline_data_parallel.py \
--model="Qwen/Qwen3-30B-A3B" \
--dp-size=2 \
--tp-size=2 \
--enable-expert-parallel
```
Signed-off-by: MengqingCao <cmq0113@163.com>