- Fixed the computing of final hidden_states when enabling pipeline
parallel and prefill context parallel at the same time. Only in the last
PP rank, hidden_states are required and have right tensor type.
- Fixed the shape of intermediate_tensors in the dummy_run when enabling
pipeline parallel and flashcomm1. The intermediate_tensors should be
divided by tp_size. Otherwise, the moe will raise issues.
- Fixed the shape of self.intermediate_tensors for sufficient slice
space
- vLLM version: release/v0.13.0
- vLLM main:
81786c8774
---------
Signed-off-by: Jingchun Gao <gaojingchun1@huawei.com>