### What this PR does / why we need it?
Currently, MHA models (eg: minicpm-2b, Baichuan-7b) will encounter
errors when running in piecewise graph mode, with error messages similar
to:
```
(E89999): When layout is TND and PA not enabled, keyT(8) and valueT(8) must be equal to the last element of actualSeqenceLengthKV(5)[FUNC:CheckInputShapeWhenLayoutIsTND][FILE:prompt_flash_attention_tiling.cpp][LINE:3618]
```
The error occurs because the qkv in the Prefill stage is also padded,
causing the shape to be inconsistent with actual_seq_lengths.
Add unpadding logic for kv.
- vLLM version: release/v0.13.0
- vLLM main:
254f6b9867
Signed-off-by: Wang Kunpeng <1289706727@qq.com>
### What this PR does / why we need it?
When matmul_and_reduce is enabled, the prefix attribute is required.
However, in some models, the prefix is not passed correctly, causing
errors when starting the service.
The issue of incorrect prefix passing will be fixed in vLLM in the
future.
- vLLM version: release/v0.13.0
- vLLM main:
ad32e3e19c
---------
Signed-off-by: Wang Kunpeng <1289706727@qq.com>