[feat]decode convert bsnd to tnd and fix bug when pcp and dcp (#3980)
### What this PR does / why we need it?
1. In the attention_v1 module, convert BSND to TND in decode when PCP and DCP are enabled.
2. Fix a torchair bug that caused a service startup problem.
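
For readers unfamiliar with the layout names, here is a minimal sketch of what a BSND to TND conversion means. It assumes BSND is a padded `[batch, seq, num_heads, head_dim]` layout and TND is a packed `[total_tokens, num_heads, head_dim]` layout. The helper `bsnd_to_tnd` and the `seq_lens` argument are hypothetical and use plain nested lists; this is not the code in this PR, which works on tensors inside attention_v1.

```python
from typing import Any


def bsnd_to_tnd(x: list[list[Any]], seq_lens: list[int]) -> list[Any]:
    """Pack a padded [B][S][N][D] nested list into [T][N][D].

    Hypothetical helper for illustration only. T equals sum(seq_lens);
    padding positions past each request's length are dropped.
    """
    tokens: list[Any] = []
    for batch_idx, seq_len in enumerate(seq_lens):
        # Keep only the valid (non-padded) tokens of this request.
        tokens.extend(x[batch_idx][:seq_len])
    return tokens


# Two requests, padded to S=2, with N=1 head and D=2 features per head.
padded = [
    [[[1, 2]], [[3, 4]]],  # request 0: two valid tokens
    [[[5, 6]], [[0, 0]]],  # request 1: one valid token, one padding slot
]
packed = bsnd_to_tnd(padded, seq_lens=[2, 1])
```

In the packed layout the batch and sequence axes collapse into a single token axis, so per-request boundaries have to be tracked separately (for example through sequence lengths).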
### Does this PR introduce _any_ user-facing change?
### How was this patch tested?
- vLLM version: v0.11.0
- vLLM main: 83f478bb19
Signed-off-by: weiguihua2 <weiguihua2@huawei.com>
@@ -140,8 +140,7 @@ class AscendMLADecodeMetadata:
     attn_mask: Optional[torch.Tensor] = None
     sin: torch.Tensor = None
     cos: torch.Tensor = None
-    num_computed_tokens_of_pcp_dcp: Optional[list[Optional[list[Optional[
-        list[int]]]]]] = None
+    num_computed_tokens_of_pcp_dcp: Optional[list[list[list[int]]]] = None
     seq_mask_pcp: torch.Tensor = None
     seq_mask_dcp: torch.Tensor = None
     cp_seq_len: torch.Tensor = None