[Quickfix] update CachedRequestState as NewRequestData changed (#2367)
### What this PR does / why we need it?
1. Update `CachedRequestState` to match the `NewRequestData` changes in
https://github.com/vllm-project/vllm/pull/22570
2. Drop maintenance of vLLM v0.10.0 on the `main` branch
### Does this PR introduce _any_ user-facing change?
N/A
### How was this patch tested?
CI passed with existing tests.
- vLLM version: v0.10.0
- vLLM main:
92ff41abea
---------
Signed-off-by: MengqingCao <cmq0113@163.com>
```diff
@@ -105,7 +105,7 @@ def model_input_split_v1_mla_attn(
     [block_table_pre,
      block_table_post] = split_attn_tensor_type(attn_metadata.block_tables,
                                                 seq_index)

     assert attn_metadata.attn_mask is not None
     if attn_metadata.attn_state == AscendAttentionState.PrefillNoCache or attn_metadata.attn_state == AscendAttentionState.PrefillCacheHit:
         # the attn_mla kernel in torch npu only accept 128*128 attn mask
         attn_mask_pre = attn_mask_post = attn_metadata.attn_mask
```
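The hunk above partitions the per-request block tables at `seq_index` into a prefill ("pre") part and a decode ("post") part. As a rough sketch of that splitting pattern, here is a minimal, hypothetical stand-in for a helper like `split_attn_tensor_type`, assuming it simply slices along the first (request) dimension; the actual vllm-ascend implementation may handle more tensor types and edge cases.

```python
# Hypothetical sketch: split per-request metadata rows at seq_index.
# Assumption (not from the PR): the helper slices along the first dimension,
# yielding the prefill prefix and the decode suffix.
def split_at(rows, seq_index):
    """Return (pre, post) where pre holds rows[:seq_index] and post the rest."""
    return rows[:seq_index], rows[seq_index:]

block_tables = [[0, 1], [2, 3], [4, 5], [6, 7]]
block_table_pre, block_table_post = split_at(block_tables, 2)
# block_table_pre  -> [[0, 1], [2, 3]]
# block_table_post -> [[4, 5], [6, 7]]
```

The same mask is then reused for both halves in the prefill states, since the NPU MLA kernel expects a fixed 128*128 attention mask.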