[main2main] upgrade vllm to 0308 (#7213)

### What this PR does / why we need it?
Upgrade main2main to vLLM 0308. Breaking upstream changes this update adapts to:

* https://github.com/vllm-project/vllm/pull/30681
* https://github.com/vllm-project/vllm/pull/35552 removes `self.cudagraph_batch_sizes`
* https://github.com/vllm-project/vllm/pull/35158 renames `clear_metadata` to `defer_finalize` (see the compatibility sketch after this list)
* https://github.com/vllm-project/vllm/pull/36006 removes `CacheConfig.cpu_offload_gb` (see the compatibility sketch after this list)
* https://github.com/vllm-project/vllm/pull/35472
* https://github.com/vllm-project/vllm/pull/34552 `attn_metadata_builder`
* https://github.com/vllm-project/vllm/pull/30515 `profile_seq_lens`
* https://github.com/vllm-project/vllm/pull/28053
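
For the removed and renamed symbols above, a downstream plugin typically keeps a small compatibility layer so one codebase works on both sides of the upstream change. The sketch below is illustrative only: the helper names `get_cpu_offload_gb` and `finalize_attn_metadata` are hypothetical, and it assumes the renamed method is a drop-in replacement for the old one.

```python
# Hedged compatibility sketch; helper names are hypothetical, not vllm-ascend code.

def get_cpu_offload_gb(cache_config) -> float:
    # vllm PR #36006 removed CacheConfig.cpu_offload_gb; fall back to 0 when
    # running against a vLLM build that no longer has the field.
    return getattr(cache_config, "cpu_offload_gb", 0)


def finalize_attn_metadata(builder) -> None:
    # vllm PR #35158 renamed clear_metadata -> defer_finalize; prefer the new
    # name and fall back to the old one (assumes equivalent behaviour).
    if hasattr(builder, "defer_finalize"):
        builder.defer_finalize()
    else:
        builder.clear_metadata()
```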

- vLLM version: v0.16.0
- vLLM main: 4034c3d32e

---------

Signed-off-by: MrZ20 <2609716663@qq.com>
Signed-off-by: menogrey <1299267905@qq.com>
Co-authored-by: MrZ20 <2609716663@qq.com>
Commit: 1c954ff264 (parent: 79ef41a53d)
Author: zhangyiming
Date: 2026-03-18 09:24:43 +08:00
Committed by: GitHub
16 changed files with 223 additions and 168 deletions


Representative hunks (`NPUModelRunner310` forwarding the new `profile_seq_lens` argument):

```diff
@@ -152,6 +152,7 @@ class NPUModelRunner310(NPUModelRunner):
         remove_lora: bool = True,
         is_graph_capturing: bool = False,
         num_active_loras: int = 0,
+        profile_seq_lens: int | None = None,
     ):
         temporary_context = self.temporary_modify_uniform_decode_query_len() if uniform_decode else nullcontext()
         with temporary_context:
@@ -168,6 +169,7 @@ class NPUModelRunner310(NPUModelRunner):
                 remove_lora=remove_lora,
                 is_graph_capturing=is_graph_capturing,
                 num_active_loras=num_active_loras,
+                profile_seq_lens=profile_seq_lens,
             )

    def _check_and_update_cudagraph_mode(
```
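
The hunks above follow a standard subclass-override pattern: accept the new upstream keyword in the 310 runner's signature and forward it unchanged to the parent call. Below is a minimal, self-contained sketch of that pattern; the method name `_dummy_run`, the stub base class, and the stub context manager are assumptions for illustration, not taken from the vllm-ascend source.

```python
from contextlib import nullcontext


class NPUModelRunner:
    """Stand-in for the vllm-ascend base runner (stub, for illustration only)."""

    def _dummy_run(self, **kwargs):
        # The real base class runs a dummy forward pass; here we just echo kwargs.
        return kwargs


class NPUModelRunner310(NPUModelRunner):
    def temporary_modify_uniform_decode_query_len(self):
        # Stub: the real method temporarily patches the uniform-decode query length.
        return nullcontext()

    def _dummy_run(  # method name assumed; the diff only shows its argument list
        self,
        uniform_decode: bool = False,
        remove_lora: bool = True,
        is_graph_capturing: bool = False,
        num_active_loras: int = 0,
        profile_seq_lens: int | None = None,  # new keyword from vllm PR #30515
        **kwargs,
    ):
        temporary_context = (
            self.temporary_modify_uniform_decode_query_len()
            if uniform_decode
            else nullcontext()
        )
        with temporary_context:
            return super()._dummy_run(
                remove_lora=remove_lora,
                is_graph_capturing=is_graph_capturing,
                num_active_loras=num_active_loras,
                profile_seq_lens=profile_seq_lens,  # forward the new argument
                **kwargs,
            )


# Usage: the new keyword flows through the override to the base runner unchanged.
print(NPUModelRunner310()._dummy_run(profile_seq_lens=128))
```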