[main2main] upgrade vllm to 0308 (#7213)
### What this PR does / why we need it?
Update main2main to vLLM 0308.

Breaking upstream changes adapted to:
* https://github.com/vllm-project/vllm/pull/30681
* https://github.com/vllm-project/vllm/pull/35552: removes `self.cudagraph_batch_sizes`
* https://github.com/vllm-project/vllm/pull/35158: renames `clear_metadata` to `defer_finalize`
* https://github.com/vllm-project/vllm/pull/36006: removes `CacheConfig.cpu_offload_gb`
* https://github.com/vllm-project/vllm/pull/35472
* https://github.com/vllm-project/vllm/pull/34552: `attn_metadata_builder`
* https://github.com/vllm-project/vllm/pull/30515: `profile_seq_lens`
* https://github.com/vllm-project/vllm/pull/28053
- vLLM version: v0.16.0
- vLLM main:
4034c3d32e
---------
Signed-off-by: MrZ20 <2609716663@qq.com>
Signed-off-by: menogrey <1299267905@qq.com>
Co-authored-by: MrZ20 <2609716663@qq.com>
```diff
@@ -152,6 +152,7 @@ class NPUModelRunner310(NPUModelRunner):
         remove_lora: bool = True,
         is_graph_capturing: bool = False,
         num_active_loras: int = 0,
+        profile_seq_lens: int | None = None,
     ):
         temporary_context = self.temporary_modify_uniform_decode_query_len() if uniform_decode else nullcontext()
         with temporary_context:
@@ -168,6 +169,7 @@ class NPUModelRunner310(NPUModelRunner):
                 remove_lora=remove_lora,
                 is_graph_capturing=is_graph_capturing,
                 num_active_loras=num_active_loras,
+                profile_seq_lens=profile_seq_lens,
             )

    def _check_and_update_cudagraph_mode(
```
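The `temporary_context` line in the diff above relies on the standard conditional context-manager idiom: pick a real context manager when a condition holds, otherwise fall back to `contextlib.nullcontext()` so the `with` block works either way. A minimal, self-contained sketch of that idiom (the `temporary_override` helper, `Runner` class, and attribute name here are hypothetical stand-ins for illustration, not vllm-ascend APIs):

```python
from contextlib import contextmanager, nullcontext

@contextmanager
def temporary_override(obj, attr, value):
    # Hypothetical helper: set obj.attr to value for the duration of the
    # with-block, then restore the original value even if an error occurs.
    old = getattr(obj, attr)
    setattr(obj, attr, value)
    try:
        yield
    finally:
        setattr(obj, attr, old)

class Runner:
    # Stand-in for a model runner with a decode-time setting.
    uniform_decode_query_len = 1

runner = Runner()
uniform_decode = True

# Same shape as the diff above: take the override only when the flag is set,
# otherwise use a no-op context so the with-statement stays unconditional.
ctx = (temporary_override(runner, "uniform_decode_query_len", 4)
       if uniform_decode else nullcontext())
with ctx:
    inside = runner.uniform_decode_query_len  # overridden inside the block
after = runner.uniform_decode_query_len       # restored after the block
```

This keeps the call site branch-free: the same `with` statement covers both the overridden and the untouched case, and the restore happens in `finally` even on exceptions.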