[Bugfix] PCP adaptation for VLLM v0.11.2 modifications (#4604)
To adapt to the vLLM v0.11.2 image, the PCP size and DCP size are now read from `vllm_config.parallel_config` with `getattr`, defaulting to 1, instead of by direct attribute access.

___

- vLLM version: v0.11.2

---------

Signed-off-by: SlightwindSec <slightwindsec@gmail.com>
@@ -29,8 +29,10 @@ class KVPoolScheduler:
             "load_async", False)
         # request_id -> (vllm cached tokes, kvpool cached tokens)
         self.load_specs: dict[str, LoadSpec] = {}
-        self.pcp_size = vllm_config.parallel_config.prefill_context_parallel_size
-        self.dcp_size = vllm_config.parallel_config.decode_context_parallel_size
+        self.pcp_size = getattr(vllm_config.parallel_config,
+                                "prefill_context_parallel_size", 1)
+        self.dcp_size = getattr(vllm_config.parallel_config,
+                                "decode_context_parallel_size", 1)
 
         self._block_size = vllm_config.cache_config.block_size
         if self.pcp_size > 1:
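
The effect of the `getattr` fallback in this hunk can be sketched in isolation. This is a minimal illustration, not the scheduler's actual code: `get_cp_sizes` is a hypothetical helper, and the `SimpleNamespace` objects are stand-ins for `vllm_config.parallel_config`. Which attribute, if any, is absent in a given vLLM build is an assumption made here only for demonstration.

```python
from types import SimpleNamespace


def get_cp_sizes(parallel_config):
    """Hypothetical helper: return (pcp_size, dcp_size).

    Each size falls back to 1 (no context parallelism) when the
    corresponding attribute does not exist on the config object.
    """
    pcp_size = getattr(parallel_config, "prefill_context_parallel_size", 1)
    dcp_size = getattr(parallel_config, "decode_context_parallel_size", 1)
    return pcp_size, dcp_size


# Stand-in config that lacks the PCP attribute (assumed for illustration).
config_without_pcp = SimpleNamespace(decode_context_parallel_size=2)

# Stand-in config that defines both attributes.
config_with_both = SimpleNamespace(prefill_context_parallel_size=4,
                                   decode_context_parallel_size=2)

print(get_cp_sizes(config_without_pcp))  # (1, 2)
print(get_cp_sizes(config_with_both))    # (4, 2)
```

With direct attribute access, the first stand-in would raise `AttributeError`. With the fallback, a missing attribute is treated as a parallel size of 1, so a check such as `if self.pcp_size > 1:` is simply skipped.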