[Bugfix] PCP adaptation for VLLM v0.11.2 modifications (#4604)
To adapt to the vLLM v0.11.2 image, the method for obtaining PCP size and DCP size has been modified.

- vLLM version: v0.11.2

Signed-off-by: SlightwindSec <slightwindsec@gmail.com>
@@ -29,8 +29,10 @@ class KVPoolScheduler:
             "load_async", False)
         # request_id -> (vllm cached tokes, kvpool cached tokens)
         self.load_specs: dict[str, LoadSpec] = {}
-        self.pcp_size = vllm_config.parallel_config.prefill_context_parallel_size
-        self.dcp_size = vllm_config.parallel_config.decode_context_parallel_size
+        self.pcp_size = getattr(vllm_config.parallel_config,
+                                "prefill_context_parallel_size", 1)
+        self.dcp_size = getattr(vllm_config.parallel_config,
+                                "decode_context_parallel_size", 1)
 
         self._block_size = vllm_config.cache_config.block_size
         if self.pcp_size > 1:
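The change above swaps direct attribute access for `getattr` with a default of 1, so that a parallel config lacking either field is treated as having no context parallelism. The following is a minimal sketch of that fallback pattern, not the project's code: the helper `context_parallel_sizes` and the `SimpleNamespace` stand-ins for `vllm_config.parallel_config` are hypothetical and exist only for illustration.

```python
from types import SimpleNamespace


def context_parallel_sizes(parallel_config):
    """Return (pcp_size, dcp_size), using 1 for any attribute that is absent.

    Hypothetical helper; it mirrors the getattr calls in the diff.
    """
    pcp_size = getattr(parallel_config, "prefill_context_parallel_size", 1)
    dcp_size = getattr(parallel_config, "decode_context_parallel_size", 1)
    return pcp_size, dcp_size


# Hypothetical stand-ins for vllm_config.parallel_config.
without_pcp = SimpleNamespace(decode_context_parallel_size=2)
with_both = SimpleNamespace(prefill_context_parallel_size=4,
                            decode_context_parallel_size=2)

print(context_parallel_sizes(without_pcp))  # (1, 2)
print(context_parallel_sizes(with_both))    # (4, 2)
```

Note that `getattr` falls back to the default only when the attribute is missing. An attribute that exists but is set to `None` is returned as `None`, so a later check such as `self.pcp_size > 1` would still fail in that case.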