[Bugfix] fix pcp + eplb error (#5561)
### What this PR does / why we need it?
Fix bugs in the PCP overlay feature:
1. Fix the PCP and EPLB overlap bug by including the PCP size in the `word_size` calculation.
2. In the PCP pooling scenario, add a prompt for setting `cp_kv_cache_interleave_size`.
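
As a rough illustration of the first fix, the sketch below shows a rank count that includes the PCP size. The helper name `eplb_world_size`, its parameters, and the assumption that the rank count is the plain product of the parallel sizes are all hypothetical; the actual change in this PR may compute the value differently.

```python
def eplb_world_size(dp_size: int, tp_size: int, pcp_size: int = 1) -> int:
    """Hypothetical helper: number of ranks EPLB has to account for.

    Assumes the rank count is the product of the data-parallel,
    tensor-parallel, and PCP sizes. Leaving pcp_size out of the product
    undercounts the ranks whenever PCP is enabled (pcp_size > 1).
    """
    for name, value in (("dp_size", dp_size),
                        ("tp_size", tp_size),
                        ("pcp_size", pcp_size)):
        if value < 1:
            raise ValueError(f"{name} must be >= 1, got {value}")
    return dp_size * tp_size * pcp_size
```

With `pcp_size=1` the result matches the count without PCP, so under these assumptions the extra factor only matters when PCP is actually enabled.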
- vLLM version: v0.13.0
- vLLM main: 7157596103
Signed-off-by: weiguihua2 <weiguihua2@huawei.com>
@@ -9,6 +9,7 @@ env_common:
 HCCL_BUFFSIZE: 1024
 SERVER_PORT: 8080
 NUMEXPR_MAX_THREADS: 128
+DYNAMIC_EPLB: true
 disaggregated_prefill:
 enabled: true
 prefiller_host_index: [0]
@@ -52,6 +53,8 @@ deployment:
 }
 }
 }'
+--additional-config
+'{"dynamic_eplb":true}'

 -
 server_cmd: >
@@ -90,4 +93,6 @@ deployment:
 }
 }
 }'
+--additional-config
+'{"dynamic_eplb":true}'
 benchmarks: