[KVPOOL]decode save kvcache (#5168)
### What this PR does / why we need it?
Allow the Decode node to save its KV cache to the KV pool.
Currently only MLA models are supported.
### Does this PR introduce _any_ user-facing change?
### How was this patch tested?
- vLLM version: v0.12.0
- vLLM main: ad32e3e19c
---------
Signed-off-by: baxingpiaochong <771405853@qq.com>
Co-authored-by: Chao Lei <leichao139636@163.com>
@@ -257,7 +257,23 @@ python3 -m vllm.entrypoints.openai.api_server \
}'
```

By default, the key-value pool in a PD-disaggregated deployment stores only the KV cache generated by the Prefill node. For models that use MLA, the Decode node can now also store its KV cache in the pool for use by the Prefill node. Enable this by adding `consumer_is_to_put: true` to the AscendStoreConnector configuration. If the Prefill node enables pipeline parallelism (PP), `prefill_pp_size` or `prefill_pp_layer_partition` must also be set. For example:

```
{
    "kv_connector": "AscendStoreConnector",
    "kv_role": "kv_consumer",
    "kv_connector_extra_config": {
        "lookup_rpc_port": "0",
        "backend": "mooncake",
        "consumer_is_to_put": true,
        "prefill_pp_size": 2,
        "prefill_pp_layer_partition": "30,31"
    }
}
```
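
The Decode-node configuration above can also be generated programmatically, which helps avoid JSON syntax slips such as a missing comma. The following is a minimal sketch, not part of vllm-ascend: `build_decode_kv_config` is a hypothetical helper, and the check that `prefill_pp_layer_partition` lists one layer count per PP stage is an assumption about how that option is interpreted.

```python
import json


def build_decode_kv_config(prefill_pp_size=None, prefill_pp_layer_partition=None):
    """Build a Decode-node KV config with pool writes enabled (hypothetical helper)."""
    extra = {
        "lookup_rpc_port": "0",
        "backend": "mooncake",
        # Lets the Decode node (the KV consumer) also put its KV cache into the pool.
        "consumer_is_to_put": True,
    }
    if prefill_pp_size is not None:
        extra["prefill_pp_size"] = prefill_pp_size
    if prefill_pp_layer_partition is not None:
        # Assumption: a comma-separated list of layer counts, one per PP stage.
        stages = [int(n) for n in prefill_pp_layer_partition.split(",")]
        if prefill_pp_size is not None and len(stages) != prefill_pp_size:
            raise ValueError(
                f"expected {prefill_pp_size} partition entries, got {len(stages)}"
            )
        extra["prefill_pp_layer_partition"] = prefill_pp_layer_partition
    return {
        "kv_connector": "AscendStoreConnector",
        "kv_role": "kv_consumer",
        "kv_connector_extra_config": extra,
    }


config = build_decode_kv_config(prefill_pp_size=2, prefill_pp_layer_partition="30,31")
config_json = json.dumps(config)
print(config_json)
```

The printed string is valid JSON with the same keys as the example above and can be supplied wherever the Decode node's KV transfer configuration is set.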

#### 2. Start proxy_server.

```
python vllm-ascend/examples/disaggregated_prefill_v1/load_balance_proxy_server_example.py \