diff --git a/docs/source/user_guide/feature_guide/kv_pool_mooncake.md b/docs/source/user_guide/feature_guide/kv_pool_mooncake.md
index c3693868..bf7108f1 100644
--- a/docs/source/user_guide/feature_guide/kv_pool_mooncake.md
+++ b/docs/source/user_guide/feature_guide/kv_pool_mooncake.md
@@ -266,12 +266,15 @@ python3 -m vllm.entrypoints.openai.api_server \
         "kv_connector": "MooncakeConnectorStoreV1",
         "kv_role": "kv_both",
         "kv_connector_extra_config": {
+            "register_buffer": true,
             "use_layerwise": false,
             "mooncake_rpc_port":"0"
         }
     }' > mix.log 2>&1
 ```
+`register_buffer` is set to `false` by default and needs to be set to `true` only in the PD-mixed scenario.
+
 ### 2. Run Inference
 
 Configure the localhost, port, and model weight path in the command to your own settings. The requests sent will only go to the port where the mixed deployment script is located, and there is no need to start a separate proxy.
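
For reference, once the mixed-deployment server is up, a request can be sent directly to its OpenAI-compatible endpoint. A minimal sketch follows; the host `localhost`, port `8000`, model path `/path/to/model`, and prompt are placeholders, and the model value must match the weight path used when launching the server.

```bash
# Hypothetical request to the mixed-deployment port; adjust the host,
# port, and model path to your own deployment settings.
curl http://localhost:8000/v1/completions \
    -H "Content-Type: application/json" \
    -d '{
        "model": "/path/to/model",
        "prompt": "The future of AI is",
        "max_tokens": 32
    }'
```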