From d43022f3ed284cedc1b024c41188bc01de1d9618 Mon Sep 17 00:00:00 2001
From: pz1116 <47019764+Pz1116@users.noreply.github.com>
Date: Wed, 19 Nov 2025 15:57:50 +0800
Subject: [PATCH] [doc]fix readme for kv pool user guide (#4271)

### What this PR does / why we need it?
Add the parameter "register_buffer" for the PD Aggregated Scenario in the given example.

- vLLM version: v0.11.0
- vLLM main: https://github.com/vllm-project/vllm/commit/2918c1b49c88c29783c86f78d2c4221cb9622379

Signed-off-by: Pz1116
---
 docs/source/user_guide/feature_guide/kv_pool_mooncake.md | 3 +++
 1 file changed, 3 insertions(+)

diff --git a/docs/source/user_guide/feature_guide/kv_pool_mooncake.md b/docs/source/user_guide/feature_guide/kv_pool_mooncake.md
index c3693868..bf7108f1 100644
--- a/docs/source/user_guide/feature_guide/kv_pool_mooncake.md
+++ b/docs/source/user_guide/feature_guide/kv_pool_mooncake.md
@@ -266,12 +266,15 @@ python3 -m vllm.entrypoints.openai.api_server \
     "kv_connector": "MooncakeConnectorStoreV1",
     "kv_role": "kv_both",
     "kv_connector_extra_config": {
+      "register_buffer": true,
       "use_layerwise": false,
       "mooncake_rpc_port":"0"
     }
   }' > mix.log 2>&1
 ```
 
+`register_buffer` is set to `false` by default and needs to be set to `true` only in the PD-mixed scenario.
+
 ### 2. Run Inference
 
 Configure the localhost, port, and model weight path in the command to your own settings. The requests sent will only go to the port where the mixed deployment script is located, and there is no need to start a separate proxy.