From b75bfc58f683fb6f28abc19dd41831c8d4c98a1c Mon Sep 17 00:00:00 2001
From: Chao Lei
Date: Mon, 15 Dec 2025 14:24:39 +0800
Subject: [PATCH] [Doc ] Supplement kvpool user guide (#5013)

### What this PR does / why we need it?
Supplement detailed descriptions for `ASCEND_CONNECT_TIMEOUT` and `ASCEND_TRANSFER_TIMEOUT` in kvpool.

- vLLM version: v0.12.0
- vLLM main: https://github.com/vllm-project/vllm/commit/ad32e3e19ccf0526cb6744a5fed09a138a5fb2f9

---------

Signed-off-by: LCAIZJ
---
 docs/source/user_guide/feature_guide/kv_pool.md | 9 ++++++++-
 1 file changed, 8 insertions(+), 1 deletion(-)

diff --git a/docs/source/user_guide/feature_guide/kv_pool.md b/docs/source/user_guide/feature_guide/kv_pool.md
index 05eba3c8..2a894c54 100644
--- a/docs/source/user_guide/feature_guide/kv_pool.md
+++ b/docs/source/user_guide/feature_guide/kv_pool.md
@@ -85,9 +85,16 @@
 export PYTHONPATH=$PYTHONPATH:/xxxxx/vllm
 export MOONCAKE_CONFIG_PATH="/xxxxxx/mooncake.json"
 export ASCEND_RT_VISIBLE_DEVICES=0,1,2,3
 export ACL_OP_INIT_MODE=1
-export ASCEND_BUFFER_POOL=4:8
+
 # ASCEND_BUFFER_POOL is the environment variable for configuring the number and size of buffer on NPU Device for aggregation and KV transfer,the value 4:8 means we allocate 4 buffers of size 8MB.
+export ASCEND_BUFFER_POOL=4:8
+
+# Unit: ms. The timeout for one-sided communication connection establishment is set to 10 seconds by default (see PR: https://github.com/kvcache-ai/Mooncake/pull/1039). Users can adjust this value based on their specific setup.
+# The recommended formula is: ASCEND_CONNECT_TIMEOUT = connection_time_per_card (typically within 500ms) × total_number_of_Decode_cards.
+# This ensures that even in the worst-case scenario, where all Decode cards simultaneously attempt to connect to the same Prefill card, the connection will not time out.
 export ASCEND_CONNECT_TIMEOUT=10000
+
+# Unit: ms. The timeout for one-sided communication transfer is set to 10 seconds by default (see PR: https://github.com/kvcache-ai/Mooncake/pull/1039).
 export ASCEND_TRANSFER_TIMEOUT=10000
 python3 -m vllm.entrypoints.openai.api_server \
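
The recommended formula for `ASCEND_CONNECT_TIMEOUT` can be worked through as a small shell sketch. The variable names `DECODE_CARDS` and `CONNECT_MS_PER_CARD` and the card count of 32 are illustrative assumptions, not part of the patch. The 500 ms figure is the "typically within 500ms" upper bound quoted in the added comment.

```shell
#!/usr/bin/env bash
# Hypothetical inputs (assumptions for illustration, not defined by the patch).
DECODE_CARDS=32            # total number of Decode cards in the deployment
CONNECT_MS_PER_CARD=500    # per-card connection time in ms (upper bound from the guide)

# ASCEND_CONNECT_TIMEOUT = connection_time_per_card x total_number_of_Decode_cards
connect_timeout_ms=$(( CONNECT_MS_PER_CARD * DECODE_CARDS ))

export ASCEND_CONNECT_TIMEOUT="${connect_timeout_ms}"
# The transfer timeout is left at the documented 10-second default (unit: ms).
export ASCEND_TRANSFER_TIMEOUT=10000

echo "ASCEND_CONNECT_TIMEOUT=${ASCEND_CONNECT_TIMEOUT}"
echo "ASCEND_TRANSFER_TIMEOUT=${ASCEND_TRANSFER_TIMEOUT}"
```

With these assumed inputs the connection timeout comes out at 16000 ms, above the 10000 ms default. Under the same formula, a deployment with 20 or fewer Decode cards would compute a value at or below the default.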