diff --git a/docs/source/user_guide/feature_guide/kv_pool.md b/docs/source/user_guide/feature_guide/kv_pool.md
index 0950ce09..6239f758 100644
--- a/docs/source/user_guide/feature_guide/kv_pool.md
+++ b/docs/source/user_guide/feature_guide/kv_pool.md
@@ -99,6 +99,11 @@ export PYTHONHASHSEED=0
 | 800 I/T A3 series | 25.5.0<=HDK<26.0.0 | `export ASCEND_BUFFER_POOL=4:8` | Configures the number and size of buffers on the NPU Device for aggregation and KV transfer (e.g., `4:8` means 4 buffers of 8MB). |
 | 800 I/T A2 series | N/A | `export HCCL_INTRA_ROCE_ENABLE=1` | Required by direct transmission cheme on 800 I/T A2 series|
 
+### FAQ for HIXL (ascend_direct) backend
+
+For common troubleshooting and issue localization guidance for HIXL (ascend_direct), see:
+
+
 ### Run Mooncake Master
 
 #### 1.Configure mooncake.json
@@ -126,10 +131,11 @@ The environment variable **MOONCAKE_CONFIG_PATH** is configured to the full path
 Under the mooncake folder:
 
 ```shell
-mooncake_master --port 50088 --eviction_high_watermark_ratio 0.9 --eviction_ratio 0.1
+mooncake_master --port 50088 --eviction_high_watermark_ratio 0.9 --eviction_ratio 0.1 --default_kv_lease_ttl 11000
 ```
 
 `eviction_high_watermark_ratio` determines the watermark where Mooncake Store will perform eviction,and `eviction_ratio` determines the portion of stored objects that would be evicted.
+`default_kv_lease_ttl` sets the default lease TTL for KV objects, in milliseconds; configure it via `--default_kv_lease_ttl` and keep it larger than both `ASCEND_CONNECT_TIMEOUT` and `ASCEND_TRANSFER_TIMEOUT`.
 
 ### PD Disaggregation Scenario
 
@@ -157,6 +163,11 @@ export ASCEND_ENABLE_USE_FABRIC_MEM=1
 #A2
 #export HCCL_INTRA_ROCE_ENABLE=1
 
+#Minimum RDMA retransmission timeout, equal to 4.096 μs * 2 ^ timeout, where timeout is the HCCL_RDMA_TIMEOUT value.
+#Must satisfy the inequality: ASCEND_TRANSFER_TIMEOUT > RDMA_TIMEOUT * 7, where 7 is the default number of retries for an RDMA transfer.
+#HCCL_RDMA_TIMEOUT also affects collective communication behavior and should be configured carefully.
+export HCCL_RDMA_TIMEOUT=17
+
 # Unit: ms. The timeout for one-sided communication connection establishment is set to 10 seconds by default (see PR: https://github.com/kvcache-ai/Mooncake/pull/1039). Users can adjust this value based on their specific setup.
 # The recommended formula is: ASCEND_CONNECT_TIMEOUT = connection_time_per_card (typically within 500ms) × total_number_of_Decode_cards.
 # This ensures that even in the worst-case scenario—where all Decode cards simultaneously attempt to connect to the same Prefill card the connection will not time out.
@@ -229,6 +240,7 @@ export ACL_OP_INIT_MODE=1
 export ASCEND_ENABLE_USE_FABRIC_MEM=1
 #A2
 #export HCCL_INTRA_ROCE_ENABLE=1
+export HCCL_RDMA_TIMEOUT=17
 export ASCEND_CONNECT_TIMEOUT=10000
 export ASCEND_TRANSFER_TIMEOUT=10000
 
@@ -343,6 +355,7 @@ export ACL_OP_INIT_MODE=1
 export ASCEND_ENABLE_USE_FABRIC_MEM=1
 #A2
 #export HCCL_INTRA_ROCE_ENABLE=1
+export HCCL_RDMA_TIMEOUT=17
 export ASCEND_CONNECT_TIMEOUT=10000
 export ASCEND_TRANSFER_TIMEOUT=10000
 
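
The timeout constraints described in this change can be sanity-checked with a little arithmetic. The Python sketch below is illustrative only and is not part of vLLM Ascend, HIXL, or Mooncake: the function names are hypothetical, and it assumes that `RDMA_TIMEOUT` in the inequality means the duration `4.096 μs * 2 ^ HCCL_RDMA_TIMEOUT` converted to milliseconds. The input values mirror the exports and the `mooncake_master` flag shown in the diff.

```python
# Hypothetical helpers for checking the timeout relationships in this change.
# Assumption: RDMA_TIMEOUT in the inequality is 4.096 us * 2^HCCL_RDMA_TIMEOUT.

RDMA_BASE_US = 4.096   # base unit quoted in the added comment
RDMA_RETRIES = 7       # default RDMA retry count quoted in the added comment


def rdma_timeout_ms(hccl_rdma_timeout: int) -> float:
    """Minimum RDMA retransmission timeout in milliseconds."""
    return RDMA_BASE_US * (2 ** hccl_rdma_timeout) / 1000.0


def recommended_connect_timeout_ms(num_decode_cards: int,
                                   per_card_ms: int = 500) -> int:
    """ASCEND_CONNECT_TIMEOUT = connection_time_per_card * number of Decode cards."""
    return per_card_ms * num_decode_cards


def check_timeouts(hccl_rdma_timeout: int, transfer_timeout_ms: int,
                   connect_timeout_ms: int, lease_ttl_ms: int) -> dict:
    """Evaluate the two constraints stated in the documentation change."""
    worst_case_ms = rdma_timeout_ms(hccl_rdma_timeout) * RDMA_RETRIES
    return {
        "rdma_worst_case_ms": worst_case_ms,
        # ASCEND_TRANSFER_TIMEOUT > RDMA_TIMEOUT * 7
        "transfer_ok": transfer_timeout_ms > worst_case_ms,
        # default_kv_lease_ttl larger than both Ascend timeouts
        "lease_ok": lease_ttl_ms > max(connect_timeout_ms, transfer_timeout_ms),
    }


# Values from the diff: HCCL_RDMA_TIMEOUT=17, both Ascend timeouts 10000 ms,
# --default_kv_lease_ttl 11000.
result = check_timeouts(17, 10000, 10000, 11000)
print(result)
```

Under that assumption, `HCCL_RDMA_TIMEOUT=17` gives a single-attempt timeout of about 537 ms and a seven-retry worst case of about 3758 ms, which stays below `ASCEND_TRANSFER_TIMEOUT=10000`. A lease TTL of 11000 ms exceeds both Ascend timeouts, and a 10000 ms connect timeout covers 20 Decode cards at 500 ms each.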