Add a Mooncake installation tutorial for kv pool and update Mooncake installation tutorial (#5069)

### What this PR does / why we need it?
Add a Mooncake installation tutorial for KV pool and update the existing Mooncake installation tutorial.

- vLLM version: v0.12.0
- vLLM main: ad32e3e19c

Signed-off-by: liziyu <liziyu16@huawei.com>
Co-authored-by: Mengqing Cao <cmq0113@163.com>
liziyu authored 2025-12-16 19:53:23 +08:00, committed by GitHub
parent 4ed2951400, commit 190ae55e9f
3 changed files with 80 additions and 18 deletions


@@ -151,7 +151,8 @@ docker run --rm \
## Install Mooncake
-Mooncake is the serving platform for Kimi, a leading LLM service provided by Moonshot AI. First, we need to obtain the Mooncake project. Refer to the following command:
+Mooncake is the serving platform for Kimi, a leading LLM service provided by Moonshot AI. See the installation and compilation guide: https://github.com/kvcache-ai/Mooncake?tab=readme-ov-file#build-and-use-binaries.
+First, we need to obtain the Mooncake project. Refer to the following command:
```shell
git clone -b v0.3.7.post2 --depth 1 https://github.com/kvcache-ai/Mooncake.git
@@ -186,6 +187,17 @@ make -j
make install
```
Set environment variables
+**Note:**
+- Adjust the Python path according to your specific Python installation
+- Ensure `/usr/local/lib` and `/usr/local/lib64` are in your `LD_LIBRARY_PATH`
+```shell
+export LD_LIBRARY_PATH=/usr/local/lib64/python3.11/site-packages/mooncake:$LD_LIBRARY_PATH
+```
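As a quick sanity check for the export above (a sketch only; the `python3.11` path segment is an assumption and must match your actual interpreter and site-packages prefix), you can confirm that the Mooncake directory exists and is really on the library search path:

```shell
# Sketch: verify the Mooncake library directory exists and is on
# LD_LIBRARY_PATH. Adjust python3.11 to match your Python installation.
MOONCAKE_DIR=/usr/local/lib64/python3.11/site-packages/mooncake
[ -d "$MOONCAKE_DIR" ] \
    && echo "found: $MOONCAKE_DIR" \
    || echo "not found: $MOONCAKE_DIR (check your site-packages path)"
echo "$LD_LIBRARY_PATH" | tr ':' '\n' | grep -qx "$MOONCAKE_DIR" \
    && echo "LD_LIBRARY_PATH: ok" \
    || echo "LD_LIBRARY_PATH: missing $MOONCAKE_DIR"
```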
## Prefiller/Decoder Deployment
We can run the following scripts to launch a server on the prefiller/decoder node, respectively. Please note that each P/D node will occupy ports ranging from kv_port to kv_port + num_chips to initialize socket listeners. To avoid any issues, port conflicts should be prevented. Additionally, ensure that each node's engine_id is uniquely assigned to avoid conflicts.
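The port rule above can be sketched as a small conflict check (illustrative only: `kv_port` and `num_chips` stand in for your deployment's values, and the range is read here as inclusive of both endpoints, which is an assumption about the listener behavior):

```python
def occupied_ports(kv_port: int, num_chips: int) -> range:
    """Ports a P/D node claims for its KV socket listeners:
    kv_port, kv_port + 1, ..., kv_port + num_chips (assumed inclusive)."""
    return range(kv_port, kv_port + num_chips + 1)

def has_conflict(nodes: list[tuple[int, int]]) -> bool:
    """Return True if any two nodes' port ranges overlap.
    Each node is given as (kv_port, num_chips)."""
    claimed: set[int] = set()
    for kv_port, num_chips in nodes:
        ports = set(occupied_ports(kv_port, num_chips))
        if claimed & ports:
            return True
        claimed |= ports
    return False

# Two 8-chip nodes must space their kv_ports at least num_chips + 1 apart.
print(has_conflict([(20000, 8), (20009, 8)]))  # False: 20000-20008 vs 20009-20017
print(has_conflict([(20000, 8), (20008, 8)]))  # True: both claim port 20008
```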
@@ -195,8 +207,8 @@ Use `launch_online_dp.py` to launch external dp vllm servers.
[launch\_online\_dp.py](https://github.com/vllm-project/vllm-ascend/blob/main/examples/external_online_dp/launch_online_dp.py)
### run_dp_template.sh
-Modify `run_dp_template.py` on each node.
-[run\_dp\_template.py](https://github.com/vllm-project/vllm-ascend/blob/main/examples/external_online_dp/run_dp_template.sh)
+Modify `run_dp_template.sh` on each node.
+[run\_dp\_template.sh](https://github.com/vllm-project/vllm-ascend/blob/main/examples/external_online_dp/run_dp_template.sh)
#### Layerwise
@@ -221,7 +233,6 @@ export TASK_QUEUE_ENABLE=1
export HCCL_OP_EXPANSION_MODE="AIV"
export VLLM_USE_V1=1
export ASCEND_RT_VISIBLE_DEVICES=$1
-export LD_LIBRARY_PATH=/usr/local/Ascend/ascend-toolkit/latest/python/site-packages/mooncake:$LD_LIBRARY_PATH
vllm serve /path_to_weight/DeepSeek-r1_w8a8_mtp \
--host 0.0.0.0 \
--port $2 \
@@ -282,7 +293,6 @@ export TASK_QUEUE_ENABLE=1
export HCCL_OP_EXPANSION_MODE="AIV"
export VLLM_USE_V1=1
export ASCEND_RT_VISIBLE_DEVICES=$1
-export LD_LIBRARY_PATH=/usr/local/Ascend/ascend-toolkit/latest/python/site-packages/mooncake:$LD_LIBRARY_PATH
vllm serve /path_to_weight/DeepSeek-r1_w8a8_mtp \
--host 0.0.0.0 \
--port $2 \
@@ -344,7 +354,6 @@ export TASK_QUEUE_ENABLE=1
export HCCL_OP_EXPANSION_MODE="AIV"
export VLLM_USE_V1=1
export ASCEND_RT_VISIBLE_DEVICES=$1
-export LD_LIBRARY_PATH=/usr/local/Ascend/ascend-toolkit/latest/python/site-packages/mooncake:$LD_LIBRARY_PATH
vllm serve /path_to_weight/DeepSeek-r1_w8a8_mtp \
--host 0.0.0.0 \
--port $2 \
@@ -405,7 +414,6 @@ export TASK_QUEUE_ENABLE=1
export HCCL_OP_EXPANSION_MODE="AIV"
export VLLM_USE_V1=1
export ASCEND_RT_VISIBLE_DEVICES=$1
-export LD_LIBRARY_PATH=/usr/local/Ascend/ascend-toolkit/latest/python/site-packages/mooncake:$LD_LIBRARY_PATH
vllm serve /path_to_weight/DeepSeek-r1_w8a8_mtp \
--host 0.0.0.0 \
--port $2 \
@@ -474,7 +482,6 @@ export TASK_QUEUE_ENABLE=1
export HCCL_OP_EXPANSION_MODE="AIV"
export VLLM_USE_V1=1
export ASCEND_RT_VISIBLE_DEVICES=$1
-export LD_LIBRARY_PATH=/usr/local/Ascend/ascend-toolkit/latest/python/site-packages/mooncake:$LD_LIBRARY_PATH
vllm serve /path_to_weight/DeepSeek-r1_w8a8_mtp \
--host 0.0.0.0 \
--port $2 \
@@ -535,7 +542,6 @@ export TASK_QUEUE_ENABLE=1
export HCCL_OP_EXPANSION_MODE="AIV"
export VLLM_USE_V1=1
export ASCEND_RT_VISIBLE_DEVICES=$1
-export LD_LIBRARY_PATH=/usr/local/Ascend/ascend-toolkit/latest/python/site-packages/mooncake:$LD_LIBRARY_PATH
vllm serve /path_to_weight/DeepSeek-r1_w8a8_mtp \
--host 0.0.0.0 \
--port $2 \
@@ -597,7 +603,6 @@ export TASK_QUEUE_ENABLE=1
export HCCL_OP_EXPANSION_MODE="AIV"
export VLLM_USE_V1=1
export ASCEND_RT_VISIBLE_DEVICES=$1
-export LD_LIBRARY_PATH=/usr/local/Ascend/ascend-toolkit/latest/python/site-packages/mooncake:$LD_LIBRARY_PATH
vllm serve /path_to_weight/DeepSeek-r1_w8a8_mtp \
--host 0.0.0.0 \
--port $2 \
@@ -658,7 +663,6 @@ export TASK_QUEUE_ENABLE=1
export HCCL_OP_EXPANSION_MODE="AIV"
export VLLM_USE_V1=1
export ASCEND_RT_VISIBLE_DEVICES=$1
-export LD_LIBRARY_PATH=/usr/local/Ascend/ascend-toolkit/latest/python/site-packages/mooncake:$LD_LIBRARY_PATH
vllm serve /path_to_weight/DeepSeek-r1_w8a8_mtp \
--host 0.0.0.0 \
--port $2 \


@@ -79,7 +79,8 @@ docker run --rm \
## Install Mooncake
-Mooncake is the serving platform for Kimi, a leading LLM service provided by Moonshot AI. First, we need to obtain the Mooncake project. Refer to the following command:
+Mooncake is the serving platform for Kimi, a leading LLM service provided by Moonshot AI. See the installation and compilation guide: https://github.com/kvcache-ai/Mooncake?tab=readme-ov-file#build-and-use-binaries.
+First, we need to obtain the Mooncake project. Refer to the following command:
```shell
git clone -b v0.3.7.post2 --depth 1 https://github.com/kvcache-ai/Mooncake.git
@@ -114,6 +115,17 @@ make -j
make install
```
Set environment variables
+**Note:**
+- Adjust the Python path according to your specific Python installation
+- Ensure `/usr/local/lib` and `/usr/local/lib64` are in your `LD_LIBRARY_PATH`
+```shell
+export LD_LIBRARY_PATH=/usr/local/lib64/python3.11/site-packages/mooncake:$LD_LIBRARY_PATH
+```
## Prefiller/Decoder Deployment
We can run the following scripts to launch a server on the prefiller/decoder NPU, respectively.