diff --git a/docs/source/tutorials/pd_disaggregation_mooncake_multi_node.md b/docs/source/tutorials/pd_disaggregation_mooncake_multi_node.md
index b77a9520..089c0820 100644
--- a/docs/source/tutorials/pd_disaggregation_mooncake_multi_node.md
+++ b/docs/source/tutorials/pd_disaggregation_mooncake_multi_node.md
@@ -151,7 +151,8 @@ docker run --rm \
 
 ## Install Mooncake
 
-Mooncake is the serving platform for Kimi, a leading LLM service provided by Moonshot AI. First, we need to obtain the Mooncake project. Refer to the following command:
+Mooncake is the serving platform for Kimi, a leading LLM service provided by Moonshot AI. Installation and Compilation Guide: https://github.com/kvcache-ai/Mooncake?tab=readme-ov-file#build-and-use-binaries.
+First, we need to obtain the Mooncake project. Refer to the following command:
 
 ```shell
 git clone -b v0.3.7.post2 --depth 1 https://github.com/kvcache-ai/Mooncake.git
@@ -186,6 +187,17 @@ make -j
 make install
 ```
 
+Set environment variables
+
+**Note:**
+
+- Adjust the Python path according to your specific Python installation
+- Ensure `/usr/local/lib` and `/usr/local/lib64` are in your `LD_LIBRARY_PATH`
+
+```shell
+export LD_LIBRARY_PATH=/usr/local/lib64/python3.11/site-packages/mooncake:$LD_LIBRARY_PATH
+```
+
 ## Prefiller/Decoder Deployment
 
 We can run the following scripts to launch a server on the prefiller/decoder node, respectively. Please note that each P/D node will occupy ports ranging from kv_port to kv_port + num_chips to initialize socket listeners. To avoid any issues, port conflicts should be prevented. Additionally, ensure that each node's engine_id is uniquely assigned to avoid conflicts.
@@ -195,8 +207,8 @@ Use `launch_online_dp.py` to launch external dp vllm servers.
 [launch\_online\_dp.py](https://github.com/vllm-project/vllm-ascend/blob/main/examples/external_online_dp/launch_online_dp.py)
 
 ### run_dp_template.sh
-Modify `run_dp_template.py` on each node. 
-[run\_dp\_template.py](https://github.com/vllm-project/vllm-ascend/blob/main/examples/external_online_dp/run_dp_template.sh)
+Modify `run_dp_template.sh` on each node.
+[run\_dp\_template.sh](https://github.com/vllm-project/vllm-ascend/blob/main/examples/external_online_dp/run_dp_template.sh)
 
 #### Layerwise
 
@@ -221,7 +233,6 @@ export TASK_QUEUE_ENABLE=1
 export HCCL_OP_EXPANSION_MODE="AIV"
 export VLLM_USE_V1=1
 export ASCEND_RT_VISIBLE_DEVICES=$1
-export LD_LIBRARY_PATH=/usr/local/Ascend/ascend-toolkit/latest/python/site-packages/mooncake:$LD_LIBRARY_PATH
 vllm serve /path_to_weight/DeepSeek-r1_w8a8_mtp \
 --host 0.0.0.0 \
 --port $2 \
@@ -282,7 +293,6 @@ export TASK_QUEUE_ENABLE=1
 export HCCL_OP_EXPANSION_MODE="AIV"
 export VLLM_USE_V1=1
 export ASCEND_RT_VISIBLE_DEVICES=$1
-export LD_LIBRARY_PATH=/usr/local/Ascend/ascend-toolkit/latest/python/site-packages/mooncake:$LD_LIBRARY_PATH
 vllm serve /path_to_weight/DeepSeek-r1_w8a8_mtp \
 --host 0.0.0.0 \
 --port $2 \
@@ -344,7 +354,6 @@ export TASK_QUEUE_ENABLE=1
 export HCCL_OP_EXPANSION_MODE="AIV"
 export VLLM_USE_V1=1
 export ASCEND_RT_VISIBLE_DEVICES=$1
-export LD_LIBRARY_PATH=/usr/local/Ascend/ascend-toolkit/latest/python/site-packages/mooncake:$LD_LIBRARY_PATH
 vllm serve /path_to_weight/DeepSeek-r1_w8a8_mtp \
 --host 0.0.0.0 \
 --port $2 \
@@ -405,7 +414,6 @@ export TASK_QUEUE_ENABLE=1
 export HCCL_OP_EXPANSION_MODE="AIV"
 export VLLM_USE_V1=1
 export ASCEND_RT_VISIBLE_DEVICES=$1
-export LD_LIBRARY_PATH=/usr/local/Ascend/ascend-toolkit/latest/python/site-packages/mooncake:$LD_LIBRARY_PATH
 vllm serve /path_to_weight/DeepSeek-r1_w8a8_mtp \
 --host 0.0.0.0 \
 --port $2 \
@@ -474,7 +482,6 @@ export TASK_QUEUE_ENABLE=1
 export HCCL_OP_EXPANSION_MODE="AIV"
 export VLLM_USE_V1=1
 export ASCEND_RT_VISIBLE_DEVICES=$1
-export LD_LIBRARY_PATH=/usr/local/Ascend/ascend-toolkit/latest/python/site-packages/mooncake:$LD_LIBRARY_PATH
 vllm serve /path_to_weight/DeepSeek-r1_w8a8_mtp \
 --host 0.0.0.0 \
 --port $2 \
@@ -535,7 +542,6 @@ export TASK_QUEUE_ENABLE=1
 export HCCL_OP_EXPANSION_MODE="AIV"
 export VLLM_USE_V1=1
 export ASCEND_RT_VISIBLE_DEVICES=$1
-export LD_LIBRARY_PATH=/usr/local/Ascend/ascend-toolkit/latest/python/site-packages/mooncake:$LD_LIBRARY_PATH
 vllm serve /path_to_weight/DeepSeek-r1_w8a8_mtp \
 --host 0.0.0.0 \
 --port $2 \
@@ -597,7 +603,6 @@ export TASK_QUEUE_ENABLE=1
 export HCCL_OP_EXPANSION_MODE="AIV"
 export VLLM_USE_V1=1
 export ASCEND_RT_VISIBLE_DEVICES=$1
-export LD_LIBRARY_PATH=/usr/local/Ascend/ascend-toolkit/latest/python/site-packages/mooncake:$LD_LIBRARY_PATH
 vllm serve /path_to_weight/DeepSeek-r1_w8a8_mtp \
 --host 0.0.0.0 \
 --port $2 \
@@ -658,7 +663,6 @@ export TASK_QUEUE_ENABLE=1
 export HCCL_OP_EXPANSION_MODE="AIV"
 export VLLM_USE_V1=1
 export ASCEND_RT_VISIBLE_DEVICES=$1
-export LD_LIBRARY_PATH=/usr/local/Ascend/ascend-toolkit/latest/python/site-packages/mooncake:$LD_LIBRARY_PATH
 vllm serve /path_to_weight/DeepSeek-r1_w8a8_mtp \
 --host 0.0.0.0 \
 --port $2 \
diff --git a/docs/source/tutorials/pd_disaggregation_mooncake_single_node.md b/docs/source/tutorials/pd_disaggregation_mooncake_single_node.md
index faee1d52..553fb7d0 100644
--- a/docs/source/tutorials/pd_disaggregation_mooncake_single_node.md
+++ b/docs/source/tutorials/pd_disaggregation_mooncake_single_node.md
@@ -79,7 +79,8 @@ docker run --rm \
 
 ## Install Mooncake
 
-Mooncake is the serving platform for Kimi, a leading LLM service provided by Moonshot AI. First, we need to obtain the Mooncake project. Refer to the following command:
+Mooncake is the serving platform for Kimi, a leading LLM service provided by Moonshot AI. Installation and Compilation Guide: https://github.com/kvcache-ai/Mooncake?tab=readme-ov-file#build-and-use-binaries.
+First, we need to obtain the Mooncake project. Refer to the following command:
 
 ```shell
 git clone -b v0.3.7.post2 --depth 1 https://github.com/kvcache-ai/Mooncake.git
@@ -114,6 +115,17 @@ make -j
 make install
 ```
 
+Set environment variables
+
+**Note:**
+
+- Adjust the Python path according to your specific Python installation
+- Ensure `/usr/local/lib` and `/usr/local/lib64` are in your `LD_LIBRARY_PATH`
+
+```shell
+export LD_LIBRARY_PATH=/usr/local/lib64/python3.11/site-packages/mooncake:$LD_LIBRARY_PATH
+```
+
 ## Prefiller/Decoder Deployment
 
 We can run the following scripts to launch a server on the prefiller/decoder NPU, respectively.
diff --git a/docs/source/user_guide/feature_guide/kv_pool.md b/docs/source/user_guide/feature_guide/kv_pool.md
index 2a894c54..5f956f5e 100644
--- a/docs/source/user_guide/feature_guide/kv_pool.md
+++ b/docs/source/user_guide/feature_guide/kv_pool.md
@@ -17,17 +17,63 @@
 ## Example of using Mooncake as a KVCache pooling backend
 
 * Software:
-  * Mooncake:main branch
+  * Check NPU network configuration:
 
-  Installation and Compilation Guide:https://github.com/kvcache-ai/Mooncake?tab=readme-ov-file#build-and-use-binaries
+    Ensure that the hccn.conf file exists in the environment. If using Docker, mount it into the container.
 
-  Make sure to build with `-DUSE_ASCEND_DIRECT` to enable ADXL engine.
+    ```bash
+    cat /etc/hccn.conf
+    ```
 
-  An example command for compiling ADXL:
+  * Install Mooncake
 
-  `rm -rf build && mkdir -p build && cd build \ && cmake .. -DCMAKE_INSTALL_PREFIX=/opt/transfer-engine/ -DCMAKE_POLICY_VERSION_MINIMUM=3.5 -DUSE_ASCEND_DIRECT=ON -DBUILD_SHARED_LIBS=ON -DBUILD_UNIT_TESTS=OFF \ && make -j \ && make install`
+    Mooncake is the serving platform for Kimi, a leading LLM service provided by Moonshot AI.
+    Installation and Compilation Guide: https://github.com/kvcache-ai/Mooncake?tab=readme-ov-file#build-and-use-binaries.
+    First, we need to obtain the Mooncake project. Refer to the following command:
 
-  Also, you need to set environment variables to point to them `export LD_LIBRARY_PATH=$LD_LIBRARY_PATH:/usr/local/lib64/python3.11/site-packages/mooncake`, or copy the .so files to the `/usr/local/lib64` directory after compilation
+    ```shell
+    git clone -b v0.3.7.post2 --depth 1 https://github.com/kvcache-ai/Mooncake.git
+    ```
+
+    (Optional) Replace go install url if the network is poor
+
+    ```shell
+    cd Mooncake
+    sed -i 's|https://go.dev/dl/|https://golang.google.cn/dl/|g' dependencies.sh
+    ```
+
+    Install mpi
+
+    ```shell
+    apt-get install mpich libmpich-dev -y
+    ```
+
+    Install the relevant dependencies. The installation of Go is not required.
+
+    ```shell
+    bash dependencies.sh -y
+    ```
+
+    Compile and install
+
+    ```shell
+    mkdir build
+    cd build
+    cmake .. -DUSE_ASCEND_DIRECT=ON
+    make -j
+    make install
+    ```
+
+    Set environment variables
+
+    **Note:**
+
+    - Adjust the Python path according to your specific Python installation
+    - Ensure `/usr/local/lib` and `/usr/local/lib64` are in your `LD_LIBRARY_PATH`
+
+    ```shell
+    export LD_LIBRARY_PATH=/usr/local/lib64/python3.11/site-packages/mooncake:$LD_LIBRARY_PATH
+    ```
 
 ### Run Mooncake Master
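The `LD_LIBRARY_PATH` export added in this patch hard-codes `/usr/local/lib64/python3.11/site-packages`, and the accompanying note tells readers to adjust the path for their Python installation. A minimal sketch of that adjustment, assuming the mooncake package was installed into the active interpreter's site-packages, derives the path instead of hard-coding the version:

```shell
# Ask the active Python for its site-packages (purelib) directory,
# then point LD_LIBRARY_PATH at the mooncake package inside it.
SITE_PACKAGES=$(python3 -c "import sysconfig; print(sysconfig.get_paths()['purelib'])")
export LD_LIBRARY_PATH=${SITE_PACKAGES}/mooncake:${LD_LIBRARY_PATH}
```

This keeps the export valid when the Python version differs from 3.11, for example inside a virtual environment.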
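The deployment section of the multi-node tutorial warns that each P/D node occupies ports from kv_port to kv_port + num_chips and that conflicts must be avoided. A pre-flight probe along the following lines can surface conflicts before launch; `KV_PORT` and `NUM_CHIPS` are hypothetical placeholder values, not part of the shipped scripts:

```shell
# Hypothetical values; substitute the kv_port and chip count from your launch script.
KV_PORT=30000
NUM_CHIPS=8
# Try to bind each port in [KV_PORT, KV_PORT + NUM_CHIPS]; report any already taken.
for p in $(seq "$KV_PORT" "$((KV_PORT + NUM_CHIPS))"); do
  if ! python3 -c "
import socket, sys
s = socket.socket()
try:
    s.bind(('0.0.0.0', int(sys.argv[1])))
except OSError:
    sys.exit(1)
finally:
    s.close()
" "$p"; then
    echo "port $p is already in use"
  fi
done
```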