Add a Mooncake installation tutorial for kv pool and update Mooncake installation tutorial (#5069)
### What this PR does / why we need it?
Add a Mooncake installation tutorial for kv pool and update Mooncake
installation tutorial
- vLLM version: v0.12.0
- vLLM main: ad32e3e19c
Signed-off-by: liziyu <liziyu16@huawei.com>
Co-authored-by: Mengqing Cao <cmq0113@163.com>
@@ -151,7 +151,8 @@ docker run --rm \
 
 ## Install Mooncake
 
-Mooncake is the serving platform for Kimi, a leading LLM service provided by Moonshot AI. First, we need to obtain the Mooncake project. Refer to the following command:
+Mooncake is the serving platform for Kimi, a leading LLM service provided by Moonshot AI.Installation and Compilation Guide: https://github.com/kvcache-ai/Mooncake?tab=readme-ov-file#build-and-use-binaries.
+First, we need to obtain the Mooncake project. Refer to the following command:
 
 ```shell
 git clone -b v0.3.7.post2 --depth 1 https://github.com/kvcache-ai/Mooncake.git
@@ -186,6 +187,17 @@ make -j
 make install
 ```
 
+Set environment variables
+
+**Note:**
+
+- Adjust the Python path according to your specific Python installation
+- Ensure `/usr/local/lib` and `/usr/local/lib64` are in your `LD_LIBRARY_PATH`
+
+```shell
+export LD_LIBRARY_PATH=/usr/local/lib64/python3.11/site-packages/mooncake:$LD_LIBRARY_PATH
+```
+
 ## Prefiller/Decoder Deployment
 
 We can run the following scripts to launch a server on the prefiller/decoder node, respectively. Please note that each P/D node will occupy ports ranging from kv_port to kv_port + num_chips to initialize socket listeners. To avoid any issues, port conflicts should be prevented. Additionally, ensure that each node's engine_id is uniquely assigned to avoid conflicts.
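The port-range rule in the deployment paragraph of this hunk (each P/D node occupies `kv_port` through `kv_port + num_chips`) can be checked before launching. The following is a minimal sketch, not part of vllm-ascend: the helper names `kv_port_end` and `ranges_overlap` and the port values are hypothetical, and the range is assumed to be inclusive at both ends, which the documentation does not state.

```shell
# Hypothetical helpers for illustration; not part of vllm-ascend.

# kv_port_end KV_PORT NUM_CHIPS: print the last port of the range,
# assuming kv_port .. kv_port + num_chips is inclusive at both ends.
kv_port_end() {
    echo $(( $1 + $2 ))
}

# ranges_overlap START_A END_A START_B END_B: succeed if the two
# inclusive ranges share at least one port.
ranges_overlap() {
    [ "$1" -le "$4" ] && [ "$3" -le "$2" ]
}

# Example values only: two instances on one host, 16 chips each.
p_start=30000; p_end=$(kv_port_end "$p_start" 16)
d_start=30010; d_end=$(kv_port_end "$d_start" 16)

if ranges_overlap "$p_start" "$p_end" "$d_start" "$d_end"; then
    echo "port conflict: ${p_start}-${p_end} overlaps ${d_start}-${d_end}"
else
    echo "no conflict"
fi
```

Spacing the `kv_port` values of instances that share a host by more than `num_chips` keeps the ranges apart under this assumption.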
@@ -195,8 +207,8 @@ Use `launch_online_dp.py` to launch external dp vllm servers.
 [launch\_online\_dp.py](https://github.com/vllm-project/vllm-ascend/blob/main/examples/external_online_dp/launch_online_dp.py)
 
 ### run_dp_template.sh
-Modify `run_dp_template.py` on each node.
-[run\_dp\_template.py](https://github.com/vllm-project/vllm-ascend/blob/main/examples/external_online_dp/run_dp_template.sh)
+Modify `run_dp_template.sh` on each node.
+[run\_dp\_template.sh](https://github.com/vllm-project/vllm-ascend/blob/main/examples/external_online_dp/run_dp_template.sh)
 
 #### Layerwise
 
@@ -221,7 +233,6 @@ export TASK_QUEUE_ENABLE=1
 export HCCL_OP_EXPANSION_MODE="AIV"
 export VLLM_USE_V1=1
 export ASCEND_RT_VISIBLE_DEVICES=$1
-export LD_LIBRARY_PATH=/usr/local/Ascend/ascend-toolkit/latest/python/site-packages/mooncake:$LD_LIBRARY_PATH
 vllm serve /path_to_weight/DeepSeek-r1_w8a8_mtp \
 --host 0.0.0.0 \
 --port $2 \
@@ -282,7 +293,6 @@ export TASK_QUEUE_ENABLE=1
 export HCCL_OP_EXPANSION_MODE="AIV"
 export VLLM_USE_V1=1
 export ASCEND_RT_VISIBLE_DEVICES=$1
-export LD_LIBRARY_PATH=/usr/local/Ascend/ascend-toolkit/latest/python/site-packages/mooncake:$LD_LIBRARY_PATH
 vllm serve /path_to_weight/DeepSeek-r1_w8a8_mtp \
 --host 0.0.0.0 \
 --port $2 \
@@ -344,7 +354,6 @@ export TASK_QUEUE_ENABLE=1
 export HCCL_OP_EXPANSION_MODE="AIV"
 export VLLM_USE_V1=1
 export ASCEND_RT_VISIBLE_DEVICES=$1
-export LD_LIBRARY_PATH=/usr/local/Ascend/ascend-toolkit/latest/python/site-packages/mooncake:$LD_LIBRARY_PATH
 vllm serve /path_to_weight/DeepSeek-r1_w8a8_mtp \
 --host 0.0.0.0 \
 --port $2 \
@@ -405,7 +414,6 @@ export TASK_QUEUE_ENABLE=1
 export HCCL_OP_EXPANSION_MODE="AIV"
 export VLLM_USE_V1=1
 export ASCEND_RT_VISIBLE_DEVICES=$1
-export LD_LIBRARY_PATH=/usr/local/Ascend/ascend-toolkit/latest/python/site-packages/mooncake:$LD_LIBRARY_PATH
 vllm serve /path_to_weight/DeepSeek-r1_w8a8_mtp \
 --host 0.0.0.0 \
 --port $2 \
@@ -474,7 +482,6 @@ export TASK_QUEUE_ENABLE=1
 export HCCL_OP_EXPANSION_MODE="AIV"
 export VLLM_USE_V1=1
 export ASCEND_RT_VISIBLE_DEVICES=$1
-export LD_LIBRARY_PATH=/usr/local/Ascend/ascend-toolkit/latest/python/site-packages/mooncake:$LD_LIBRARY_PATH
 vllm serve /path_to_weight/DeepSeek-r1_w8a8_mtp \
 --host 0.0.0.0 \
 --port $2 \
@@ -535,7 +542,6 @@ export TASK_QUEUE_ENABLE=1
 export HCCL_OP_EXPANSION_MODE="AIV"
 export VLLM_USE_V1=1
 export ASCEND_RT_VISIBLE_DEVICES=$1
-export LD_LIBRARY_PATH=/usr/local/Ascend/ascend-toolkit/latest/python/site-packages/mooncake:$LD_LIBRARY_PATH
 vllm serve /path_to_weight/DeepSeek-r1_w8a8_mtp \
 --host 0.0.0.0 \
 --port $2 \
@@ -597,7 +603,6 @@ export TASK_QUEUE_ENABLE=1
 export HCCL_OP_EXPANSION_MODE="AIV"
 export VLLM_USE_V1=1
 export ASCEND_RT_VISIBLE_DEVICES=$1
-export LD_LIBRARY_PATH=/usr/local/Ascend/ascend-toolkit/latest/python/site-packages/mooncake:$LD_LIBRARY_PATH
 vllm serve /path_to_weight/DeepSeek-r1_w8a8_mtp \
 --host 0.0.0.0 \
 --port $2 \
@@ -658,7 +663,6 @@ export TASK_QUEUE_ENABLE=1
 export HCCL_OP_EXPANSION_MODE="AIV"
 export VLLM_USE_V1=1
 export ASCEND_RT_VISIBLE_DEVICES=$1
-export LD_LIBRARY_PATH=/usr/local/Ascend/ascend-toolkit/latest/python/site-packages/mooncake:$LD_LIBRARY_PATH
 vllm serve /path_to_weight/DeepSeek-r1_w8a8_mtp \
 --host 0.0.0.0 \
 --port $2 \

@@ -79,7 +79,8 @@ docker run --rm \
 
 ## Install Mooncake
 
-Mooncake is the serving platform for Kimi, a leading LLM service provided by Moonshot AI. First, we need to obtain the Mooncake project. Refer to the following command:
+Mooncake is the serving platform for Kimi, a leading LLM service provided by Moonshot AI.Installation and Compilation Guide: https://github.com/kvcache-ai/Mooncake?tab=readme-ov-file#build-and-use-binaries.
+First, we need to obtain the Mooncake project. Refer to the following command:
 
 ```shell
 git clone -b v0.3.7.post2 --depth 1 https://github.com/kvcache-ai/Mooncake.git
@@ -114,6 +115,17 @@ make -j
 make install
 ```
 
+Set environment variables
+
+**Note:**
+
+- Adjust the Python path according to your specific Python installation
+- Ensure `/usr/local/lib` and `/usr/local/lib64` are in your `LD_LIBRARY_PATH`
+
+```shell
+export LD_LIBRARY_PATH=/usr/local/lib64/python3.11/site-packages/mooncake:$LD_LIBRARY_PATH
+```
+
 ## Prefiller/Decoder Deployment
 
 We can run the following scripts to launch a server on the prefiller/decoder NPU, respectively.
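The note added in this hunk asks readers to adjust the hard-coded `python3.11` path to their own installation. One way to avoid hard-coding it is to ask the interpreter where the package lives. This is a minimal sketch under stated assumptions: `python3` on `PATH` is the interpreter vLLM runs under, the installed package is importable as `mooncake` (inferred from the `site-packages/mooncake` path), and the helper names `prepend_path` and `mooncake_dir` are hypothetical.

```shell
# Hypothetical helpers for illustration; not part of Mooncake or vllm-ascend.

# prepend_path DIR LIST: print colon-separated LIST with DIR in front,
# leaving LIST unchanged if DIR is already an entry.
prepend_path() {
    case ":$2:" in
        *":$1:"*) printf '%s\n' "$2" ;;
        *)        printf '%s\n' "$1${2:+:$2}" ;;
    esac
}

# mooncake_dir: print the directory of the `mooncake` package as seen by
# the python3 on PATH; fail if the package cannot be located.
mooncake_dir() {
    python3 -c 'import importlib.util as u, sys
spec = u.find_spec("mooncake")
locs = list(spec.submodule_search_locations or []) if spec else []
print(locs[0]) if locs else sys.exit(1)'
}

if dir=$(mooncake_dir 2>/dev/null); then
    LD_LIBRARY_PATH=$(prepend_path "$dir" "${LD_LIBRARY_PATH:-}")
    export LD_LIBRARY_PATH
    echo "LD_LIBRARY_PATH=$LD_LIBRARY_PATH"
else
    echo "mooncake package not found for python3; set the path by hand" >&2
fi
```

`/usr/local/lib` and `/usr/local/lib64` can be added the same way with further `prepend_path` calls.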

@@ -17,17 +17,63 @@
 
 ## Example of using Mooncake as a KVCache pooling backend
 * Software:
-* Mooncake:main branch
+* Check NPU network configuration:
 
-Installation and Compilation Guide:https://github.com/kvcache-ai/Mooncake?tab=readme-ov-file#build-and-use-binaries
+Ensure that the hccn.conf file exists in the environment. If using Docker, mount it into the container.
 
-Make sure to build with `-DUSE_ASCEND_DIRECT` to enable ADXL engine.
+```bash
+cat /etc/hccn.conf
+```
 
-An example command for compiling ADXL:
+* Install Mooncake
 
-`rm -rf build && mkdir -p build && cd build \ && cmake .. -DCMAKE_INSTALL_PREFIX=/opt/transfer-engine/ -DCMAKE_POLICY_VERSION_MINIMUM=3.5 -DUSE_ASCEND_DIRECT=ON -DBUILD_SHARED_LIBS=ON -DBUILD_UNIT_TESTS=OFF \ && make -j \ && make install`
+Mooncake is the serving platform for Kimi, a leading LLM service provided by Moonshot AI.
+Installation and Compilation Guide: https://github.com/kvcache-ai/Mooncake?tab=readme-ov-file#build-and-use-binaries.
+First, we need to obtain the Mooncake project. Refer to the following command:
 
-Also, you need to set environment variables to point to them `export LD_LIBRARY_PATH=$LD_LIBRARY_PATH:/usr/local/lib64/python3.11/site-packages/mooncake`, or copy the .so files to the `/usr/local/lib64` directory after compilation
+```shell
+git clone -b v0.3.7.post2 --depth 1 https://github.com/kvcache-ai/Mooncake.git
+```
 
+(Optional) Replace go install url if the network is poor
+
+```shell
+cd Mooncake
+sed -i 's|https://go.dev/dl/|https://golang.google.cn/dl/|g' dependencies.sh
+```
+
+Install mpi
+
+```shell
+apt-get install mpich libmpich-dev -y
+```
+
+Install the relevant dependencies. The installation of Go is not required.
+
+```shell
+bash dependencies.sh -y
+```
+
+Compile and install
+
+```shell
+mkdir build
+cd build
+cmake .. -DUSE_ASCEND_DIRECT=ON
+make -j
+make install
+```
+
+Set environment variables
+
+**Note:**
+
+- Adjust the Python path according to your specific Python installation
+- Ensure `/usr/local/lib` and `/usr/local/lib64` are in your `LD_LIBRARY_PATH`
+
+```shell
+export LD_LIBRARY_PATH=/usr/local/lib64/python3.11/site-packages/mooncake:$LD_LIBRARY_PATH
+```
+
 ### Run Mooncake Master
 
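The hccn.conf requirement added in this hunk (the file must exist and, under Docker, be mounted into the container) can be turned into a check that fails early. This is a minimal sketch; the function name `check_hccn_conf` is hypothetical, and the bind-mount hint assumes the container should see the file at the same `/etc/hccn.conf` path as the host.

```shell
# Hypothetical helper for illustration; not part of vllm-ascend.

# check_hccn_conf [PATH]: succeed if PATH (default /etc/hccn.conf)
# exists and is non-empty; otherwise report on stderr and fail.
check_hccn_conf() {
    conf=${1:-/etc/hccn.conf}
    if [ -s "$conf" ]; then
        echo "found $conf"
    else
        echo "missing or empty: $conf" >&2
        return 1
    fi
}

if ! check_hccn_conf; then
    # -v HOST_PATH:CONTAINER_PATH is the standard docker bind-mount flag.
    echo "inside Docker, bind-mount it, for example: docker run -v /etc/hccn.conf:/etc/hccn.conf ..." >&2
fi
```

Running the check both on the host and inside the container distinguishes a missing file from a missing mount.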