Files

huangtingwei d904959233 Support l3 cache (mooncake store) for hiradix cache (#7211 )

Co-authored-by: Zhiqiang Xie <xiezhq@stanford.edu>
Co-authored-by: AniZpZ <zhuangsen.zp@antgroup.com>
Co-authored-by: zuoyuan <zhangzuo21@mails.tsinghua.edu.cn>
Co-authored-by: @wangyueneng.wyn <wangyueneng.wyn@antgroup.com>
Co-authored-by: JinYan Su <jinyansu792@gmail.com>

2025-07-30 23:15:51 -07:00

1.4 KiB

Raw Blame History

Mooncake as L3 KV Cache

This document describes how to use Mooncake as the L3 KV cache for SGLang. For more details about Mooncake, please refer to: https://kvcache-ai.github.io/

Install Mooncake

Method 1: with pip

pip install mooncake-transfer-engine

Method 2: from source

Clone Mooncake project:

git clone https://github.com/kvcache-ai/Mooncake --recursive

Install dependencies:

cd Mooncake
bash dependencies.sh

Build the project. For additional build options, please refer to the official guide.

mkdir build
cd build
cmake ..
make -j

Install Mooncake:

sudo make install

Use Mooncake

Launch Mooncake master server:

mooncake_master

Launch Mooncake meta server:

python -m mooncake.http_metadata_server

Start the SGLang server with Mooncake enabled. Mooncake configuration can be provided via environment variables:

MOONCAKE_TE_META_DATA_SERVER="http://127.0.0.1:8080/metadata" \
MOONCAKE_GLOBAL_SEGMENT_SIZE=4294967296 \
MOONCAKE_LOCAL_BUFFER_SIZE=134217728 \
MOONCAKE_PROTOCOL="rdma" \
MOONCAKE_DEVICE="erdma_0,erdma_1" \
MOONCAKE_MASTER=127.0.0.1:50051 \
python -m sglang.launch_server \
    --enable-hierarchical-cache \
    --hicache-storage-backend mooncake\
    --model-path [model_path]

1.4 KiB Raw Blame History