sglang/python/sglang/srt/mem_cache/mooncake_store/README.md
huangtingwei d904959233 Support l3 cache (mooncake store) for hiradix cache (#7211)
Co-authored-by: Zhiqiang Xie <xiezhq@stanford.edu>
Co-authored-by: AniZpZ <zhuangsen.zp@antgroup.com>
Co-authored-by: zuoyuan <zhangzuo21@mails.tsinghua.edu.cn>
Co-authored-by: @wangyueneng.wyn <wangyueneng.wyn@antgroup.com>
Co-authored-by: JinYan Su <jinyansu792@gmail.com>
2025-07-30 23:15:51 -07:00

Mooncake as L3 KV Cache

This document describes how to use Mooncake as the L3 KV cache for SGLang. For more details about Mooncake, see https://kvcache-ai.github.io/

Install Mooncake

Method 1: with pip

pip install mooncake-transfer-engine
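
After installing, it can be useful to confirm that the package is visible to the Python interpreter that will run SGLang. The following is a minimal sketch, assuming the package exposes a top-level module named `mooncake` (as suggested by the `python -m mooncake.http_metadata_server` command used later in this document). The helper name `mooncake_available` is hypothetical and not part of Mooncake or SGLang.

```python
import importlib.util


def mooncake_available(module_name: str = "mooncake") -> bool:
    """Return True if the top-level module can be located, without importing it."""
    # find_spec returns None when a top-level module cannot be found.
    return importlib.util.find_spec(module_name) is not None


if __name__ == "__main__":
    # Assumption: the pip package installs a module named "mooncake".
    print("mooncake importable:", mooncake_available())
```

If this reports `False`, the package was probably installed into a different Python environment than the one used to launch the server.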

Method 2: from source

Clone the Mooncake project:

git clone https://github.com/kvcache-ai/Mooncake --recursive

Install dependencies:

cd Mooncake
bash dependencies.sh

Build the project. For additional build options, please refer to the official guide.

mkdir build
cd build
cmake ..
make -j

Install Mooncake:

sudo make install

Use Mooncake

Launch the Mooncake master server:

mooncake_master

Launch the Mooncake metadata server:

python -m mooncake.http_metadata_server
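
Before starting SGLang, you may want to check that both services are accepting connections. The sketch below is only a plain TCP reachability probe and does not speak either service's protocol. The ports `8080` (metadata server) and `50051` (master) are assumptions taken from the example addresses in the launch command in this document, so adjust them if your deployment differs. The helper name `port_open` is hypothetical.

```python
import socket


def port_open(host: str, port: int, timeout: float = 1.0) -> bool:
    """Return True if a TCP connection to (host, port) succeeds within the timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers connection refused, timeouts, and unreachable hosts.
        return False


if __name__ == "__main__":
    # Assumed addresses, copied from the example configuration in this README.
    services = (("metadata server", 8080), ("master", 50051))
    for name, port in services:
        state = "reachable" if port_open("127.0.0.1", port) else "not reachable"
        print(f"Mooncake {name} on 127.0.0.1:{port}: {state}")
```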

Start the SGLang server with Mooncake enabled. Mooncake configuration can be provided via environment variables:

MOONCAKE_TE_META_DATA_SERVER="http://127.0.0.1:8080/metadata" \
MOONCAKE_GLOBAL_SEGMENT_SIZE=4294967296 \
MOONCAKE_LOCAL_BUFFER_SIZE=134217728 \
MOONCAKE_PROTOCOL="rdma" \
MOONCAKE_DEVICE="erdma_0,erdma_1" \
MOONCAKE_MASTER=127.0.0.1:50051 \
python -m sglang.launch_server \
    --enable-hierarchical-cache \
    --hicache-storage-backend mooncake \
    --model-path [model_path]
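
The two size values in the command above are easier to read as powers of two: `4294967296` is 4 × 1024³ and `134217728` is 128 × 1024². As a minimal sketch, the same configuration can be assembled in Python so the sizes are written in readable units. The helper names `mooncake_env` and `launch_command` are hypothetical, the variable names and default values are copied from the example above, and reading the sizes as byte counts (4 GiB and 128 MiB) is an assumption.

```python
GIB = 1024 ** 3
MIB = 1024 ** 2


def mooncake_env(global_segment_gib: int = 4, local_buffer_mib: int = 128) -> dict:
    """Build the Mooncake environment variables used in the example above."""
    return {
        "MOONCAKE_TE_META_DATA_SERVER": "http://127.0.0.1:8080/metadata",
        # Assumption: sizes are byte counts, so 4 GiB and 128 MiB by default.
        "MOONCAKE_GLOBAL_SEGMENT_SIZE": str(global_segment_gib * GIB),
        "MOONCAKE_LOCAL_BUFFER_SIZE": str(local_buffer_mib * MIB),
        "MOONCAKE_PROTOCOL": "rdma",
        "MOONCAKE_DEVICE": "erdma_0,erdma_1",
        "MOONCAKE_MASTER": "127.0.0.1:50051",
    }


def launch_command(model_path: str) -> list:
    """Return the argument list for the SGLang launch shown above (not executed here)."""
    return [
        "python", "-m", "sglang.launch_server",
        "--enable-hierarchical-cache",
        "--hicache-storage-backend", "mooncake",
        "--model-path", model_path,
    ]


if __name__ == "__main__":
    for key, value in mooncake_env().items():
        print(f"{key}={value}")
    # "[model_path]" is the same placeholder used in the README.
    print(" ".join(launch_command("[model_path]")))
```

To actually start the server from Python, one option is to pass `{**os.environ, **mooncake_env()}` as the `env` argument of `subprocess.run` together with the list returned by `launch_command`.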