[docker] Distributed Serving with k8s Statefulset ( good example for DeepSeek-R1) (#3631)

Signed-off-by: Peter Pan <Peter.Pan@daocloud.io>
Co-authored-by: Kebe <kebe.liu@daocloud.io>
This commit is contained in:
Peter Pan
2025-03-09 15:41:20 +08:00
committed by GitHub
parent 1361ab9e03
commit 0e90ae628a
2 changed files with 119 additions and 1 deletions


@@ -98,7 +98,21 @@ drun v0.4.3.post4-rocm630 python3 -m sglang.bench_one_batch --batch-size 32 --in
2. Execute the command `docker compose up -d` in your terminal.
</details>
## Method 5: Run on Kubernetes or Clouds with SkyPilot
## Method 5: Using Kubernetes
<details>
<summary>More</summary>
1. Option 1: For single-node serving (typically when the model fits into the GPUs of one node)
Run `kubectl apply -f docker/k8s-sglang-service.yaml` to create the Kubernetes Deployment and Service, using llama-31-8b as the example model.
2. Option 2: For multi-node serving (usually when a large model such as `DeepSeek-R1` needs more than one GPU node)
Modify the model path and server arguments as needed, then run `kubectl apply -f docker/k8s-sglang-distributed-sts.yaml` to create a two-node Kubernetes StatefulSet and its serving Service.
</details>
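The distributed manifest itself is not reproduced in this diff. As a rough illustration only, a two-replica StatefulSet for multi-node serving might look like the sketch below; the resource names, image tag, and launch flags here are assumptions for illustration, not the actual contents of `docker/k8s-sglang-distributed-sts.yaml`:

```yaml
# Hypothetical sketch -- not the actual docker/k8s-sglang-distributed-sts.yaml.
apiVersion: apps/v1
kind: StatefulSet
metadata:
  name: sglang-distributed          # assumed name
spec:
  replicas: 2                       # one pod per GPU node
  serviceName: sglang-headless      # headless Service gives each pod a stable DNS name
  selector:
    matchLabels:
      app: sglang-distributed
  template:
    metadata:
      labels:
        app: sglang-distributed
    spec:
      containers:
      - name: sglang
        image: lmsysorg/sglang:latest   # assumed image tag
        command:
        - python3
        - -m
        - sglang.launch_server
        - --model-path=/models/DeepSeek-R1                          # adjust to your model path
        - --tp=16                                                   # tensor parallelism across 2 nodes x 8 GPUs
        - --nnodes=2
        - --dist-init-addr=sglang-distributed-0.sglang-headless:5000
        resources:
          limits:
            nvidia.com/gpu: 8
```

The point of using a StatefulSet rather than a Deployment here is the stable pod ordinals plus the headless Service: rank 0 gets a predictable address (`sglang-distributed-0.sglang-headless`) that the other node can use for distributed initialization. A real manifest would also need to derive each pod's node rank (for example, from the pod-name ordinal) and mount the model volume.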
## Method 6: Run on Kubernetes or Clouds with SkyPilot
<details>
<summary>More</summary>