# SGLang Router
SGLang router is a standalone Rust module that enables data parallelism across SGLang instances, providing high-performance request routing and advanced load balancing. The router supports multiple load balancing algorithms including cache-aware, power of two, random, and round robin, and acts as a specialized load balancer for prefill-decode disaggregated serving architectures.
## Documentation
- **User Guide**: [docs.sglang.ai/router/router.html](https://docs.sglang.ai/router/router.html)
## Quick Start
### Prerequisites
**Rust and Cargo:**
```bash
# Install rustup (Rust installer and version manager)
curl --proto '=https' --tlsv1.2 -sSf https://sh.rustup.rs | sh
# Follow the installation prompts, then reload your shell
source "$HOME/.cargo/env"
# Verify installation
rustc --version
cargo --version
```
**Python with pip installed**
### Installation
#### Option A: Build and Install Wheel (Recommended)
```bash
# Install build dependencies
pip install setuptools-rust wheel build

# Build the wheel package
python -m build

# Install the generated wheel
pip install dist/*.whl

# One-liner for development (rebuild + install)
python -m build && pip install --force-reinstall dist/*.whl
```
#### Option B: Development Mode
```bash
pip install -e .
```
⚠️ **Warning**: Editable installs may suffer performance degradation. Use wheel builds for performance testing.
### Basic Usage
```bash
# Build Rust components
cargo build

# Launch router with worker URLs
python -m sglang_router.launch_router \
--worker-urls http://worker1:8000 http://worker2:8000
```
## Configuration
### Logging
Enable structured logging with optional file output:
```python
from sglang_router import Router

# Console logging (default)
router = Router(worker_urls=["http://worker1:8000", "http://worker2:8000"])

# File logging enabled
router = Router(
    worker_urls=["http://worker1:8000", "http://worker2:8000"],
    log_dir="./logs",  # Daily log files created here
)
```
Set the log level with the `--log-level` flag ([documentation](https://docs.sglang.ai/backend/server_arguments.html#logging)).
### Metrics
The Prometheus metrics endpoint is available at `127.0.0.1:29000` by default.
```bash
# Custom metrics configuration
python -m sglang_router.launch_router \
--worker-urls http://localhost:8080 http://localhost:8081 \
--prometheus-host 0.0.0.0 \
--prometheus-port 9000
```
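To inspect the endpoint from Python, a sketch along these lines may help. It assumes the metrics are served in the standard Prometheus text exposition format under the conventional `/metrics` path, which this README does not confirm; the helper names and the metric name in the sample payload are made up for illustration.

```python
from urllib.request import urlopen


def parse_prometheus_text(text: str) -> dict[str, float]:
    """Parse `name{labels} value` lines, skipping comments and blank lines.

    Assumes samples carry no trailing timestamp, so the value is the last field.
    """
    samples = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        name, _, value = line.rpartition(" ")
        samples[name] = float(value)
    return samples


def fetch_metrics(host: str = "127.0.0.1", port: int = 29000) -> dict[str, float]:
    # Assumption: the exporter serves the conventional /metrics path.
    with urlopen(f"http://{host}:{port}/metrics", timeout=5) as resp:
        return parse_prometheus_text(resp.read().decode("utf-8"))


# Hypothetical sample payload, for illustration only.
sample = """# HELP example_requests_total Made-up counter
# TYPE example_requests_total counter
example_requests_total{worker="w1"} 12
example_requests_total{worker="w2"} 7
"""
parsed = parse_prometheus_text(sample)
```

Calling `fetch_metrics()` against a running router would return the same kind of dictionary, keyed by metric name plus label set.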
## Advanced Features
### Kubernetes Service Discovery
Automatic worker discovery and management in Kubernetes environments.
#### Basic Service Discovery
```bash
python -m sglang_router.launch_router \
--service-discovery \
--selector app=sglang-worker role=inference \
--service-discovery-namespace default
```
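The selector is a space-separated list of `key=value` pairs. The Python sketch below shows one way to picture how such pairs could be matched against pod labels. It assumes equality-based AND semantics (every pair must be present on the pod), as with ordinary Kubernetes label selectors; `parse_selector` and `matches` are hypothetical helper names, not part of the router's API.

```python
def parse_selector(pairs: list[str]) -> dict[str, str]:
    """Turn ["app=sglang-worker", "role=inference"] into a dict."""
    selector = {}
    for pair in pairs:
        key, sep, value = pair.partition("=")
        if not sep or not key:
            raise ValueError(f"expected key=value, got {pair!r}")
        selector[key] = value
    return selector


def matches(selector: dict[str, str], pod_labels: dict[str, str]) -> bool:
    """Assumed AND semantics: every selector pair must be present on the pod."""
    return all(pod_labels.get(k) == v for k, v in selector.items())


selector = parse_selector("app=sglang-worker role=inference".split())
```

Under this assumption, a pod labeled only `app=sglang-worker` would not be selected, while a pod carrying both labels (plus any extras) would be.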
#### PD (Prefill-Decode) Mode
For disaggregated prefill/decode routing:
```bash
python -m sglang_router.launch_router \
--pd-disaggregation \
--policy cache_aware \
--service-discovery \
--prefill-selector app=sglang component=prefill \
--decode-selector app=sglang component=decode \
--service-discovery-namespace sglang-system
```
#### Kubernetes Pod Configuration
**Prefill Server Pod:**
```yaml
apiVersion: v1
kind: Pod
metadata:
  name: sglang-prefill-1
  labels:
    app: sglang
    component: prefill
  annotations:
    sglang.ai/bootstrap-port: "9001"  # Optional: Bootstrap port
spec:
  containers:
  - name: sglang
    image: lmsys/sglang:latest
    ports:
    - containerPort: 8000  # Main API port
    - containerPort: 9001  # Optional: Bootstrap port
```
**Decode Server Pod:**
```yaml
apiVersion: v1
kind: Pod
metadata:
  name: sglang-decode-1
  labels:
    app: sglang
    component: decode
spec:
  containers:
  - name: sglang
    image: lmsys/sglang:latest
    ports:
    - containerPort: 8000
```
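As a rough illustration of how these pod fields could feed into routing, the Python sketch below derives a worker URL and an optional bootstrap port from pod metadata. It assumes the worker URL is built from the pod IP and the service discovery port, and that a missing annotation means no bootstrap port. The `PodInfo` type, the `worker_endpoint` helper, and the IP addresses are hypothetical; this is not the router's actual implementation.

```python
from dataclasses import dataclass, field
from typing import Optional

BOOTSTRAP_ANNOTATION = "sglang.ai/bootstrap-port"


@dataclass
class PodInfo:
    """Hypothetical, minimal view of a discovered pod."""
    ip: str
    labels: dict[str, str] = field(default_factory=dict)
    annotations: dict[str, str] = field(default_factory=dict)


def worker_endpoint(pod: PodInfo, port: int = 8000) -> tuple[str, Optional[int]]:
    """Return (worker URL, bootstrap port or None)."""
    url = f"http://{pod.ip}:{port}"  # Assumed URL scheme
    raw = pod.annotations.get(BOOTSTRAP_ANNOTATION)
    # Annotation values are strings in Kubernetes, hence the int() conversion.
    bootstrap = int(raw) if raw is not None else None
    return url, bootstrap


prefill = PodInfo("10.0.0.5", {"component": "prefill"}, {BOOTSTRAP_ANNOTATION: "9001"})
decode = PodInfo("10.0.0.6", {"component": "decode"})
```

Here `worker_endpoint(prefill)` yields the URL together with port 9001, while the decode pod, which has no annotation, yields `None` for the bootstrap port.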
#### RBAC Configuration
**Namespace-scoped (recommended):**
```yaml
apiVersion: v1
kind: ServiceAccount
metadata:
  name: sglang-router
  namespace: sglang-system
---
apiVersion: rbac.authorization.k8s.io/v1
kind: Role
metadata:
  namespace: sglang-system
  name: sglang-router
rules:
- apiGroups: [""]
  resources: ["pods"]
  verbs: ["get", "list", "watch"]
---
apiVersion: rbac.authorization.k8s.io/v1
kind: RoleBinding
metadata:
  name: sglang-router
  namespace: sglang-system
subjects:
- kind: ServiceAccount
  name: sglang-router
  namespace: sglang-system
roleRef:
  kind: Role
  name: sglang-router
  apiGroup: rbac.authorization.k8s.io
```
#### Complete PD Example
```bash
python -m sglang_router.launch_router \
--pd-disaggregation \
--policy cache_aware \
--service-discovery \
--prefill-selector app=sglang component=prefill environment=production \
--decode-selector app=sglang component=decode environment=production \
--service-discovery-namespace production \
--host 0.0.0.0 \
--port 8080 \
--prometheus-host 0.0.0.0 \
--prometheus-port 9090
```
### Command Line Arguments Reference
#### Service Discovery
- `--service-discovery`: Enable Kubernetes service discovery
- `--service-discovery-port`: Port for worker URLs (default: 8000)
- `--service-discovery-namespace`: Kubernetes namespace to watch
- `--selector`: Label selectors for regular mode (format: `key1=value1 key2=value2`)
#### PD Mode
- `--pd-disaggregation`: Enable Prefill-Decode disaggregated mode
- `--prefill`: Initial prefill server (format: `URL BOOTSTRAP_PORT`)
- `--decode`: Initial decode server URL
- `--prefill-selector`: Label selector for prefill pods
- `--decode-selector`: Label selector for decode pods
- `--policy`: Routing policy (`cache_aware`, `random`, `power_of_two`)
## Development
### Build Process
```bash
# Build Rust project
cargo build
# Build Python binding (see Installation section above)
```
**Note**: When modifying Rust code, you must rebuild the wheel for changes to take effect.
### Troubleshooting
**VSCode Rust Analyzer Issues:**
Set `rust-analyzer.linkedProjects` to the absolute path of `Cargo.toml`:
```json
{
  "rust-analyzer.linkedProjects": ["/workspaces/sglang/sgl-router/Cargo.toml"]
}
```
### CI/CD Pipeline
The continuous integration pipeline includes comprehensive testing, benchmarking, and publishing:
#### Build & Test
1. **Build Wheels**: Uses `cibuildwheel` for manylinux x86_64 packages
2. **Build Source Distribution**: Creates source distribution for pip fallback
3. **Rust HTTP Server Benchmarking**: Performance testing of router overhead
4. **Basic Inference Testing**: End-to-end validation through the router
5. **PD Disaggregation Testing**: Benchmark and sanity checks for prefill-decode load balancing
#### Publishing
- **PyPI Publishing**: Wheels and source distributions are published only when the version changes in `pyproject.toml`
- **Container Images**: Docker images published using `/docker/Dockerfile.router`
## Features
- **High Performance**: Rust-based routing with connection pooling and optimized request handling
- **Advanced Load Balancing**: Multiple algorithms including:
  - **Cache-Aware**: Intelligent routing based on cache locality for optimal performance
  - **Power of Two**: Chooses the less loaded of two randomly selected workers
  - **Random**: Distributes requests randomly across available workers
  - **Round Robin**: Sequential distribution across workers in rotation
- **Prefill-Decode Disaggregation**: Specialized load balancing for separated prefill and decode servers
- **Service Discovery**: Automatic Kubernetes worker discovery and health management
- **Monitoring**: Comprehensive Prometheus metrics and structured logging
- **Scalability**: Handles thousands of concurrent connections with efficient resource utilization
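
The simpler load balancing policies listed above can be sketched in a few lines of Python. This only illustrates the general techniques and is not the router's Rust implementation; the `Worker` class and its `load` counter are hypothetical, and the cache-aware policy is left out because its details are not described here.

```python
import itertools
import random
from dataclasses import dataclass


@dataclass
class Worker:
    """Hypothetical worker record; `load` counts in-flight requests."""
    url: str
    load: int = 0


def pick_random(workers: list[Worker], rng: random.Random) -> Worker:
    """Random: pick any worker uniformly."""
    return rng.choice(workers)


def make_round_robin(workers: list[Worker]):
    """Round robin: return a picker that cycles through the workers in order."""
    cycle = itertools.cycle(workers)
    return lambda: next(cycle)


def pick_power_of_two(workers: list[Worker], rng: random.Random) -> Worker:
    """Power of two: sample two distinct workers and keep the less loaded one."""
    if len(workers) < 2:
        return workers[0]
    a, b = rng.sample(workers, 2)
    return a if a.load <= b.load else b


workers = [Worker("http://worker1:8000", load=5), Worker("http://worker2:8000", load=1)]
rng = random.Random(0)
next_worker = make_round_robin(workers)
order = [next_worker().url for _ in range(3)]
chosen = pick_power_of_two(workers, rng)
```

With only two workers, power of two always compares both of them, so `chosen` is the worker with the lower load. With larger pools it trades a little balance for needing only two load lookups per request.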