[router] add py binding and readme for openai router and history backend (#11453)

Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>
Keyang Ru
2025-10-14 09:42:34 -07:00
committed by GitHub
parent 5ea96ac7cc
commit eb8cac6fe2
8 changed files with 488 additions and 25 deletions


@@ -79,6 +79,25 @@ python -m sglang_router.launch_router \
--worker-urls http://worker1:8000 http://worker2:8000
```
#### OpenAI Backend Mode
Route requests to OpenAI or OpenAI-compatible endpoints:
```bash
# Route to OpenAI API
python -m sglang_router.launch_router \
--backend openai \
--worker-urls https://api.openai.com
# Route to custom OpenAI-compatible endpoint
python -m sglang_router.launch_router \
--backend openai \
--worker-urls http://my-openai-compatible-service:8000
```
**Note**:
- OpenAI backend mode acts as a simple proxy, so load balancing does not apply in this mode.
- This mode requires exactly one `--worker-urls` entry.
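Because the OpenAI backend proxies to a single upstream, a launch script can reject extra URLs before the router starts. The sketch below is a hypothetical wrapper, not part of `sglang_router`. `launch_openai_router` and the `PYTHON` interpreter override are illustrative assumptions, and the router flags are the ones shown in the examples above.

```bash
# Hypothetical wrapper (not part of sglang_router): refuse to launch the
# OpenAI backend unless exactly one upstream URL is given, because that
# mode is a single-endpoint proxy.
# PYTHON is an assumed override for the interpreter (defaults to `python`).
launch_openai_router() {
  if (( $# != 1 )); then
    echo "error: --backend openai takes exactly one worker URL (got $#)" >&2
    return 1
  fi
  "${PYTHON:-python}" -m sglang_router.launch_router \
    --backend openai \
    --worker-urls "$1"
}
```

Usage: `launch_openai_router https://api.openai.com`. To proxy to two different OpenAI-compatible services, you would run one router per upstream.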
#### Launch Router with Worker URLs in prefill-decode mode
```bash
# Note that the prefill and decode URLs must be provided in the following format:
@@ -194,6 +213,65 @@ python -m sglang_router.launch_router \
Default headers: `x-request-id`, `x-correlation-id`, `x-trace-id`, `request-id`
### History Backend (Conversation Storage)
Store conversation and response data for tracking, debugging, or analytics.
**NOTE: This feature is currently only supported in OpenAI router mode.**
#### Available options for history storage
- **Memory** (default): In-memory storage, fast but ephemeral
- **None**: No storage, minimal overhead
- **Oracle**: Persistent storage using Oracle ATP
```bash
# Memory backend (default)
python -m sglang_router.launch_router --backend openai \
--worker-urls https://api.openai.com \
--history-backend memory
# No storage for maximum performance
python -m sglang_router.launch_router --backend openai \
--worker-urls https://api.openai.com \
--history-backend none
# Oracle ATP backend
# Install Oracle Instant Client by downloading and unzipping:
# https://download.oracle.com/otn_software/linux/instantclient/2390000/instantclient-basic-linux.x64-23.9.0.25.07.zip
# then point LD_LIBRARY_PATH at the unzipped directory
export LD_LIBRARY_PATH=/home/ubuntu/instant-client/instantclient_23_9
# Choose ONE of the following connection methods:
# Option 1: Using full connection descriptor
export ATP_DSN="(description=(address=(protocol=tcps)(port=1522)(host=adb.region.oraclecloud.com))(connect_data=(service_name=service_name)))"
# Option 2: Using TNS alias (requires wallet)
export ATP_TNS_ALIAS="sglroutertestatp_high"
export ATP_WALLET_PATH="/path/to/wallet"
# service user config
export ATP_USER="admin"
export ATP_PASSWORD="YourPassword123"
python -m sglang_router.launch_router \
--worker-urls https://api.openai.com \
--backend openai \
--history-backend oracle
```
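The Oracle setup above depends on `LD_LIBRARY_PATH` pointing at the unzipped Instant Client directory. The helper below is a hypothetical sanity check for that step, not part of `sglang_router`. It assumes that the directory contains `libclntsh.so` or a versioned `libclntsh.so.*` file, which is the Oracle client library shipped in the Instant Client Basic package.

```bash
# Hypothetical check (not part of sglang_router): succeed if any directory
# listed in LD_LIBRARY_PATH contains libclntsh.so or a versioned variant.
has_oracle_client() {
  local dir lib
  local IFS=':'   # split LD_LIBRARY_PATH on colons for the loop below
  for dir in ${LD_LIBRARY_PATH:-}; do
    for lib in "$dir"/libclntsh.so*; do
      # An unmatched glob stays literal, so test that the file really exists.
      [[ -e "$lib" ]] && return 0
    done
  done
  return 1
}
```

Usage: `has_oracle_client || echo "Oracle Instant Client not found on LD_LIBRARY_PATH" >&2`.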
**Oracle Configuration Parameters:**
- `--oracle-connect-descriptor`: Full connection descriptor (env: `ATP_DSN`)
- `--oracle-tns-alias`: TNS alias from tnsnames.ora (env: `ATP_TNS_ALIAS`)
  - Requires `--oracle-wallet-path` to locate tnsnames.ora
- `--oracle-username`: Database username (env: `ATP_USER`)
- `--oracle-password`: Database password (env: `ATP_PASSWORD`)
- `--oracle-wallet-path`: Path to wallet directory (env: `ATP_WALLET_PATH`)
  - Required when using TNS alias
- `--oracle-pool-min`: Minimum connections (default: 1, env: `ATP_POOL_MIN`)
- `--oracle-pool-max`: Maximum connections (default: 16, env: `ATP_POOL_MAX`)
**Note**: You must provide **either** `--oracle-tns-alias` **or** `--oracle-connect-descriptor`, but not both.
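The connection rules above can be checked before launch. The function below is a hypothetical pre-flight sketch, not part of `sglang_router`. It inspects only the `ATP_*` environment variables, not the equivalent `--oracle-*` flags. It also assumes that `ATP_USER` and `ATP_PASSWORD` must always be set, which the parameter list does not state explicitly.

```bash
# Hypothetical pre-flight check (not part of sglang_router): validate the
# ATP_* environment variables against the documented Oracle rules before
# starting the router with --history-backend oracle.
check_oracle_env() {
  local errors=0 var

  # Exactly one connection method: full descriptor (ATP_DSN) or TNS alias.
  if [[ -n "${ATP_DSN:-}" && -n "${ATP_TNS_ALIAS:-}" ]]; then
    echo "error: set ATP_DSN or ATP_TNS_ALIAS, not both" >&2
    errors=$((errors + 1))
  elif [[ -z "${ATP_DSN:-}" && -z "${ATP_TNS_ALIAS:-}" ]]; then
    echo "error: set one of ATP_DSN or ATP_TNS_ALIAS" >&2
    errors=$((errors + 1))
  fi

  # A TNS alias is resolved via tnsnames.ora inside the wallet directory.
  if [[ -n "${ATP_TNS_ALIAS:-}" && -z "${ATP_WALLET_PATH:-}" ]]; then
    echo "error: ATP_TNS_ALIAS requires ATP_WALLET_PATH" >&2
    errors=$((errors + 1))
  fi

  # Assumption: service user credentials are always supplied via env.
  for var in ATP_USER ATP_PASSWORD; do
    if [[ -z "${!var:-}" ]]; then   # indirect expansion: value of $var's name
      echo "error: $var is not set" >&2
      errors=$((errors + 1))
    fi
  done

  # Return status 0 only when every check passed.
  (( errors == 0 ))
}
```

Usage: `check_oracle_env && python -m sglang_router.launch_router --backend openai --worker-urls https://api.openai.com --history-backend oracle`. Every problem is reported before the function returns, so a single run lists all misconfigurations.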
## Advanced Features
### Kubernetes Service Discovery
@@ -407,6 +485,24 @@ curl -X POST http://localhost:8080/add_worker?url=http://worker3:8000&api_key=wo
### Command Line Arguments Reference
#### Backend Selection
- `--backend`: Backend runtime to use (default: `sglang`)
  - `sglang`: SGLang workers (default)
  - `openai`: OpenAI or OpenAI-compatible endpoints
#### History Backend
- `--history-backend`: Storage backend for conversations (default: `memory`)
  - `memory`: In-memory storage (default)
  - `none`: No storage
  - `oracle`: Oracle ATP persistent storage
- `--oracle-tns-alias`: Oracle TNS alias from tnsnames.ora (env: `ATP_TNS_ALIAS`, mutually exclusive with `--oracle-connect-descriptor`)
- `--oracle-connect-descriptor`: Oracle full connection string (env: `ATP_DSN`, mutually exclusive with `--oracle-tns-alias`)
- `--oracle-username`: Oracle username (env: `ATP_USER`)
- `--oracle-password`: Oracle password (env: `ATP_PASSWORD`)
- `--oracle-wallet-path`: Oracle wallet directory (env: `ATP_WALLET_PATH`, required for TNS alias)
- `--oracle-pool-min`: Min pool connections (default: 1, env: `ATP_POOL_MIN`)
- `--oracle-pool-max`: Max pool connections (default: 16, env: `ATP_POOL_MAX`)
#### Service Discovery
- `--service-discovery`: Enable Kubernetes service discovery
- `--service-discovery-port`: Port for worker URLs (default: 8000)
@@ -479,6 +575,8 @@ The continuous integration pipeline includes comprehensive testing, benchmarking
- **Random**: Distributes requests randomly across available workers
- **Round Robin**: Sequential distribution across workers in rotation
- **Prefill-Decode Disaggregation**: Specialized load balancing for separated prefill and decode servers
- **Multiple Backend Support**: Route to SGLang, OpenAI, or other OpenAI-compatible services
- **Conversation Storage**: Store conversations in memory or Oracle ATP, or disable storage for minimal overhead
- **Service Discovery**: Automatic Kubernetes worker discovery and health management
- **Monitoring**: Comprehensive Prometheus metrics and structured logging
- **Scalability**: Handles thousands of concurrent connections with efficient resource utilization