[router] add py binding and readme for openai router and history backend (#11453)

Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>
2025-10-14 09:42:34 -07:00
parent 5ea96ac7cc
commit eb8cac6fe2
8 changed files with 488 additions and 25 deletions
--- a/sgl-router/README.md
+++ b/sgl-router/README.md
@@ -79,6 +79,25 @@ python -m sglang_router.launch_router \
    --worker-urls http://worker1:8000 http://worker2:8000
 ```

+#### OpenAI Backend Mode
+Route requests to OpenAI or OpenAI-compatible endpoints:
+
+```bash
+# Route to OpenAI API
+python -m sglang_router.launch_router \
+    --backend openai \
+    --worker-urls https://api.openai.com
+
+# Route to custom OpenAI-compatible endpoint
+python -m sglang_router.launch_router \
+    --backend openai \
+    --worker-urls http://my-openai-compatible-service:8000
+```
+
+**Note**:
+- OpenAI backend mode acts as a simple proxy, so load balancing does not apply.
+- This mode requires exactly one `--worker-urls` entry.
+
 #### Launch Router with Worker URLs in prefill-decode mode
 ```bash
 # Note that the prefill and decode URLs must be provided in the following format:
@@ -194,6 +213,65 @@ python -m sglang_router.launch_router \

 Default headers: `x-request-id`, `x-correlation-id`, `x-trace-id`, `request-id`

+### History Backend (Conversation Storage)
+
+Store conversation and response data for tracking, debugging, or analytics.
+
+**Note**: This feature is currently supported only in OpenAI backend mode (`--backend openai`).
+
+#### Available options for history storage
+
+- **Memory** (default): In-memory storage, fast but ephemeral
+- **None**: No storage, minimal overhead
+- **Oracle**: Persistent storage using Oracle Autonomous Transaction Processing (ATP)
+
+```bash
+# Memory backend (default)
+python -m sglang_router.launch_router --backend openai \
+    --worker-urls https://api.openai.com \
+    --history-backend memory
+
+# No storage (minimal overhead)
+python -m sglang_router.launch_router --backend openai \
+    --worker-urls https://api.openai.com \
+    --history-backend none
+
+# Oracle ATP backend
+
+# Install Oracle Instant Client, for example:
+# https://download.oracle.com/otn_software/linux/instantclient/2390000/instantclient-basic-linux.x64-23.9.0.25.07.zip
+# Then point LD_LIBRARY_PATH at the unpacked directory (adjust the path for your system)
+export LD_LIBRARY_PATH=/home/ubuntu/instant-client/instantclient_23_9
+
+# Choose ONE of the following connection methods:
+# Option 1: Using full connection descriptor
+export ATP_DSN="(description=(address=(protocol=tcps)(port=1522)(host=adb.region.oraclecloud.com))(connect_data=(service_name=service_name)))"
+
+# Option 2: Using TNS alias (requires wallet)
+export ATP_TNS_ALIAS="sglroutertestatp_high"
+export ATP_WALLET_PATH="/path/to/wallet"
+
+# Database user credentials
+export ATP_USER="admin"
+export ATP_PASSWORD="YourPassword123"
+
+python -m sglang_router.launch_router \
+    --worker-urls https://api.openai.com \
+    --backend openai \
+    --history-backend oracle
+```
+
+**Oracle Configuration Parameters:**
+- `--oracle-tns-alias`: TNS alias from tnsnames.ora (env: `ATP_TNS_ALIAS`)
+  - Requires `--oracle-wallet-path` to locate tnsnames.ora
+- `--oracle-connect-descriptor`: Full connection descriptor (env: `ATP_DSN`)
+- `--oracle-username`: Database username (env: `ATP_USER`)
+- `--oracle-password`: Database password (env: `ATP_PASSWORD`)
+- `--oracle-wallet-path`: Path to wallet directory (env: `ATP_WALLET_PATH`)
+  - Required when using TNS alias
+- `--oracle-pool-min`: Minimum connections (default: 1, env: `ATP_POOL_MIN`)
+- `--oracle-pool-max`: Maximum connections (default: 16, env: `ATP_POOL_MAX`)
+
+**Note**: You must provide **either** `--oracle-tns-alias` **or** `--oracle-connect-descriptor`, but not both.
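
The either/or rule above can be checked before launching the router. The following is a minimal sketch, not part of `sglang_router`: the function name `check_oracle_env` is hypothetical, and it assumes the connection settings are supplied through the environment variables listed above rather than through command-line flags.

```bash
# Hypothetical preflight helper (not provided by sglang_router).
# Verifies that exactly one Oracle connection method is configured
# through the environment variables documented above.
check_oracle_env() {
    local dsn="${ATP_DSN:-}"
    local tns_alias="${ATP_TNS_ALIAS:-}"
    local wallet="${ATP_WALLET_PATH:-}"

    # Both methods set: the two options are mutually exclusive.
    if [ -n "$dsn" ] && [ -n "$tns_alias" ]; then
        echo "error: set either ATP_DSN or ATP_TNS_ALIAS, not both" >&2
        return 1
    fi

    # Neither method set: one of them is required.
    if [ -z "$dsn" ] && [ -z "$tns_alias" ]; then
        echo "error: set one of ATP_DSN or ATP_TNS_ALIAS" >&2
        return 1
    fi

    # A TNS alias needs the wallet directory to locate tnsnames.ora.
    if [ -n "$tns_alias" ] && [ -z "$wallet" ]; then
        echo "error: ATP_TNS_ALIAS requires ATP_WALLET_PATH" >&2
        return 1
    fi

    return 0
}
```

One way to use it is to chain the check in front of the launch command, for example `check_oracle_env && python -m sglang_router.launch_router --backend openai --worker-urls https://api.openai.com --history-backend oracle`, so a misconfigured environment fails before the router starts.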
+
 ## Advanced Features

 ### Kubernetes Service Discovery
@@ -407,6 +485,24 @@ curl -X POST http://localhost:8080/add_worker?url=http://worker3:8000&api_key=wo

 ### Command Line Arguments Reference

+#### Backend Selection
+- `--backend`: Backend runtime to use (default: `sglang`)
+  - `sglang`: SGLang workers (default)
+  - `openai`: OpenAI or OpenAI-compatible endpoints
+
+#### History Backend
+- `--history-backend`: Storage backend for conversations (default: `memory`)
+  - `memory`: In-memory storage (default)
+  - `none`: No storage
+  - `oracle`: Oracle ATP persistent storage
+- `--oracle-tns-alias`: Oracle TNS alias from tnsnames.ora (env: `ATP_TNS_ALIAS`, mutually exclusive with `--oracle-connect-descriptor`)
+- `--oracle-connect-descriptor`: Oracle full connection string (env: `ATP_DSN`, mutually exclusive with `--oracle-tns-alias`)
+- `--oracle-username`: Oracle username (env: `ATP_USER`)
+- `--oracle-password`: Oracle password (env: `ATP_PASSWORD`)
+- `--oracle-wallet-path`: Oracle wallet directory (env: `ATP_WALLET_PATH`, required for TNS alias)
+- `--oracle-pool-min`: Min pool connections (default: 1, env: `ATP_POOL_MIN`)
+- `--oracle-pool-max`: Max pool connections (default: 16, env: `ATP_POOL_MAX`)
+
 #### Service Discovery
 - `--service-discovery`: Enable Kubernetes service discovery
 - `--service-discovery-port`: Port for worker URLs (default: 8000)
@@ -479,6 +575,8 @@ The continuous integration pipeline includes comprehensive testing, benchmarking
  - **Random**: Distributes requests randomly across available workers
  - **Round Robin**: Sequential distribution across workers in rotation
 - **Prefill-Decode Disaggregation**: Specialized load balancing for separated prefill and decode servers
+- **Multiple Backend Support**: Route to SGLang, OpenAI, or other OpenAI-compatible services
+- **Conversation Storage**: Store conversations in memory or Oracle ATP, or disable storage for minimal overhead
 - **Service Discovery**: Automatic Kubernetes worker discovery and health management
 - **Monitoring**: Comprehensive Prometheus metrics and structured logging
 - **Scalability**: Handles thousands of concurrent connections with efficient resource utilization