[Docs] [Router] Update Observability and Common Issues Section (#11302)
@@ -421,13 +421,31 @@ python -m sglang_router.launch_router \
```
--request-id-headers x-request-id x-trace-id
```

## Observability

When Prometheus is enabled, the router provides several key metrics for observability.

| Metric Name                            | Type      | Description                                                                                          |
|:---------------------------------------|:----------|:-----------------------------------------------------------------------------------------------------|
| `sgl_router_requests_total`            | Counter   | Total number of requests received by the router's API endpoint. Useful for tracking overall traffic. |
| `sgl_router_processed_requests_total`  | Counter   | Total requests processed, labeled by `worker`. Critical for spotting load imbalances.                |
| `sgl_router_active_workers`            | Gauge     | The current number of healthy workers in the routing pool. Essential for alerting.                   |
| `sgl_router_running_requests`          | Gauge     | The number of currently in-flight requests, labeled by `worker`. For monitoring real-time load.      |
| `sgl_router_cache_hits_total`          | Counter   | Total requests routed to a worker with a matching prefix cache.                                      |
| `sgl_router_cache_misses_total`        | Counter   | Total requests that could not be routed based on cache locality.                                     |
| `sgl_router_generate_duration_seconds` | Histogram | Tracks end-to-end request latency. Use this to monitor performance (e.g., p95/p99).                  |

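As a rough illustration of how the two cache counters in the table can be combined, the sketch below derives a cache hit rate from a scrape in the Prometheus text exposition format. The sample scrape and its values are made up, and the helper names `read_counter` and `cache_hit_rate` are hypothetical; only the metric names come from the table above.

```python
import re

# Hypothetical scrape output; the values are invented for illustration.
SAMPLE = """\
# TYPE sgl_router_cache_hits_total counter
sgl_router_cache_hits_total 750
# TYPE sgl_router_cache_misses_total counter
sgl_router_cache_misses_total 250
"""

# Matches "<metric name>[{labels}] <value>" in the text exposition format.
SAMPLE_LINE = re.compile(r"^([a-zA-Z_:][a-zA-Z0-9_:]*)(\{.*\})?\s+(\S+)")


def read_counter(text: str, name: str) -> float:
    """Sum every sample of `name`, ignoring comments and other metrics."""
    total = 0.0
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        match = SAMPLE_LINE.match(line)
        if match and match.group(1) == name:
            total += float(match.group(3))
    return total


def cache_hit_rate(text: str) -> float:
    """Return hits / (hits + misses), or 0.0 when no requests were counted."""
    hits = read_counter(text, "sgl_router_cache_hits_total")
    misses = read_counter(text, "sgl_router_cache_misses_total")
    total = hits + misses
    return hits / total if total else 0.0


if __name__ == "__main__":
    print(f"cache hit rate: {cache_hit_rate(SAMPLE):.1%}")
```

Counters are cumulative, so a rate computed this way covers the whole lifetime of the router process. To look at a recent window, take the difference between two scrapes before dividing.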
## Troubleshooting

### Common Issues

1. **Workers not connecting**: Ensure workers are fully initialized before starting the router. Use `--worker-startup-timeout-secs` to increase wait time.

2. **High latency**:
    - **A common cause**: load imbalance across workers.
        - Check the `sgl_router_processed_requests_total` metric grouped by `worker`.
        - Cache-aware routing might be prioritizing cache hits too aggressively.
        - Try adjusting `--balance-abs-threshold` and `--balance-rel-threshold`.

3. **Memory growth**: Reduce `--max-tree-size` or decrease `--eviction-interval-secs` for more aggressive cache cleanup.
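The load-imbalance check described under **High latency** can be sketched as follows: group `sgl_router_processed_requests_total` by its `worker` label and compare the busiest worker with the average. The sample scrape, the worker URLs, and the helper names `per_worker_counts` and `imbalance_ratio` are assumptions made for illustration. The ratio is only a diagnostic aid and is not the rule the router applies for `--balance-abs-threshold` or `--balance-rel-threshold`.

```python
import re

# Hypothetical scrape output; worker URLs and values are invented.
SAMPLE = """\
sgl_router_processed_requests_total{worker="http://worker1:8000"} 900
sgl_router_processed_requests_total{worker="http://worker2:8000"} 300
sgl_router_processed_requests_total{worker="http://worker3:8000"} 300
"""

METRIC_LINE = re.compile(r"^sgl_router_processed_requests_total\{(.*)\}\s+(\S+)")
WORKER_LABEL = re.compile(r'(?:^|,)\s*worker="([^"]*)"')


def per_worker_counts(text: str) -> dict[str, float]:
    """Map each `worker` label value to its processed-request count."""
    counts: dict[str, float] = {}
    for line in text.splitlines():
        metric = METRIC_LINE.match(line.strip())
        if not metric:
            continue
        worker = WORKER_LABEL.search(metric.group(1))
        if worker:
            name = worker.group(1)
            counts[name] = counts.get(name, 0.0) + float(metric.group(2))
    return counts


def imbalance_ratio(counts: dict[str, float]) -> float:
    """Busiest worker divided by the mean; 1.0 means a perfectly even split."""
    if not counts:
        return 1.0
    mean = sum(counts.values()) / len(counts)
    return max(counts.values()) / mean if mean else 1.0


if __name__ == "__main__":
    counts = per_worker_counts(SAMPLE)
    for worker, count in sorted(counts.items()):
        print(f"{worker}: {count:.0f}")
    print(f"max/mean ratio: {imbalance_ratio(counts):.2f}")
```

A ratio well above 1.0 that persists across scrapes suggests one worker is absorbing most of the traffic, which is the situation where adjusting the two balance thresholds is worth trying. Because the counter is cumulative, comparing deltas between two scrapes gives a better picture of current load than the raw totals.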