[Docs] [Router] Update Observability and Common Issues Section (#11302)

2025-10-07 23:03:09 +08:00
parent 4f42c8cd3e
commit 0958a39704
1 changed files with 19 additions and 1 deletions
--- a/docs/advanced_features/router.md
+++ b/docs/advanced_features/router.md
@@ -421,13 +421,31 @@ python -m sglang_router.launch_router \
    --request-id-headers x-request-id x-trace-id
 ```
 ## Observability
 When Prometheus is enabled, the router provides several key metrics for observability.
 | Metric Name                            | Type      | Description                                                                                          |
 |:---------------------------------------|:----------|:-----------------------------------------------------------------------------------------------------|
 | `sgl_router_requests_total`            | Counter   | Total number of requests received by the router's API endpoint. Useful for tracking overall traffic. |
 | `sgl_router_processed_requests_total`  | Counter   | Total requests processed, labeled by `worker`. Critical for spotting load imbalances.                |
 | `sgl_router_active_workers`            | Gauge     | The current number of healthy workers in the routing pool. Essential for alerting.                   |
 | `sgl_router_running_requests`          | Gauge     | The number of currently in-flight requests, labeled by `worker`. For monitoring real-time load.      |
 | `sgl_router_cache_hits_total`          | Counter   | Total requests routed to a worker with a matching prefix cache.                                      |
 | `sgl_router_cache_misses_total`        | Counter   | Total requests that could not be routed based on cache locality.                                     |
 | `sgl_router_generate_duration_seconds` | Histogram | Tracks end-to-end request latency. Use this to monitor performance (e.g., p95/p99).                  |
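As a rough illustration of how the cache counters in the table can be combined, the sketch below derives a cache hit ratio from a Prometheus text-format scrape. The sample values are made up, the helper names (`sum_metric`, `cache_hit_ratio`) are hypothetical, and it assumes the two counters together cover every cache-aware routing decision, which the table does not state explicitly.

```python
import re

# Hypothetical scrape of the router's metrics endpoint; the values are made up.
SAMPLE = """\
# TYPE sgl_router_cache_hits_total counter
sgl_router_cache_hits_total 750
# TYPE sgl_router_cache_misses_total counter
sgl_router_cache_misses_total 250
"""

# Matches "<name>{optional labels} <value>" in the Prometheus text format.
_LINE = re.compile(r'^([A-Za-z_:][A-Za-z0-9_:]*)(?:\{[^}]*\})?\s+(\S+)')


def sum_metric(text, name):
    """Sum every sample of `name`, across all label sets if there are any."""
    total = 0.0
    for line in text.splitlines():
        if line.startswith("#"):  # skip HELP/TYPE comment lines
            continue
        m = _LINE.match(line)
        if m and m.group(1) == name:
            total += float(m.group(2))
    return total


def cache_hit_ratio(text):
    """Return hits / (hits + misses), or 0.0 when nothing has been counted."""
    hits = sum_metric(text, "sgl_router_cache_hits_total")
    misses = sum_metric(text, "sgl_router_cache_misses_total")
    routed = hits + misses
    return hits / routed if routed else 0.0


print(f"cache hit ratio: {cache_hit_ratio(SAMPLE):.2f}")
```

In a real deployment the same ratio would more likely be computed in a Prometheus query over a time window, since raw counters only show totals since the router started.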
 ## Troubleshooting
 ### Common Issues
 1. **Workers not connecting**: Ensure workers are fully initialized before starting the router. Use `--worker-startup-timeout-secs` to increase wait time.
-2. **High latency**: Check if cache-aware routing is causing imbalance. Try adjusting `--balance-abs-threshold` and `--balance-rel-threshold`.
+2. **High latency**:
   - **A common cause**: load imbalance across workers.
   - Check the `sgl_router_processed_requests_total` metric grouped by `worker`.
   - Cache-aware routing might be prioritizing cache hits too aggressively.
   - Try adjusting `--balance-abs-threshold` and `--balance-rel-threshold`.
 3. **Memory growth**: Reduce `--max-tree-size` or decrease `--eviction-interval-secs` for more aggressive cache cleanup.
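The load-imbalance check for the high-latency case can be sketched as follows: read `sgl_router_processed_requests_total` per `worker` and compare the busiest worker against the mean. The worker URLs and counts are invented sample data, and `per_worker_counts` and `imbalance_ratio` are hypothetical helpers, not part of the router.

```python
import re

# Hypothetical scrape; worker URLs and counts are invented for illustration.
SAMPLE = """\
sgl_router_processed_requests_total{worker="http://worker-a:8000"} 900
sgl_router_processed_requests_total{worker="http://worker-b:8000"} 300
sgl_router_processed_requests_total{worker="http://worker-c:8000"} 300
"""

# Captures the `worker` label value and the sample value.
_SAMPLE_RE = re.compile(
    r'^sgl_router_processed_requests_total\{[^}]*worker="([^"]*)"[^}]*\}\s+(\S+)'
)


def per_worker_counts(text):
    """Map each worker label to its processed-request count."""
    counts = {}
    for line in text.splitlines():
        m = _SAMPLE_RE.match(line)
        if m:
            worker, value = m.group(1), float(m.group(2))
            counts[worker] = counts.get(worker, 0.0) + value
    return counts


def imbalance_ratio(counts):
    """Busiest worker's count divided by the mean; 1.0 means an even spread."""
    if not counts:
        return 0.0
    mean = sum(counts.values()) / len(counts)
    return max(counts.values()) / mean if mean else 0.0


counts = per_worker_counts(SAMPLE)
print(f"imbalance ratio: {imbalance_ratio(counts):.2f}")
```

A ratio well above 1.0 suggests one worker is absorbing a disproportionate share of traffic, which is the situation where adjusting `--balance-abs-threshold` and `--balance-rel-threshold` is worth trying. What counts as "well above" depends on the workload and is left as a judgment call here.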