[router] add different policies for p node and d node (#8395)

Simo Lin
2025-07-27 00:39:20 -07:00
committed by GitHub
parent 0bcc195f4e
commit 2ab97023e3
10 changed files with 536 additions and 81 deletions

@@ -254,7 +254,11 @@ impl LoadBalancingPolicy for CacheAwarePolicy {
decode_workers: &[Box<dyn Worker>],
request_text: Option<&str>,
) -> Option<(usize, usize)> {
// In PD mode:
// DEPRECATED: This method is no longer used when separate policies are configured.
// The PD router now uses separate policies for prefill and decode selection.
// This implementation remains for backward compatibility when a single policy is used.
// In PD mode with single policy:
// - Prefill: Use cache-aware routing for better cache utilization
// - Decode: Use least-load routing for better load distribution
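The comments above describe two routing modes: a single policy that picks both workers, and separate policies for prefill and decode. The following is a minimal sketch of the separate-policy case, not the router's actual code. The `Worker` trait here is a reduced stand-in with only a `load` method, and `Policy`, `MockWorker`, `PrefixAffinity`, `LeastLoad`, and `select_pair` are hypothetical names introduced for illustration. `PrefixAffinity` is a toy substitute for cache-aware routing: it only maps requests with the same prefix to the same worker and does no real cache tracking.

```rust
// Hypothetical, simplified stand-in for the router's worker abstraction.
trait Worker {
    fn load(&self) -> usize;
}

// Test double with a fixed load value (assumption for illustration).
struct MockWorker {
    load: usize,
}

impl Worker for MockWorker {
    fn load(&self) -> usize {
        self.load
    }
}

// Hypothetical per-role policy: returns the index of the chosen worker.
trait Policy {
    fn select(&self, workers: &[Box<dyn Worker>], request_text: Option<&str>) -> Option<usize>;
}

// Picks the worker with the smallest load; the first one wins on ties.
struct LeastLoad;

impl Policy for LeastLoad {
    fn select(&self, workers: &[Box<dyn Worker>], _request_text: Option<&str>) -> Option<usize> {
        workers
            .iter()
            .enumerate()
            .min_by_key(|(_, w)| w.load())
            .map(|(i, _)| i)
    }
}

// Toy stand-in for cache-aware routing: requests that share their first
// 16 bytes always map to the same worker index.
struct PrefixAffinity;

impl Policy for PrefixAffinity {
    fn select(&self, workers: &[Box<dyn Worker>], request_text: Option<&str>) -> Option<usize> {
        if workers.is_empty() {
            return None;
        }
        let text = request_text.unwrap_or("");
        let sum: usize = text.bytes().take(16).map(|b| b as usize).sum();
        Some(sum % workers.len())
    }
}

// Selects a (prefill, decode) index pair using one policy per role.
// Returns None if either worker list yields no candidate.
fn select_pair(
    prefill_policy: &dyn Policy,
    decode_policy: &dyn Policy,
    prefill_workers: &[Box<dyn Worker>],
    decode_workers: &[Box<dyn Worker>],
    request_text: Option<&str>,
) -> Option<(usize, usize)> {
    let prefill_idx = prefill_policy.select(prefill_workers, request_text)?;
    let decode_idx = decode_policy.select(decode_workers, request_text)?;
    Some((prefill_idx, decode_idx))
}
```

Keeping the two selections behind one small trait means either role can swap its policy without touching the other, which is the separation the commit title describes.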