xc-llm-ascend

Files

Pleaplusone 4b3a210c33 Implementation of simple load balance routing proxy server (#1953) (#2124)

### What this PR does / why we need it?
This PR is a cherry-pick from v0.9.1:
https://github.com/vllm-project/vllm-ascend/pull/1953

This PR introduces a new example implementation of a load-balancing proxy server
for disaggregated prefill-decode (PD) deployments. Compared with the original
round-robin toy_proxy, it supports a simple token- and KV-cache-aware
load-balancing routing strategy for the disaggregated PD system.
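The difference between round-robin and token/KV-cache-aware routing can be illustrated with a minimal Python sketch. This is not the proxy code from the PR. The `Instance` and `TokenAwareBalancer` names, the `active_tokens` counter, and the use of token counts as a stand-in for KV-cache usage are all hypothetical assumptions made for illustration.

```python
from dataclasses import dataclass
from itertools import cycle
from typing import Iterator, List, Tuple


@dataclass
class Instance:
    """A prefill or decode instance as seen by the proxy (hypothetical model)."""
    url: str
    active_tokens: int = 0  # tokens of in-flight requests assigned here


def round_robin(instances: List[Instance]) -> Iterator[Instance]:
    """Baseline (toy_proxy style): cycles through instances, ignoring load."""
    return cycle(instances)


class TokenAwareBalancer:
    """Hypothetical sketch of token/KV-cache-aware routing.

    Prefill cost grows with prompt length, so a request goes to the prefill
    instance with the fewest in-flight prompt tokens. Decode instances hold
    the KV cache for each sequence, so a request goes to the decode instance
    with the lowest token count, used here as a proxy for KV-cache usage.
    """

    def __init__(self, prefillers: List[Instance], decoders: List[Instance]):
        self.prefillers = prefillers
        self.decoders = decoders

    @staticmethod
    def _least_loaded(instances: List[Instance]) -> Instance:
        # min() returns the first instance among ties, giving a stable order.
        return min(instances, key=lambda inst: inst.active_tokens)

    def route(self, num_prompt_tokens: int,
              max_new_tokens: int) -> Tuple[Instance, Instance]:
        prefiller = self._least_loaded(self.prefillers)
        decoder = self._least_loaded(self.decoders)
        # Prefill load is roughly proportional to the prompt length.
        prefiller.active_tokens += num_prompt_tokens
        # Decode KV-cache footprint grows to prompt + generated tokens.
        decoder.active_tokens += num_prompt_tokens + max_new_tokens
        return prefiller, decoder

    def finish_prefill(self, prefiller: Instance, num_prompt_tokens: int) -> None:
        prefiller.active_tokens -= num_prompt_tokens

    def finish_decode(self, decoder: Instance, num_prompt_tokens: int,
                      max_new_tokens: int) -> None:
        decoder.active_tokens -= num_prompt_tokens + max_new_tokens


if __name__ == "__main__":
    lb = TokenAwareBalancer(
        [Instance("http://p0:8100"), Instance("http://p1:8100")],
        [Instance("http://d0:8200"), Instance("http://d1:8200")],
    )
    # One long prompt followed by two short ones.
    for prompt_len in (4000, 100, 100):
        p, d = lb.route(num_prompt_tokens=prompt_len, max_new_tokens=256)
        print(prompt_len, p.url, d.url)
```

With this load sequence, the token-aware sketch sends both short prompts to `p1` because `p0` is still busy with the long one, whereas round-robin would send the third request back to `p0`.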

### Does this PR introduce _any_ user-facing change?
No

### How was this patch tested?
Tested on a real workload and with unit tests.

- vLLM version: v0.10.0
- vLLM main: ad57f23f6a

---------

Signed-off-by: ganyi <pleaplusone.gy@gmail.com>

2025-08-04 10:35:53 +08:00

disaggregated_prefill_v1

Implementation of simple load balance routing proxy server (#1953) (#2124)

2025-08-04 10:35:53 +08:00

eplb

[Misc][V0 Deprecation] Add __main__ guard to all offline examples (#1837)

2025-07-17 14:13:30 +08:00

offline_data_parallel.py

[Misc] Add extra checking to torchair_graph_config. (#1939)

2025-08-01 09:24:11 +08:00

offline_disaggregated_prefill_npu.py

[BugFix] update the kv transfer config (#2121)