Commit Graph

3 Commits

Author SHA1 Message Date
liziyu
aa3c4563ce fix all cards super_pod_id same on A3 & proxy support min_tokens (#2939)
### What this PR does / why we need it?
fix all cards super_pod_id same on A3 & proxy support min_tokens
### Does this PR introduce _any_ user-facing change?

### How was this patch tested?
2*A3 gen ranktable
before:
"prefill_device_list": [
        {
            "server_id": "xxx",
            "device_id": "0",
            "device_ip": "xxx",
            "super_pod_id": "0",
            "super_device_id": "106758159",
            "cluster_id": "1"
        },
        {
            "server_id": "xxx",
            "device_id": "1",
            "device_ip": "xxx",
            "super_pod_id": "0",
            "super_device_id": "106758159",
            "cluster_id": "2"
        }...
after:
"prefill_device_list": [
        {
            "server_id": "xxx",
            "device_id": "0",
            "device_ip": "xxx",
            "super_pod_id": "0",
            "super_device_id": "104857600",
            "cluster_id": "1"
        },
        {
            "server_id": "xxx",
            "device_id": "1",
            "device_ip": "xxx",
            "super_pod_id": "0",
            "super_device_id": "104923137",
            "cluster_id": "2"
        }...

---------

Signed-off-by: liziyu <liziyu16@huawei.com>
2025-09-16 01:09:18 +08:00
wangxiaoteng666
ee6d141dd4 [MAIN][BUGFIX] BugFix: Resolve the issue of waiting queue accumulation when requests are canceled. (#2426)
### What this PR does / why we need it?
Resolve the issue of waiting queue accumulation when requests are
canceled.

### Does this PR introduce _any_ user-facing change?
No

### How was this patch tested?
By ci


- vLLM version: v0.10.1.1
- vLLM main:
006477e60b

---------

Signed-off-by: wangxiaoteng666 <wangxiaoteng@huawei.com>
2025-08-29 17:19:23 +08:00
Pleaplusone
4b3a210c33 Implementation of simple load balance routing proxy server (#1953) (#2124)
### What this PR does / why we need it?
The PR is the cherry-pick from v0.9.1
https://github.com/vllm-project/vllm-ascend/pull/1953

This PR introduce a new load balance proxy server example implementation
for disaggregated pd, which support simple token&kv_cache aware load
balance routing strategy for the disaggregated pd system compared with
origin round robin toy_proxy.

### Does this PR introduce _any_ user-facing change?
No
### How was this patch tested?
tested on real workload and unittest

- vLLM version: v0.10.0
- vLLM main:
ad57f23f6a

---------

Signed-off-by: ganyi <pleaplusone.gy@gmail.com>
2025-08-04 10:35:53 +08:00