Commit Graph

9 Commits

Author SHA1 Message Date
wangxiaoteng888
c2d58c0655 [P/D][BugFix][v0.11.0-dev]Fix proxy format processing errors & Layerwise connector performance optimization (#4069)
### What this PR does / why we need it?
1. Fix proxy format processing errors.
2. Optimize layer-wise connector performance.

Signed-off-by: wangxiaoteng <wangxiaoteng@huawei.com>
2025-11-09 09:55:10 +08:00
zxr2333
d4e2a44307 [Cherry Pick from pr#3981][0.11.0][P/D]Make kv-transfer env variable take effect & Fix load-balance proxy (#3983)
### What this PR does / why we need it?
Make the kv-transfer environment variable take effect and fix the load-balance proxy.
Cherry-picked from #3981

---------
Signed-off-by: nwpu-zxr <zhouxuerong2@huawei.com>
2025-11-08 13:52:33 +08:00
zxr2333
954dab64fb [v0.11.0][P/D]Set adxl as default backend and update readme (#3771)
### What this PR does / why we need it?
Set the adxl engine as the default Mooncake backend, because Ascend
Transport is no longer maintained.
Update the README with instructions for installing Mooncake with the
adxl backend.
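
As an illustration only, a hedged sketch of pointing a vLLM engine at a Mooncake-based connector: the connector name and the extra-config key used to select adxl are assumptions, not the exact names documented in the revised README.

```python
# Illustrative sketch only: the connector name and the "protocol" key are
# assumptions; the revised README is the authoritative source.
from vllm import LLM
from vllm.config import KVTransferConfig

kv_config = KVTransferConfig(
    kv_connector="MooncakeConnector",                # assumed connector name
    kv_role="kv_producer",                           # prefill side; "kv_consumer" on the decode side
    kv_connector_extra_config={"protocol": "adxl"},  # assumed key for selecting the adxl backend
)
llm = LLM(model="Qwen/Qwen2.5-7B-Instruct",          # placeholder model
          kv_transfer_config=kv_config)
```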

### Does this PR introduce _any_ user-facing change?
Users need to compile and install Mooncake with the adxl backend
according to the revised README instructions.

### How was this patch tested?
By CI.

---------

Signed-off-by: nwpu-zxr <zhouxuerong2@huawei.com>
2025-11-04 16:06:58 +08:00
wangxiaoteng888
38afd2c9cb [bugfix_v0.11.0]cancel tokenize for layerwise_proxy (#3913)
### What this PR does / why we need it?
Cancel tokenization for layerwise_proxy.
### Does this PR introduce _any_ user-facing change?
No.
### How was this patch tested?
By CI.

Signed-off-by: wangxiaoteng <wangxiaoteng@huawei.com>
2025-10-30 23:55:04 +08:00
wangxiaoteng888
af7a56550b [bugfix_v0.11.0-dev] layerwise D first plan (#3907)
### What this PR does / why we need it?
Refactored the layerwise code to send to the D node first, preventing
P-node hangs due to communication timeouts when DP > 1.
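
A minimal sketch of the ordering change described above; the channel objects and method names are hypothetical, not the connector's actual API.

```python
def push_layer_d_first(layer_idx, kv_blocks, d_channels):
    """Hypothetical sketch: issue the sends to the decode (D) side before
    any prefill-side synchronization, so a P rank whose peers have no
    matching work (possible when DP > 1) does not block until a
    communication timeout."""
    for channel in d_channels:
        channel.send_async(layer_idx, kv_blocks)  # serve the D node first
    for channel in d_channels:
        channel.wait()                            # synchronize only afterwards
```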
---------

Signed-off-by: nwpu-zxr <zhouxuerong2@huawei.com>
Signed-off-by: liziyu <liziyu16@huawei.com>
Signed-off-by: wangxiaoteng <wangxiaoteng@huawei.com>
Co-authored-by: nwpu-zxr <zhouxuerong2@huawei.com>
Co-authored-by: liziyu <liziyu16@huawei.com>
2025-10-30 22:21:11 +08:00
liziyu
e5b938c5fe [v0.11.0] [P/D] force with_prefill true after allreduce in kv producer (#3835)
### What this PR does / why we need it?
Force with_prefill to True after the all-reduce in the KV producer. This is a backport of #3768 and #3849.
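
A minimal sketch of the described behaviour, assuming a torch.distributed all-reduce over data-parallel ranks; the function name and flag handling are illustrative.

```python
import torch
import torch.distributed as dist


def sync_with_prefill(with_prefill: bool, is_kv_producer: bool) -> bool:
    # If any data-parallel rank has prefill work, every rank sees True
    # after the MAX all-reduce.
    flag = torch.tensor([1 if with_prefill else 0], dtype=torch.int32)
    dist.all_reduce(flag, op=dist.ReduceOp.MAX)
    with_prefill = bool(flag.item())
    # Per this commit, the KV producer (prefill instance) forces the flag
    # to True after the all-reduce.
    if is_kv_producer:
        with_prefill = True
    return with_prefill
```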

---------

Signed-off-by: liziyu <liziyu16@huawei.com>
2025-10-29 23:14:00 +08:00
zxr2333
c2c1db78a7 [Bugfix] fix ZeroDivisionError when prefill_tp_size > num_kv_head and fix tp_resharding README (#3437)
### What this PR does / why we need it?
Fix a ZeroDivisionError when prefill_tp_size > num_kv_head. In this
situation num_head_replica can become 0 and is later used as a divisor,
so this PR restricts its minimum value to 1. This PR also fixes the
tp_resharding README.
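
A minimal sketch of the clamp, assuming num_head_replica comes from an integer ratio of the two values; the exact formula in the connector may differ.

```python
def safe_num_head_replica(num_kv_heads: int, prefill_tp_size: int) -> int:
    # When prefill_tp_size > num_kv_heads the integer ratio collapses to 0,
    # and a later division by it raises ZeroDivisionError; clamping to 1
    # treats every rank as owning at least one (possibly shared) KV head.
    return max(1, num_kv_heads // prefill_tp_size)
```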

### Does this PR introduce _any_ user-facing change?
No.

### How was this patch tested?
By CI.

- vLLM version: v0.11.0rc3
- vLLM main: https://github.com/vllm-project/vllm/commit/v0.11.0

---------

Signed-off-by: liziyu <liziyu16@huawei.com>
Signed-off-by: nwpu-zxr <zhouxuerong2@huawei.com>
Co-authored-by: liziyu <liziyu16@huawei.com>
2025-10-15 08:45:44 +08:00
wangxiaoteng888
ca05f7d632 [Bugfix] TP size larger than KV cache head causes accuracy issues (#3366)
### What this PR does / why we need it?
Resolve an issue with unequal TP (tensor parallelism): when the TP size
is larger than the number of the model's attention KV-cache heads, the
KV-cache heads are duplicated across ranks, which led to transmission
errors in the original code.
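
A hedged illustration of the underlying mapping (names are hypothetical): when the TP size exceeds the number of KV heads, each head is replicated across several ranks, and only one rank per replica group should transfer its copy.

```python
def should_send_kv(tp_rank: int, tp_size: int, num_kv_heads: int) -> bool:
    # With tp_size > num_kv_heads, each KV head is duplicated across
    # tp_size // num_kv_heads consecutive ranks; sending from only the
    # first rank of each replica group avoids transferring duplicates.
    replicas_per_head = max(1, tp_size // num_kv_heads)
    return tp_rank % replicas_per_head == 0
```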
### Does this PR introduce _any_ user-facing change?
No
### How was this patch tested?
By CI.
- vLLM version: v0.11.0rc3
- vLLM main: https://github.com/vllm-project/vllm/commit/v0.11.0

---------

Signed-off-by: nwpu-zxr <zhouxuerong2@huawei.com>
Signed-off-by: wangxiaoteng <wangxiaoteng@huawei.com>
Co-authored-by: nwpu-zxr <zhouxuerong2@huawei.com>
2025-10-11 11:22:23 +08:00
Chao Lei
a486ff8c11 KVCache Transfer via Layer-wise Strategy in Disaggregation (#2602)
### What this PR does / why we need it?
See the RFC: https://github.com/vllm-project/vllm-ascend/issues/2470. This
PR adds a new KV connector for layer-wise KV transfer.
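
A minimal sketch of the layer-wise idea, with hypothetical names rather than the connector's real interface.

```python
class LayerwiseKVSender:
    """Sketch: push each layer's KV blocks to the decode instance as soon
    as that layer's attention has written them, overlapping the transfer
    with the compute of the remaining layers instead of shipping the
    whole KV cache only after prefill completes."""

    def __init__(self, transport, num_layers: int):
        self.transport = transport    # hypothetical async transport handle
        self.num_layers = num_layers

    def on_layer_finished(self, layer_idx: int, kv_blocks) -> None:
        # Later layers keep computing while this transfer is in flight.
        self.transport.send_async(layer_idx, kv_blocks)

    def finalize(self) -> None:
        # Wait for all per-layer transfers before reporting the request done.
        self.transport.wait_all()
```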

### Does this PR introduce _any_ user-facing change?
Yes, a new KV connector is added. Users can now use the layer-wise feature.
### How was this patch tested?


- vLLM version: v0.11.0rc3
- vLLM main: https://github.com/vllm-project/vllm/commit/releases/v0.11.0

---------

Signed-off-by: leichao.lc <leichao139636@163.com>
Signed-off-by: CaveNightingale <2859066733@qq.com>
Signed-off-by: nwpu-zxr <zhouxuerong2@huawei.com>
Signed-off-by: wangxiaoteng <wangxiaoteng@huawei.com>
Signed-off-by: hanxinlong <50882499@qq.com>
Signed-off-by: liziyu <liziyu16@huawei.com>
Co-authored-by: CaveNightingale <2859066733@qq.com>
Co-authored-by: nwpu-zxr <zhouxuerong2@huawei.com>
Co-authored-by: wangxiaoteng <wangxiaoteng@huawei.com>
Co-authored-by: hanxinlong <50882499@qq.com>
2025-09-30 15:10:29 +08:00