10 Commits

Author SHA1 Message Date
Slightwind
9c0ad46c1a [0.11.0][Bugfix] Remove the ZMQ communication setup on the D node (#4916)
In the PD separation scenario, the D node does not need to perform get
operations, and therefore does not need to create ZeroMQ (ZMQ)
communication.
---------

Signed-off-by: SlightwindSec <slightwindsec@gmail.com>
2025-12-12 14:37:49 +08:00
liziyu
ddf3e75800 [Cherry-pick] [0.11.0] pd proxy support ipv6 and fix proxy (#4242)
### What this PR does / why we need it?
The PD proxy now supports IPv6. The Mooncake connector checks whether an
IPv6 address is used and notifies the user.
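
As a rough illustration of the kind of check described above, the sketch below detects an IPv6 literal with Python's standard `ipaddress` module and wraps it in brackets when building a `host:port` endpoint. The helper names `is_ipv6` and `format_endpoint` are hypothetical and are not taken from the proxy or the Mooncake connector.

```python
import ipaddress


def is_ipv6(host: str) -> bool:
    """Return True if host is an IPv6 literal (hypothetical helper)."""
    try:
        return ipaddress.ip_address(host).version == 6
    except ValueError:
        # Hostnames and malformed strings are not IP literals.
        return False


def format_endpoint(host: str, port: int) -> str:
    """Build host:port, bracketing IPv6 literals as URLs require."""
    return f"[{host}]:{port}" if is_ipv6(host) else f"{host}:{port}"


ipv6_endpoint = format_endpoint("::1", 8000)
ipv4_endpoint = format_endpoint("127.0.0.1", 8000)
```

Without the brackets, the colons inside an IPv6 address are ambiguous with the port separator, which is one reason a proxy needs explicit IPv6 handling.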

---------

Signed-off-by: liziyu <liziyu16@huawei.com>
2025-11-18 16:33:00 +08:00
fems14
99e154dc84 [0.11.0] cherry-pick from #3747 (#3746)
cherry-pick from #3747

Correct the placement of the _register function for Mooncake.

Signed-off-by: fems14 <1804143737@qq.com>
2025-10-25 14:21:30 +08:00
fems14
17dd9ae42c [0.11.0][bugfix]look up multi_tp key (#3699) (#3723)
### What this PR does / why we need it?
In multi-Tensor Parallel (TP) scenarios, the KV pool only queries the
first GPU card. When keys on other cards are released, the query result
still returns as successful, introducing accuracy issues. This PR
modifies the KV pool's query logic to check all cards, resolving this
problem.
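
The difference between the two query strategies can be sketched as follows. This is a minimal model under stated assumptions: the pool is a plain Python set, and the per-rank key layout and the function names (`make_rank_key`, `lookup_first_rank_only`, `lookup_all_ranks`) are hypothetical, not the actual KV pool API.

```python
def make_rank_key(key: str, tp_rank: int) -> str:
    """Build a per-rank key. The naming scheme is an assumption for illustration."""
    return f"{key}@tp_rank:{tp_rank}"


def lookup_first_rank_only(store: set[str], key: str) -> bool:
    """Behaviour before the fix, as described: only rank 0 is consulted."""
    return make_rank_key(key, 0) in store


def lookup_all_ranks(store: set[str], key: str, tp_size: int) -> bool:
    """Behaviour after the fix, as described: a hit requires the key on every rank."""
    return all(make_rank_key(key, rank) in store for rank in range(tp_size))


# Rank 1's copy has been released, so only rank 0 still holds the block.
store = {make_rank_key("block-a", 0)}
stale_hit = lookup_first_rank_only(store, "block-a")      # reports a hit
real_hit = lookup_all_ranks(store, "block-a", tp_size=2)  # reports a miss
```

In this model the rank-0-only query reports a hit even though rank 1 can no longer supply its shard, which is the kind of false positive the commit message attributes the accuracy issue to.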
### Does this PR introduce _any_ user-facing change?

### How was this patch tested?

- vLLM version: v0.11.0rc3
- vLLM main: https://github.com/vllm-project/vllm/commit/v0.11.0

Signed-off-by: fems14 <1804143737@qq.com>
2025-10-24 18:22:45 +08:00
fems14
f0eb3e1d97 [v0.11.0][bugfix]kvpool sync load (#3698) (#3722)
### What this PR does / why we need it?
In certain scenarios, the performance of synchronously loading data from
the pool is better than that of asynchronously loading data. Therefore,
a control logic (or switch) for asynchronous loading from the pool has
been added.
### Does this PR introduce _any_ user-facing change?

### How was this patch tested?

- vLLM version: v0.11.0rc3
- vLLM main: https://github.com/vllm-project/vllm/commit/v0.11.0

---------

Signed-off-by: fems14 <1804143737@qq.com>
2025-10-24 18:21:46 +08:00
何必问
33514a4cc2 [Bugfix] The server fails to locate the request, leading to the server hanging. (#3721)
### What this PR does / why we need it?
Fix bug: in the Mooncake pooling scenario, when the client closes the
request, the server fails to locate the request, leading to the server
hanging.

### Does this PR introduce _any_ user-facing change?

### How was this patch tested?
Start the PD-separated pooling service, send requests using aisbench,
press CTRL+C twice, and check whether the vllm_ascend service exits.

---------

Signed-off-by: linhebiwen <linhebiwen@gmail.com>
2025-10-24 17:41:29 +08:00
Chao Lei
11f9bccf6b Mooncake store use adxl interface (#3350)
Use the adxl interface in the Mooncake store; see Mooncake PR
https://github.com/kvcache-ai/Mooncake/pull/929

- vLLM version: v0.11.0rc3
- vLLM main: https://github.com/vllm-project/vllm/commit/v0.11.0

---------

Signed-off-by: LCAIZJ <leichao139636@163.com>
2025-10-21 20:18:17 +08:00
DreamerLeader
aa6154703a [BugFix]GPQA Accuracy Issue Bugfix (#3476)
### What this PR does / why we need it?
In testing, the GPQA dataset accuracy in the PD separation scenario is
33.2, which does not meet the paper's requirement of 70. This PR resolves
the accuracy issue.
### Does this PR introduce _any_ user-facing change?

### How was this patch tested?
GPQA had accuracy issues; with the code change, the accuracy meets the
standard.

- vLLM version: v0.11.0rc3
- vLLM main: https://github.com/vllm-project/vllm/commit/v0.11.0

---------

Signed-off-by: fjw <2270923832@qq.com>
2025-10-15 23:28:17 +08:00
fems14
1c9f0fe26f Fix of DeepSeek Error in KV Pool Mixed Deployment Scenario (#3087)
### What this PR does / why we need it?
A new kv_role "kv_both" is added to run mixed deployment scenarios. The
mixed deployment will involve a decode phase, where with_prefill should
be false.

### Does this PR introduce _any_ user-facing change?

### How was this patch tested?

- vLLM version: v0.10.2
- vLLM main: c60e6137f0

Signed-off-by: fems14 <1804143737@qq.com>
2025-09-22 20:36:41 +08:00
Chao Lei
cef43b524e [Feat] A Connector that supports Mooncake store (#2913)
### What this PR does / why we need it?
Added a new connector for Mooncake store integration to enable kvcache
reuse in scenarios with system prompts or multi-turn dialogues.

### How was this patch tested?


- vLLM version: v0.10.2
- vLLM main: 5963b98b46

---------

Signed-off-by: LCAIZJ <leichao139636@163.com>
Signed-off-by: fems14 <1804143737@qq.com>
Co-authored-by: fems14 <1804143737@qq.com>
Co-authored-by: Dreamerleader <2270923832@qq.com>
Co-authored-by: Pz1116 <zpbzpb123123@gmail.com>
Co-authored-by: lizy124 <1950471827@qq.com>
Co-authored-by: zouyida2052 <zouyida2002@gmail.com>
2025-09-18 14:04:45 +08:00