### What this PR does / why we need it?
1. Fixed memory retention on certain GPUs caused by missing PUT
operations.
2. Fixed performance degradation resulting from architectural
incompatibilities in the underlying refactor.
### Does this PR introduce _any_ user-facing change?
### How was this patch tested?
- vLLM version: v0.13.0
- vLLM main:
2f4e6548ef
---------
Signed-off-by: fems14 <1804143737@qq.com>
### What this PR does / why we need it?
In the Mooncake KV pool, `local_hostname` is not used. Instead, the local IP
is obtained directly via `get_ip()`. Therefore, this PR removes the parameter
to avoid confusion.
### How was this patch tested?
- vLLM version: v0.13.0
- vLLM main:
7157596103
Signed-off-by: LCAIZJ <leichao139636@163.com>
### What this PR does / why we need it?
In the current KV Pool scenario for models that use MLA or GQA, different
TP ranks generate identical KV caches, so the system is designed to store
only a single copy. The previous approach let each card query at runtime
which blocks still needed to be stored, but the query results were
inconsistent across cards, which led to incorrect storage. To fix this, the
new solution pre-assigns storage responsibilities: each card simply stores
its pre-assigned blocks, bypassing the inconsistent query step and ensuring
data correctness.
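
The pre-assignment idea can be illustrated with a minimal sketch. The PR description does not say how blocks are mapped to cards, so the round-robin rule and the `assigned_blocks` helper below are assumptions made only for illustration. The point is that every rank computes its share locally from the same inputs, so no runtime query is needed and each block is stored exactly once.

```python
def assigned_blocks(block_ids, tp_rank, tp_size):
    """Return the blocks that `tp_rank` is responsible for storing.

    Hypothetical round-robin rule: the block at position i goes to rank
    i % tp_size. Every rank evaluates the same pure function, so the
    assignments cannot disagree between cards.
    """
    return [b for i, b in enumerate(block_ids) if i % tp_size == tp_rank]


# Five identical KV blocks shared by two TP ranks.
block_ids = ["b0", "b1", "b2", "b3", "b4"]
tp_size = 2
per_rank = [assigned_blocks(block_ids, rank, tp_size) for rank in range(tp_size)]
```

Because the assignment depends only on the block list and the rank, the union of all ranks' shares covers every block without duplicates.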
- vLLM version: v0.12.0
- vLLM main:
ad32e3e19c
---------
Signed-off-by: fems14 <1804143737@qq.com>
### What this PR does / why we need it?
The configuration recommended in kv_pool.md sets `protocol` to `ascend`.
This PR changes the default value of `protocol` to `ascend` to improve
usability.
#### 1. Configure mooncake.json
The environment variable **MOONCAKE_CONFIG_PATH** is configured to the
full path where mooncake.json is located.
```
{
    "local_hostname": "xx.xx.xx.xx",
    "metadata_server": "P2PHANDSHAKE",
    "protocol": "ascend",
    "device_name": "",
    "alloc_in_same_node": true,
    "master_server_address": "xx.xx.xx.xx:50088",
    "global_segment_size": "1GB"
}
```
- **local_hostname**: IP address of the current master node.
- **metadata_server**: Set to **P2PHANDSHAKE**.
- **protocol**: Set to `ascend` so that Mooncake uses HCCL communication.
- **device_name**: Left as an empty string (`""`).
- **alloc_in_same_node**: Indicates whether the local buffer allocation
  strategy is preferred.
- **master_server_address**: IP address and port of the master service.
- **global_segment_size**: Expands the KV cache size that the PD node
  registers with the master. The value may carry a unit suffix or be a plain
  byte count: `"1GB"`, `"1024MB"`, `"1048576KB"`, `"1073741824B"`, and
  `1073741824` are equivalent.
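
As a rough illustration of the new default, the sketch below reads the file named by **MOONCAKE_CONFIG_PATH** and falls back to `ascend` when `protocol` is missing. The `load_mooncake_config` helper and `DEFAULT_PROTOCOL` constant are hypothetical names, not the actual vllm-ascend code, and the demo writes a throwaway config file only so the example runs on its own.

```python
import json
import os
import tempfile

DEFAULT_PROTOCOL = "ascend"  # assumed default, following kv_pool.md


def load_mooncake_config(path=None):
    """Load mooncake.json and apply the default protocol if it is absent."""
    path = path or os.environ["MOONCAKE_CONFIG_PATH"]
    with open(path, encoding="utf-8") as f:
        config = json.load(f)
    config.setdefault("protocol", DEFAULT_PROTOCOL)
    return config


# Demo: a config file that omits "protocol" still resolves to "ascend".
with tempfile.TemporaryDirectory() as tmp:
    demo_path = os.path.join(tmp, "mooncake.json")
    with open(demo_path, "w", encoding="utf-8") as f:
        json.dump({"metadata_server": "P2PHANDSHAKE",
                   "master_server_address": "127.0.0.1:50088"}, f)
    os.environ["MOONCAKE_CONFIG_PATH"] = demo_path
    demo_config = load_mooncake_config()
```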
### Does this PR introduce _any_ user-facing change?
No
### How was this patch tested?
Launched the pooled vLLM service without setting `protocol` in the Mooncake
configuration and verified that the pooling function works.
Signed-off-by: lty <linhebiwen@gmail.com>
### What this PR does / why we need it?
Improve usability: `local_buffer_size` now accepts a unit suffix (GB, MB,
KB, or B), for example `"2GB"`:
{
"local_hostname": "XXX.XXX.XXX.XXX",
"metadata_server": "P2PHANDSHAKE",
"protocol": "ascend",
"device_name": "",
"use_ascend_direct": true,
"master_server_address": "XXX.XXX.XXX.XXX:50088",
"global_segment_size": 60000000000,
"local_buffer_size": "2GB"
}
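
A minimal sketch of how such a value could be parsed is shown below. The `parse_size` helper is a hypothetical name and not the function added by this PR; it assumes binary units (1GB = 1024MB = 1048576KB = 1073741824B), which matches the equivalences listed for `global_segment_size`, and it passes plain integers through unchanged.

```python
import re

# Binary multipliers, assumed from the equivalences 1GB = 1024MB = 1048576KB.
_UNITS = {"B": 1, "KB": 1024, "MB": 1024 ** 2, "GB": 1024 ** 3}
_SIZE_RE = re.compile(r"^\s*(\d+(?:\.\d+)?)\s*(GB|MB|KB|B)?\s*$", re.IGNORECASE)


def parse_size(value):
    """Convert 2147483648, "2147483648", or "2GB" into a byte count."""
    if isinstance(value, int):
        return value
    match = _SIZE_RE.match(str(value))
    if match is None:
        raise ValueError(f"invalid size: {value!r}")
    number, unit = match.groups()
    # A missing suffix is treated as a plain byte count.
    return int(float(number) * _UNITS[(unit or "B").upper()])


local_buffer_bytes = parse_size("2GB")
global_segment_bytes = parse_size(60000000000)
```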
### Does this PR introduce _any_ user-facing change?
Yes. `local_buffer_size` now accepts values with a unit suffix: GB, MB, KB,
or B.
### How was this patch tested?
Configured Mooncake with `local_buffer_size` values expressed in GB, MB, KB,
and B.
- vLLM version: v0.12.0
- vLLM main:
ad32e3e19c
Signed-off-by: lty <linxianchong1@huawei.com>
### What this PR does / why we need it?
Validate the KV extra config and remove the HCCL backend.
- vLLM version: v0.12.0
- vLLM main:
ad32e3e19c
---------
Signed-off-by: liziyu <liziyu16@huawei.com>
Co-authored-by: wangxiyuan <wangxiyuan1007@gmail.com>
### What this PR does / why we need it?
1. Renamed the existing `MooncakeStoreConnector` to `AscendStoreConnector`
and extracted the storage engine interaction logic into a new `Backend`
class.
Associated RFC: https://github.com/vllm-project/vllm-ascend/issues/4329
2. Fixed the incorrect number of input parameters for the connector, an
issue introduced in vLLM 0.11.2.
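
The separation between connector and storage engine can be pictured roughly as follows. This is only a sketch under assumptions: the method names `put`, `get`, and `exists`, the `InMemoryBackend` stand-in, and the `SketchStoreConnector` class are all hypothetical and do not reproduce the real `Backend` or `AscendStoreConnector` interfaces.

```python
from abc import ABC, abstractmethod
from typing import Dict, Optional


class Backend(ABC):
    """Hypothetical storage-engine interface used by the connector."""

    @abstractmethod
    def put(self, key: str, value: bytes) -> None: ...

    @abstractmethod
    def get(self, key: str) -> Optional[bytes]: ...

    @abstractmethod
    def exists(self, key: str) -> bool: ...


class InMemoryBackend(Backend):
    """Dict-based stand-in for a real engine such as Mooncake."""

    def __init__(self) -> None:
        self._store: Dict[str, bytes] = {}

    def put(self, key: str, value: bytes) -> None:
        self._store[key] = value

    def get(self, key: str) -> Optional[bytes]:
        return self._store.get(key)

    def exists(self, key: str) -> bool:
        return key in self._store


class SketchStoreConnector:
    """Connector logic that talks only to the Backend interface."""

    def __init__(self, backend: Backend) -> None:
        self._backend = backend

    def save(self, key: str, value: bytes) -> bool:
        # Skip the write when the block is already pooled.
        if self._backend.exists(key):
            return False
        self._backend.put(key, value)
        return True

    def load(self, key: str) -> Optional[bytes]:
        return self._backend.get(key)


connector = SketchStoreConnector(InMemoryBackend())
first_save = connector.save("block-0", b"kv-bytes")
second_save = connector.save("block-0", b"kv-bytes")
```

With this split, swapping the storage engine means providing another `Backend` subclass while the connector code stays unchanged.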
### Does this PR introduce _any_ user-facing change?
Yes. `MooncakeStoreConnector` is renamed to `AscendStoreConnector`.
### How was this patch tested?
- vLLM version: v0.11.2
---------
Signed-off-by: fems14 <1804143737@qq.com>