19 Commits

Author SHA1 Message Date
fems14
ff4c1a47b3 [bugfix] Fixing KV Pool Memory Retention and Performance Degradation Issues (#5751)
### What this PR does / why we need it?
1. Fixed memory retention on certain GPUs caused by missing PUT
operations.

2. Fixed performance degradation resulting from architectural
incompatibilities in the underlying refactor.
### Does this PR introduce _any_ user-facing change?

### How was this patch tested?

- vLLM version: v0.13.0
- vLLM main:
2f4e6548ef

---------

Signed-off-by: fems14 <1804143737@qq.com>
2026-01-09 17:46:23 +08:00
Shanshan Shen
b94d589769 [MM][Bugfix] Update hf_config to hf_text_config (#5319)
### What this PR does / why we need it?

Following https://github.com/vllm-project/vllm-ascend/pull/5205, update
`hf_config` to `hf_text_config`.

Find more details at
https://github.com/vllm-project/vllm-ascend/pull/5205#issuecomment-3675417534
and
https://github.com/vllm-project/vllm-ascend/pull/5205#issuecomment-3677920872.

### Does this PR introduce _any_ user-facing change?

### How was this patch tested?

- vLLM version: release/v0.13.0
- vLLM main:
5fbfa8d9ef

Signed-off-by: shen-shanshan <467638484@qq.com>
2026-01-06 16:41:39 +08:00
Chao Lei
473431e7e2 [P/D]Remove mooncake kvpool unused parameter local_hostname (#5574)
### What this PR does / why we need it?
In mooncake kvpool, `local_hostname` is not used. Instead, the local IP
is obtained directly via `get_ip()`. Therefore, remove this parameter to
avoid confusion.

### How was this patch tested?

- vLLM version: v0.13.0
- vLLM main:
7157596103

Signed-off-by: LCAIZJ <leichao139636@163.com>
2026-01-05 20:18:59 +08:00
baxingpiaochong
46c2fc6a3c [KVPOOL]decode save kvcache (#5168)
### What this PR does / why we need it?

Save the KV cache to the KV pool during decode.
Currently only MLA is supported.

### Does this PR introduce _any_ user-facing change?

### How was this patch tested?

- vLLM version: v0.12.0
- vLLM main:
ad32e3e19c

---------

Signed-off-by: baxingpiaochong <771405853@qq.com>
Co-authored-by: Chao Lei <leichao139636@163.com>
2026-01-04 22:22:01 +08:00
fems14
2ef4d1979e [bugfix][main]KV Pool for KV Transfer in PD Disaggregation Scenarios (#5398)
### What this PR does / why we need it?
1. Resolve errors when using the KV Pool for KV transfer in PD
disaggregation scenarios.
2. Update the KV Pool documentation.

### Does this PR introduce _any_ user-facing change?

### How was this patch tested?

- vLLM version: release/v0.13.0
- vLLM main:
254f6b9867

---------

Signed-off-by: fems14 <1804143737@qq.com>
2025-12-27 09:53:57 +08:00
Chao Lei
9c02fa9867 [bugfix] Fix mooncake kvpool accuracy issue (#4976)
### What this PR does / why we need it?

The current KVPool has an accuracy issue
https://github.com/vllm-project/vllm-ascend/issues/4412. This PR aims to
fix the precision problem without impacting prefill performance.

Note: Due to a bug in ADXL, calling `current_event.synchronize()` may
occasionally hang. This issue will be fixed in CANN version 8.5.RC1. You
can manually build the master branch of the project at
https://gitcode.com/cann/hixl to resolve this issue before the 8.5.RC1
release.


- vLLM version: v0.12.0
- vLLM main:
ad32e3e19c

---------

Signed-off-by: LCAIZJ <leichao139636@163.com>
2025-12-16 11:33:16 +08:00
fems14
b662d914a4 [bugfix] [main] Fix KV cache query inconsistency across different TP ranks in the KV Pool (#5030)
### What this PR does / why we need it?
In the current KV Pool scenario for models like MLA and GQA, where
different TP ranks generate identical KV caches, the system is designed
to store only a single copy. The previous approach allowed each card to
query storage requirements dynamically, but inconsistent query results
across cards led to incorrect storage. To fix this, the new solution
pre-allocates storage responsibilities; each card now simply stores its
pre-assigned blocks, bypassing the inconsistent query step and ensuring
data correctness.
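
The pre-assignment idea can be illustrated with a minimal sketch. The round-robin rule and the function names below are assumptions made for illustration; the commit message does not say how blocks are actually assigned to cards.

```python
def assigned_rank(block_index: int, tp_size: int) -> int:
    """Return the TP rank responsible for a block (hypothetical round-robin rule)."""
    if tp_size <= 0:
        raise ValueError("tp_size must be positive")
    return block_index % tp_size


def blocks_for_rank(block_hashes: list, tp_rank: int, tp_size: int) -> list:
    """Select the blocks this rank stores, without querying the shared pool."""
    return [
        block_hash
        for index, block_hash in enumerate(block_hashes)
        if assigned_rank(index, tp_size) == tp_rank
    ]
```

Because every rank evaluates the same pure function on the same block list, the ranks agree on ownership without exchanging or querying any state, which is the property the fix relies on.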

- vLLM version: v0.12.0
- vLLM main:
ad32e3e19c

---------

Signed-off-by: fems14 <1804143737@qq.com>
2025-12-15 21:56:05 +08:00
baxingpiaochong
95e6400128 [KVPool]Fix PP get bug (#5007)
### What this PR does / why we need it?

When kv caches are evicted from the key-value pool, it's possible that
the kv cache for pp0 is still active, but the kv cache for pp1 has
already been evicted. Therefore, a unified check is needed during the
get operation.
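
A unified check of this kind might look like the sketch below. The function name and the per-stage hit-list layout are hypothetical and are not taken from the PR; the point is only that a block counts as usable when every pipeline stage still holds it.

```python
def usable_prefix_len(hits_per_stage: list) -> int:
    """Count the leading blocks that every pipeline stage still holds.

    hits_per_stage[s][b] is True when stage s still has block b in the pool.
    """
    if not hits_per_stage:
        return 0
    num_blocks = min(len(stage) for stage in hits_per_stage)
    for block in range(num_blocks):
        # A block evicted on any one stage ends the usable prefix.
        if not all(stage[block] for stage in hits_per_stage):
            return block
    return num_blocks
```

For example, if pp0 holds three blocks but pp1 has lost the second one, only the first block can be loaded from the pool.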


- vLLM version: v0.12.0
- vLLM main:
ad32e3e19c

Signed-off-by: baxingpiaochong <771405853@qq.com>
Co-authored-by: Jade Zheng <zheng.shoujian@outlook.com>
2025-12-15 20:27:57 +08:00
lty
0cdf98ac48 [usability]Modify the default value of the protocol to ascend (#4959)
### What this PR does / why we need it?
The configuration recommended in kv_pool.md uses the `ascend` protocol.
This PR changes the default value of the protocol to `ascend` to improve usability.

#### 1.Configure mooncake.json

The environment variable **MOONCAKE_CONFIG_PATH** is configured to the
full path where mooncake.json is located.

```
{
    "local_hostname": "xx.xx.xx.xx",
    "metadata_server": "P2PHANDSHAKE",
    "protocol": "ascend",
    "device_name": "",
    "alloc_in_same_node": true,
    "master_server_address": "xx.xx.xx.xx:50088",
    "global_segment_size": "1GB"
}
```

- **local_hostname**: Configured as the IP address of the current master
  node.
- **metadata_server**: Configured as **P2PHANDSHAKE**.
- **protocol**: Configured as `ascend` to use Mooncake's HCCL
  communication.
- **device_name**: ""
- **alloc_in_same_node**: Indicator for preferring the local buffer
  allocation strategy.
- **master_server_address**: Configured with the IP and port of the master
  service.
- **global_segment_size**: Expands the kvcache size registered by the PD
  node to the master. `"1GB"` can equivalently be written as `"1024MB"`,
  `"1048576KB"`, `"1073741824B"`, or `1073741824`.
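
As a rough illustration of the behaviour described above, the sketch below reads the file named by **MOONCAKE_CONFIG_PATH** and falls back to `ascend` when no protocol is given. The function name `load_mooncake_config` is hypothetical, and the real loader in vllm-ascend may be structured differently.

```python
import json
import os
from typing import Optional


def load_mooncake_config(path: Optional[str] = None) -> dict:
    """Load mooncake.json and apply the default protocol (illustrative only)."""
    path = path or os.environ.get("MOONCAKE_CONFIG_PATH")
    if not path:
        raise ValueError("MOONCAKE_CONFIG_PATH is not set")
    with open(path, encoding="utf-8") as config_file:
        config = json.load(config_file)
    # Assumed default, matching the value this PR describes.
    config.setdefault("protocol", "ascend")
    return config
```

An explicit `"protocol"` entry in the file is left untouched; only a missing key receives the default.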

### Does this PR introduce _any_ user-facing change?
No

### How was this patch tested?
Launched the pooled vLLM service without setting a protocol in the
Mooncake configuration and verified that the pooling function works.

Signed-off-by: lty <linhebiwen@gmail.com>
2025-12-12 16:56:18 +08:00
Slightwind
b8a317caac [main][Bugfix] Remove the ZMQ communication setup on the D node (#4926)
In the PD separation scenario, the D node does not need to perform get
operations, and therefore does not need to create ZeroMQ (ZMQ)
communication.
- vLLM version: v0.12.0
- vLLM main:
ad32e3e19c

Signed-off-by: SlightwindSec <slightwindsec@gmail.com>
2025-12-12 14:37:26 +08:00
lty
dee00d0de3 [Usability]local_buffer_size support for units: GB, MB, KB, B (#4829)
What this PR does / why we need it?
Improve usability: local_buffer_size now supports the units GB, MB, KB,
and B, for example "2GB":
{
    "local_hostname": "XXX.XXX.XXX.XXX",
    "metadata_server": "P2PHANDSHAKE",
    "protocol": "ascend",
    "device_name": "",
    "use_ascend_direct": true,
    "master_server_address": "XXX.XXX.XXX.XXX:50088",
    "global_segment_size": 60000000000,
    "local_buffer_size": "2GB"
}
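
A size string like the one above could be converted to bytes roughly as follows. This is a sketch, not the parser used in the PR: the name `parse_size` is hypothetical, and the binary multiples (1 GB = 1024 MB) are an assumption based on the equivalence "1GB" = 1024MB = 1073741824B listed for global_segment_size in another commit in this log.

```python
import re

# Assumed binary multiples; see the note above.
_UNITS = {"B": 1, "KB": 1024, "MB": 1024**2, "GB": 1024**3}
_SIZE_RE = re.compile(r"^\s*(\d+(?:\.\d+)?)\s*(GB|MB|KB|B)?\s*$", re.IGNORECASE)


def parse_size(value) -> int:
    """Convert 2147483648, "2147483648", or "2GB" into a byte count."""
    if isinstance(value, int):
        return value
    match = _SIZE_RE.match(str(value))
    if match is None:
        raise ValueError(f"unrecognized size: {value!r}")
    number, unit = match.groups()
    # A bare number without a unit is treated as bytes.
    return int(float(number) * _UNITS[(unit or "B").upper()])
```

Plain integers pass through unchanged, so a numeric value such as the `global_segment_size` above would keep working alongside the new string form.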

Does this PR introduce any user-facing change?
local_buffer_size now accepts values with units: GB, MB, KB, B

How was this patch tested?
Configured Mooncake's local_buffer_size with values in GB, MB, KB, and B
- vLLM version: v0.12.0
- vLLM main:
ad32e3e19c

Signed-off-by: lty <linxianchong1@huawei.com>
2025-12-09 17:52:24 +08:00
baxingpiaochong
dda027e680 [KVPOOl]Support pp (#4761)
### What this PR does / why we need it?
Support pipeline parallelism (PP) for the KV pool.

- vLLM version: v0.12.0
- vLLM main:
ad32e3e19c

---------

Signed-off-by: baxingpiaochong <771405853@qq.com>
2025-12-09 16:15:26 +08:00
liziyu
688b1332da [P/D] check kv extra config and del hccl backend (#4547)
### What this PR does / why we need it?
Check the KV extra config and delete the HCCL backend.


- vLLM version: v0.12.0
- vLLM main:
ad32e3e19c

---------

Signed-off-by: liziyu <liziyu16@huawei.com>
Co-authored-by: wangxiyuan <wangxiyuan1007@gmail.com>
2025-12-07 15:19:42 +08:00
LookAround0301
b32ef53b3b [long_seq] remove long_seq env (#4660)
### What this PR does / why we need it?
Remove the environment variable `VLLM_ASCEND_ENABLE_CONTEXT_PARALLEL`.

- vLLM version: v0.12.0

---------

Signed-off-by: LookAround <lixushi@huawei.com>
Signed-off-by: ZhangMingWei716 <2894054457@qq.com>
Co-authored-by: ZhangMingWei716 <2894054457@qq.com>
Co-authored-by: wangxiyuan <wangxiyuan1007@gmail.com>
2025-12-05 10:31:49 +08:00
wangxiyuan
7f2673ea2d upgrade vLLM to main (#4608)
1. fix https://github.com/vllm-project/vllm/pull/28542
   The model structure modifications involved are:
     - Qwen2.5-VL (some patches still exist)
     - Qwen2-VL
     - Qwen2
     - DeepSeek series
     - Qwen-moe series
2. fix https://github.com/vllm-project/vllm/pull/29121
   The output token type changed from np to `list[list[int]]`.
3. fix https://github.com/vllm-project/vllm/pull/29262
   The `xformers` backend for multimodal has been deprecated.
4. fix https://github.com/vllm-project/vllm/pull/29342
5. fix https://github.com/vllm-project/vllm/pull/28579
6. fix https://github.com/vllm-project/vllm/pull/28718
7. fix https://github.com/vllm-project/vllm/issues/28665
8. fix https://github.com/vllm-project/vllm/pull/26847
   vLLM introduced `optimization-level`; some default configs have
   changed, and the `--enforce-eager` parameter has been deprecated.
9. fix http://github.com/vllm-project/vllm/pull/29223
   The sampler now returns a tuple.
10. fix https://github.com/vllm-project/vllm/pull/29471
    We'll remove the related patch to avoid this kind of error.

Co-authored-by: hfadzxy <starmoon_zhang@163.com>
Co-authored-by: wangli <wangli858794774@gmail.com>


- vLLM version: v0.11.2

---------

Signed-off-by: wangxiyuan <wangxiyuan1007@gmail.com>
Signed-off-by: wangli <wangli858794774@gmail.com>
Signed-off-by: hfadzxy <starmoon_zhang@163.com>
Co-authored-by: wangli <wangli858794774@gmail.com>
Co-authored-by: hfadzxy <starmoon_zhang@163.com>
2025-12-02 22:10:52 +08:00
Slightwind
aa56a0f4b7 [Bugfix] PCP adaptation for VLLM v0.11.2 modifications (#4604)
To adapt to the vLLM v0.11.2 image, the method for obtaining PCP size
and DCP size has been modified.
___
- vLLM version: v0.11.2

---------

Signed-off-by: SlightwindSec <slightwindsec@gmail.com>
2025-12-01 19:20:32 +08:00
Chao Lei
ff7061317f [Bugfix] Fix kvpool precision synchronization (#4574)
### What this PR does / why we need it?
Fix kvpool precision synchronization
Issue https://github.com/vllm-project/vllm-ascend/issues/4412


- vLLM version: v0.11.2

---------

Signed-off-by: LCAIZJ <leichao139636@163.com>
2025-11-30 09:39:07 +08:00
DreamerLeader
4dbe4fd123 [feature]Pooling Features and PCP Adaptation (#4143)
This PR lets the pooling KV connector support the PCP feature.

- vLLM version: v0.11.2

---------

Signed-off-by: fjw <2270923832@qq.com>
Signed-off-by: SlightwindSec <slightwindsec@gmail.com>
Co-authored-by: SlightwindSec <slightwindsec@gmail.com>
2025-11-29 22:07:45 +08:00
fems14
5447a039b9 [Feature][main]reconstruction kvpool connector to ascend connector (#4438)
### What this PR does / why we need it?
1. In short, we renamed the existing MooncakeStoreConnector to
AscendStoreConnector and extracted the storage engine interaction logic
into a new Backend class.
Associated RFC: https://github.com/vllm-project/vllm-ascend/issues/4329
2. Fixed the incorrect number of input parameters for the connector,
an issue introduced in vLLM 0.11.2.
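
The separation described in item 1 can be pictured with the following sketch. Only the idea of a connector delegating to a Backend class comes from the commit message; the method names (`put`, `get`, `exists`), `InMemoryBackend`, and `ToyStoreConnector` are hypothetical stand-ins, not the actual vllm-ascend interfaces.

```python
from abc import ABC, abstractmethod
from typing import Optional


class Backend(ABC):
    """Storage-engine interface (method names are assumptions)."""

    @abstractmethod
    def put(self, key: str, value: bytes) -> None: ...

    @abstractmethod
    def get(self, key: str) -> Optional[bytes]: ...

    @abstractmethod
    def exists(self, key: str) -> bool: ...


class InMemoryBackend(Backend):
    """Toy backend standing in for a real engine such as Mooncake."""

    def __init__(self) -> None:
        self._store = {}

    def put(self, key: str, value: bytes) -> None:
        self._store[key] = value

    def get(self, key: str) -> Optional[bytes]:
        return self._store.get(key)

    def exists(self, key: str) -> bool:
        return key in self._store


class ToyStoreConnector:
    """Connector that only talks to the Backend interface."""

    def __init__(self, backend: Backend) -> None:
        self._backend = backend

    def save_block(self, block_hash: str, data: bytes) -> None:
        # Skip the write when the block is already stored.
        if not self._backend.exists(block_hash):
            self._backend.put(block_hash, data)

    def load_block(self, block_hash: str) -> Optional[bytes]:
        return self._backend.get(block_hash)
```

With this shape, supporting another storage engine means adding a new `Backend` subclass while the connector logic stays unchanged.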
### Does this PR introduce _any_ user-facing change?
Changed MooncakeStoreConnector to AscendStoreConnector.
### How was this patch tested?

- vLLM version: v0.11.2

---------

Signed-off-by: fems14 <1804143737@qq.com>
2025-11-28 18:08:37 +08:00