xc-llm-ascend


JiangWeixiang 41a52beb26 [bugfix] resolve kv cache leak on P-side due to incorrect req_id (#6325)

### What this PR does / why we need it?
This PR fixes a critical bug in the PD-separated (prefill/decode
disaggregated) inference pipeline where KV cache on the Prefill (P) side
was not being properly released.

The issue arises when multiple clients send the same x-request-id. To
avoid request ID collisions, both the Prefill and Decode (D) nodes append
a random suffix to the incoming x-request-id, so the two sides end up
with different local request IDs. A previous PR restored consistency by
having the P side pass its final request_id to the D side as
remote_request_id via kv_transfer_param. However, during KV cache
cleanup, the D side incorrectly used its local req_id (instead of
remote_request_id) to select the target P-side rank. Because the two IDs
differ, the cleanup could be routed to a P-side rank that does not hold
the request's KV cache, leaving that cache unreleased on certain ranks
and leaking memory. This PR corrects the logic to use remote_request_id
consistently when determining the P-side rank.
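The rank-selection mismatch described above can be illustrated with a small, self-contained Python sketch. It is not the vllm-ascend implementation: `make_request_id`, `select_prefill_rank`, `PrefillNode`, and `decode_and_release` are hypothetical names, and the CRC32-modulo mapping from request ID to P-side rank is an assumed stand-in for whatever routing the real connector uses. The sketch only shows why a cleanup routed by the D side's local ID can miss the rank that actually holds the KV cache, while routing by remote_request_id always reaches it.

```python
import uuid
import zlib


def make_request_id(x_request_id: str) -> str:
    # Hypothetical: each node appends its own random suffix so that identical
    # client-supplied x-request-id values do not collide.
    return f"{x_request_id}-{uuid.uuid4().hex[:8]}"


def select_prefill_rank(request_id: str, num_ranks: int) -> int:
    # Hypothetical deterministic request-ID -> P-side rank mapping (CRC32 mod N).
    return zlib.crc32(request_id.encode("utf-8")) % num_ranks


class PrefillNode:
    """Toy P side: each rank keeps KV cache entries keyed by request ID."""

    def __init__(self, num_ranks: int) -> None:
        self.num_ranks = num_ranks
        self.kv_cache = [{} for _ in range(num_ranks)]

    def prefill(self, x_request_id: str) -> dict:
        request_id = make_request_id(x_request_id)
        rank = select_prefill_rank(request_id, self.num_ranks)
        self.kv_cache[rank][request_id] = "kv-blocks"
        # The P side hands its final ID to the D side as remote_request_id.
        return {"remote_request_id": request_id}

    def release(self, rank: int, request_id: str) -> bool:
        # Frees the entry only if it lives on the addressed rank.
        return self.kv_cache[rank].pop(request_id, None) is not None

    def leaked(self) -> int:
        # Number of KV cache entries still held across all ranks.
        return sum(len(cache) for cache in self.kv_cache)


def decode_and_release(prefill: PrefillNode, x_request_id: str,
                       kv_params: dict, use_remote_id: bool) -> bool:
    # The D side derives its own local ID with a different random suffix.
    local_req_id = make_request_id(x_request_id)
    remote_id = kv_params["remote_request_id"]
    # Buggy path routes by local_req_id; fixed path routes by remote_request_id.
    routing_id = remote_id if use_remote_id else local_req_id
    rank = select_prefill_rank(routing_id, prefill.num_ranks)
    return prefill.release(rank, remote_id)


if __name__ == "__main__":
    for use_remote_id in (False, True):
        node = PrefillNode(num_ranks=4)
        for _ in range(100):
            params = node.prefill("same-client-id")
            decode_and_release(node, "same-client-id", params, use_remote_id)
        print(f"use_remote_id={use_remote_id}: leaked entries = {node.leaked()}")
```

With four ranks and a roughly uniform hash, the buggy path sends about three out of four cleanups to the wrong rank, so most entries stay leaked; the fixed path always addresses the rank that was chosen at prefill time and frees every entry.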

### Does this PR introduce _any_ user-facing change?
No.

### How was this patch tested?
The fix was validated by running multiple concurrent benchmark instances.

- vLLM version: v0.14.1
- vLLM main: dc917cceb8

Signed-off-by: ghphotoframe <854746559@qq.com>

2026-01-29 16:05:56 +08:00

| Name | Last commit | Date |
|---|---|---|
| `device_communicators` | [Lint]Style: Convert vllm-ascend/ to ruff format (Batch #3) (#5978) | 2026-01-24 22:10:18 +08:00 |
| `kv_transfer` | [bugfix] resolve kv cache leak on P-side due to incorrect req_id (#6325) | 2026-01-29 16:05:56 +08:00 |
| `__init__.py` | [Refactor]Refactor of vllm_ascend/distributed module (#5719) | 2026-01-15 08:57:40 +08:00 |
| `parallel_state.py` | [Feature] Support DSA-CP for Hybrid scenario (#5702) | 2026-01-22 10:12:09 +08:00 |
| `utils.py` | [Feature] Support DSA-CP for Hybrid scenario (#5702) | 2026-01-22 10:12:09 +08:00 |