xc-llm-ascend

Files

hucong 8dd53c8860 [Bugfix][PD] Auto-clear producer KV cache if no pull notification (#2174)

### What this PR does / why we need it?

This PR addresses a critical issue where a failure on Node D (decode) causes
Node P (prefill) to hang, because Node P can no longer release its KV cache.

**Trigger Scenarios:**  
1. Node D fails mid-inference (e.g., network disconnection)  
2. Node D rejects requests at some stage (e.g., at the API server)  
3. Load-test script termination causes Node P or D to abort queued
requests

**Root Cause Analysis:**  
1. Currently, Node D sends a "KV cache pull complete, release approved"
message to Node P
2. This message is transmitted via the worker connector. If the P-D
connection breaks or requests are rejected upstream, Node D cannot send
the message
3. Without this message, Node P never releases the KV cache  
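The release handshake described above can be modeled in a few lines to show why a single lost message is enough to leak memory. This is a minimal sketch under stated assumptions: `NotificationOnlyRelease` and its methods are hypothetical names, not the vllm-ascend connector API, and KV blocks are represented as plain integer IDs.

```python
from typing import Dict, List


class NotificationOnlyRelease:
    """Hypothetical model of the pre-fix behavior: Node P frees a request's
    KV blocks only when Node D's "pull complete" message arrives."""

    def __init__(self) -> None:
        self.held: Dict[str, List[int]] = {}  # req_id -> KV block IDs

    def on_prefill_done(self, req_id: str, block_ids: List[int]) -> None:
        # Keep the blocks alive so Node D can pull them.
        self.held[req_id] = list(block_ids)

    def on_pull_notification(self, req_id: str) -> None:
        # The only code path that ever releases blocks.
        self.held.pop(req_id, None)

    def num_held_blocks(self) -> int:
        return sum(len(blocks) for blocks in self.held.values())


node_p = NotificationOnlyRelease()
node_p.on_prefill_done("req-a", [0, 1, 2])
node_p.on_prefill_done("req-b", [3, 4])
node_p.on_pull_notification("req-a")  # delivered normally
# The message for req-b is lost (Node D crashed or the request was rejected
# upstream), so nothing ever calls on_pull_notification("req-b").
leaked = node_p.num_held_blocks()     # req-b's 2 blocks stay pinned forever
```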

**Solution:**  
Following the vLLM community's approach (the NIXL connector's timeout
mechanism), this PR implements:
- A timeout that auto-clears the producer's KV cache when no pull
notification arrives, with comprehensive warning logs  
- Updated README documentation  
- Reference: vLLM's optimization PR
[#20139](https://github.com/vllm-project/vllm/pull/20139)
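
The timeout fallback can be sketched as follows. This is an illustrative model, not the code from this PR or from vLLM's NIXL connector: `ProducerKVReleaseTracker`, its methods, the injectable clock, and the 120-second timeout are all assumptions chosen for the example. Each request gets a deadline when prefill finishes; a pull notification frees its blocks early, and a periodic sweep force-frees, with a warning, any request whose deadline has passed.

```python
import logging
import time
from dataclasses import dataclass
from typing import Callable, Dict, List

logger = logging.getLogger(__name__)


@dataclass
class PendingRelease:
    block_ids: List[int]  # KV blocks Node P keeps alive for Node D to pull
    deadline: float       # clock time after which the blocks are force-freed


class ProducerKVReleaseTracker:
    """Hypothetical sketch: free a request's KV blocks on Node D's pull
    notification, or after a timeout if that notification never arrives."""

    def __init__(self, timeout_s: float,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self.timeout_s = timeout_s
        self.clock = clock
        self.pending: Dict[str, PendingRelease] = {}
        self.freed: Dict[str, List[int]] = {}  # stands in for the allocator

    def on_prefill_done(self, req_id: str, block_ids: List[int]) -> None:
        # Start the countdown once the KV cache is ready to be pulled.
        self.pending[req_id] = PendingRelease(
            list(block_ids), self.clock() + self.timeout_s)

    def on_pull_notification(self, req_id: str) -> None:
        # Normal path. A late notification for an already-expired request
        # finds nothing in `pending` and is ignored.
        entry = self.pending.pop(req_id, None)
        if entry is not None:
            self.freed[req_id] = entry.block_ids

    def sweep_expired(self) -> List[str]:
        # Fallback path, meant to be called periodically.
        now = self.clock()
        expired = [r for r, e in self.pending.items() if now >= e.deadline]
        for req_id in expired:
            entry = self.pending.pop(req_id)
            logger.warning(
                "No pull notification for request %s within %.0fs; "
                "releasing %d KV blocks", req_id, self.timeout_s,
                len(entry.block_ids))
            self.freed[req_id] = entry.block_ids
        return expired


# Usage with a fake clock so the timeout can be simulated instantly.
now = [0.0]
tracker = ProducerKVReleaseTracker(timeout_s=120.0, clock=lambda: now[0])
tracker.on_prefill_done("req-a", [0, 1, 2])
tracker.on_prefill_done("req-b", [3, 4])
tracker.on_pull_notification("req-a")  # Node D pulled req-a normally
now[0] = 121.0                         # req-b's notification never arrives
expired = tracker.sweep_expired()      # force-frees req-b and logs a warning
```

Injecting the clock keeps the sketch deterministic; with the default `time.monotonic`, the deadline is immune to wall-clock adjustments. The timeout has to be long enough that a slow but healthy Node D is not cut off mid-pull, which is why the sweep logs a warning each time it fires.
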

### Does this PR introduce _any_ user-facing change?

None

### How was this patch tested?

None


- vLLM version: v0.10.2
- vLLM main: 9607d5eb44

---------

Signed-off-by: underfituu <hzhucong@163.com>

2025-09-23 09:53:34 +08:00

device_communicators

[MISC] Clean up torch_npu (#688)

2025-04-29 18:03:38 +08:00

mooncake

Fix of DeepSeek Error in KV Pool Mixed Deployment Scenario (#3087)

2025-09-22 20:36:41 +08:00

__init__.py

[Feat] A Connector that supports Mooncake store (#2913)

2025-09-18 14:04:45 +08:00

communicator.py

[2/N][Feat] Add MC2 communication method for MoE layers (#2469)