xc-llm-ascend

Files

baxingpiaochong df88a2ecc8 [P/D]mooncake_connector adapted to 0.10.1 (#2664)

### What this PR does / why we need it?
vLLM 0.10.1 added a new KVOutputAggregator to the executor, moving
aggregation into the executor
(https://github.com/vllm-project/vllm/pull/19555). This broke
mooncake_connector. This PR fixes the breakage and also adds a policy
that forcibly releases the KV cache when the prefill node times out.
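
The timeout policy can be pictured with a minimal sketch. This is not the connector's actual code: the class `DelayedFreeTracker`, its method names, and the injectable `clock` parameter are all hypothetical, and the sketch only assumes that the prefill side holds KV blocks per request until the transfer is confirmed or a deadline passes.

```python
import time
from typing import Callable, Dict, List


class DelayedFreeTracker:
    """Hypothetical helper: tracks requests whose KV cache is held on the
    prefill node until the transfer is confirmed or a timeout expires."""

    def __init__(self, timeout_s: float,
                 clock: Callable[[], float] = time.monotonic):
        self.timeout_s = timeout_s
        self._clock = clock
        # request_id -> absolute deadline on the injected clock
        self._deadlines: Dict[str, float] = {}

    def hold(self, request_id: str) -> None:
        """Start holding the KV cache for a request."""
        self._deadlines[request_id] = self._clock() + self.timeout_s

    def confirm_sent(self, request_id: str) -> None:
        """Transfer confirmed: the normal release path, no timeout needed."""
        self._deadlines.pop(request_id, None)

    def pop_expired(self) -> List[str]:
        """Return requests whose deadline has passed and stop tracking them.
        The caller is expected to force-release their KV cache."""
        now = self._clock()
        expired = [rid for rid, dl in self._deadlines.items() if dl <= now]
        for rid in expired:
            del self._deadlines[rid]
        return expired
```

In this sketch the caller would poll `pop_expired()` periodically and free the blocks of every returned request, so a transfer that is never confirmed cannot pin KV cache memory indefinitely.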

This PR is currently linked to a vLLM PR
(https://github.com/vllm-project/vllm/pull/23917), which modifies how
finish and send counts are confirmed in heterogeneous TP (tensor
parallel) setups.
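
As a rough illustration of count-based confirmation, the sketch below reports a request as finished only after a fixed number of worker reports have arrived. It is a simplified stand-in, not the code of either PR: `CountdownAggregator` and `expected_reports` are hypothetical names, and how the real count is chosen under heterogeneous TP is not stated in this message.

```python
from collections import defaultdict
from typing import Dict, Iterable, Set


class CountdownAggregator:
    """Hypothetical sketch: a request counts as finished only once
    `expected_reports` workers have each reported it."""

    def __init__(self, expected_reports: int):
        # Each request starts with the full number of reports outstanding.
        self._remaining: Dict[str, int] = defaultdict(lambda: expected_reports)

    def aggregate(self, per_worker_finished: Iterable[Set[str]]) -> Set[str]:
        """Consume one step of per-worker finished sets and return the
        requests that have now been confirmed by every expected worker."""
        done: Set[str] = set()
        for finished in per_worker_finished:
            for rid in finished:
                self._remaining[rid] -= 1
                if self._remaining[rid] == 0:
                    del self._remaining[rid]
                    done.add(rid)
        return done
```

Under these assumptions, a request reported by only some workers stays pending across calls, which is why the expected count has to match the real number of senders.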

Many unit tests (UTs) are deleted because much of the communication
code they covered has been removed, so the UT suite as a whole is more
concise.

- vLLM version: v0.10.1.1
- vLLM main: fa4311d85f

---------

Signed-off-by: baxingpiaochong <771405853@qq.com>

2025-09-04 08:22:10 +08:00

e2e

[CI] Enable MTP torchair e2e test (#2705)

2025-09-03 08:57:43 +08:00

[P/D]mooncake_connector adapted to 0.10.1 (#2664)

2025-09-04 08:22:10 +08:00

__init__.py

[SpecDecode] Add spec decode support (#500)

2025-04-17 20:16:32 +08:00