[Lint]Style: reformat markdown files via markdownlint (#5884)

### What this PR does / why we need it?
Reformat Markdown files with markdownlint.

- vLLM version: v0.13.0
- vLLM main: bde38c11df

---------

Signed-off-by: root <root@LAPTOP-VQKDDVMG.localdomain>
Signed-off-by: MrZ20 <2609716663@qq.com>
Co-authored-by: root <root@LAPTOP-VQKDDVMG.localdomain>
SILONG ZENG
2026-01-15 09:06:01 +08:00
committed by GitHub
parent 96edd4673f
commit 4811ba62e0
75 changed files with 711 additions and 308 deletions


@@ -15,6 +15,7 @@ This feature addresses the need to optimize the **Time Per Output Token (TPOT)**
## Usage
vLLM Ascend currently supports two types of connectors for handling KV cache management:
- **MooncakeConnector**: D nodes pull KV cache from P nodes.
- **MooncakeLayerwiseConnector**: P nodes push KV cache to D nodes in a layered manner.
@@ -35,7 +36,7 @@ Our design diagram is shown below, illustrating the pull and push schemes respec
![alt text](../../assets/disaggregated_prefill_pull.png)
![alt text](../../assets/disaggregated_prefill_push.png)
-#### Mooncake Connector:
+#### Mooncake Connector
1. The request is sent to the Proxy's `_handle_completions` endpoint.
2. The Proxy calls `select_prefiller` to choose a P node and forwards the request, configuring `kv_transfer_params` with `do_remote_decode=True`, `max_tokens=1`, and `min_tokens=1`.
@@ -43,7 +44,7 @@ Our design diagram is shown below, illustrating the pull and push schemes respec
4. The Proxy calls `select_decoder` to choose a D node and forwards the request.
5. On the D node, the scheduler marks the request as `RequestStatus.WAITING_FOR_REMOTE_KVS`, pre-allocates KV cache, calls `kv_connector_no_forward` to pull the remote KV cache, then notifies the P node to release KV cache and proceeds with decoding to return the result.
-#### Mooncake Layerwise Connector:
+#### Mooncake Layerwise Connector
1. The request is sent to the Proxy's `_handle_completions` endpoint.
2. The Proxy calls `select_decoder` to choose a D node and forwards the request, configuring `kv_transfer_params` with `do_remote_prefill=True` and setting the `metaserver` endpoint.
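The request shaping in step 2 of both schemes could look roughly like the sketch below. It is not the Proxy's actual code: the helper names `build_pull_prefill_request` and `build_layerwise_decode_request` are hypothetical, the placement of `max_tokens` and `min_tokens` at the top level of the request body is an assumption, and so is the use of `metaserver` as a key inside `kv_transfer_params`.

```python
import copy


def build_pull_prefill_request(original: dict) -> dict:
    """Pull scheme (MooncakeConnector): request forwarded to the P node.

    Hypothetical helper. The P node only needs to run prefill, so output
    is capped at a single token and remote decode is flagged.
    """
    req = copy.deepcopy(original)  # leave the client's request untouched
    req["max_tokens"] = 1  # assumed to be top-level request fields
    req["min_tokens"] = 1
    req["kv_transfer_params"] = {"do_remote_decode": True}
    return req


def build_layerwise_decode_request(original: dict, metaserver_url: str) -> dict:
    """Push scheme (MooncakeLayerwiseConnector): request forwarded to the D node.

    Hypothetical helper. The "metaserver" key name is an assumption.
    """
    req = copy.deepcopy(original)
    req["kv_transfer_params"] = {
        "do_remote_prefill": True,
        "metaserver": metaserver_url,
    }
    return req


client_request = {"prompt": "Hello", "max_tokens": 128}
prefill_request = build_pull_prefill_request(client_request)
decode_request = build_layerwise_decode_request(
    client_request, "http://proxy.example:8000/metaserver"  # placeholder URL
)
```

Copying the request before modifying it keeps the client's original `max_tokens` available for the later decode step in the pull scheme.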
@@ -55,6 +56,7 @@ Our design diagram is shown below, illustrating the pull and push schemes respec
### 3. Interface Design
Taking MooncakeConnector as an example, the system is organized into three primary classes:
- **MooncakeConnector**: Base class that provides core interfaces.
- **MooncakeConnectorScheduler**: Interface for scheduling the connectors within the engine core, responsible for managing KV cache transfer requirements and completion.
- **MooncakeConnectorWorker**: Interface for managing KV cache registration and transfer in worker processes.
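The three-class split above can be pictured with a minimal sketch, assuming the base class delegates to one role-specific helper per process. The class names `Connector`, `SchedulerSide`, and `WorkerSide`, and all of their methods, are hypothetical stand-ins and do not reproduce the real MooncakeConnector signatures.

```python
class SchedulerSide:
    """Hypothetical stand-in for MooncakeConnectorScheduler."""

    def __init__(self) -> None:
        self.pending = set()  # request IDs still waiting on a KV transfer

    def add_transfer(self, request_id: str) -> None:
        self.pending.add(request_id)

    def complete_transfer(self, request_id: str) -> None:
        self.pending.discard(request_id)


class WorkerSide:
    """Hypothetical stand-in for MooncakeConnectorWorker."""

    def __init__(self) -> None:
        self.registered = {}  # layer name -> KV cache buffer

    def register_kv_caches(self, kv_caches: dict) -> None:
        self.registered.update(kv_caches)


class Connector:
    """Hypothetical stand-in for the base class: builds one helper per role."""

    def __init__(self, role: str) -> None:
        if role not in ("scheduler", "worker"):
            raise ValueError(f"unknown role: {role}")
        self.scheduler = SchedulerSide() if role == "scheduler" else None
        self.worker = WorkerSide() if role == "worker" else None


engine_core_connector = Connector("scheduler")
engine_core_connector.scheduler.add_transfer("req-1")

worker_connector = Connector("worker")
worker_connector.worker.register_kv_caches({"layer.0": object()})
```

Keeping scheduling state and cache registration in separate helpers means each process only instantiates the half it needs.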