[Lint]Style: reformat markdown files via markdownlint (#5884)
### What this PR does / why we need it?
reformat markdown files via markdownlint
- vLLM version: v0.13.0
- vLLM main: bde38c11df
---------
Signed-off-by: root <root@LAPTOP-VQKDDVMG.localdomain>
Signed-off-by: MrZ20 <2609716663@qq.com>
Co-authored-by: root <root@LAPTOP-VQKDDVMG.localdomain>
@@ -15,6 +15,7 @@ This feature addresses the need to optimize the **Time Per Output Token (TPOT)**
## Usage
vLLM Ascend currently supports two types of connectors for handling KV cache management:
- **MooncakeConnector**: D nodes pull KV cache from P nodes.
- **MooncakeLayerwiseConnector**: P nodes push KV cache to D nodes in a layered manner.
@@ -35,7 +36,7 @@ Our design diagram is shown below, illustrating the pull and push schemes respec
-#### Mooncake Connector:
+#### Mooncake Connector
1. The request is sent to the Proxy’s `_handle_completions` endpoint.
2. The Proxy calls `select_prefiller` to choose a P node and forwards the request, configuring `kv_transfer_params` with `do_remote_decode=True`, `max_tokens=1`, and `min_tokens=1`.
@@ -43,7 +44,7 @@ Our design diagram is shown below, illustrating the pull and push schemes respec
4. The Proxy calls `select_decoder` to choose a D node and forwards the request.
5. On the D node, the scheduler marks the request as `RequestStatus.WAITING_FOR_REMOTE_KVS`, pre-allocates KV cache, calls `kv_connector_no_forward` to pull the remote KV cache, then notifies the P node to release KV cache and proceeds with decoding to return the result.
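
The Proxy side of the pull scheme can be sketched roughly as follows. This is a minimal illustration, not the project's implementation: the node names and the round-robin selection policy are assumptions, and placing `max_tokens` and `min_tokens` at the top level of the request (rather than inside `kv_transfer_params`) is also an assumption. Only `select_prefiller`, `select_decoder`, `kv_transfer_params`, and `do_remote_decode` come from the steps above.

```python
import copy
import itertools
from typing import Any

# Hypothetical node names; a real deployment would read these from configuration.
PREFILLERS = ["p-node-0", "p-node-1"]
DECODERS = ["d-node-0", "d-node-1"]

_prefill_cycle = itertools.cycle(PREFILLERS)
_decode_cycle = itertools.cycle(DECODERS)


def select_prefiller() -> str:
    """Pick a P node. Round-robin is an assumed policy for this sketch."""
    return next(_prefill_cycle)


def select_decoder() -> str:
    """Pick a D node. Round-robin is an assumed policy for this sketch."""
    return next(_decode_cycle)


def build_prefill_request(request: dict[str, Any]) -> dict[str, Any]:
    """Copy the client request and limit it to a single prefill step."""
    prefill = copy.deepcopy(request)
    # Tell the P node that decoding happens elsewhere.
    prefill["kv_transfer_params"] = {"do_remote_decode": True}
    # Generate exactly one token so the P node only fills the KV cache.
    prefill["max_tokens"] = 1
    prefill["min_tokens"] = 1
    return prefill
```

The original request is deep-copied so that the decode request sent to the D node in step 4 still carries the client's own `max_tokens`.
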
-#### Mooncake Layerwise Connector:
+#### Mooncake Layerwise Connector
1. The request is sent to the Proxy’s `_handle_completions` endpoint.
2. The Proxy calls `select_decoder` to choose a D node and forwards the request, configuring `kv_transfer_params` with `do_remote_prefill=True` and setting the `metaserver` endpoint.
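
For the push scheme, the request built in step 2 might look like the sketch below. The function name, the `metaserver` key inside `kv_transfer_params`, and the example URL are assumptions made for illustration; the source only states that `do_remote_prefill=True` is set and that a `metaserver` endpoint is configured.

```python
import copy
from typing import Any


def build_layerwise_decode_request(
    request: dict[str, Any], metaserver_url: str
) -> dict[str, Any]:
    """Copy the client request and mark prefill as remote (hypothetical helper)."""
    decode = copy.deepcopy(request)
    decode["kv_transfer_params"] = {
        # The D node receives the request first and expects prefill to run elsewhere.
        "do_remote_prefill": True,
        # Assumed key name: where the D node reports back so a P node can be scheduled.
        "metaserver": metaserver_url,
    }
    return decode


# Example with a made-up endpoint.
example = build_layerwise_decode_request(
    {"prompt": "hello", "max_tokens": 32},
    "http://proxy.example:8000/v1/metaserver",
)
```

Unlike the pull scheme, `max_tokens` is left untouched here because the D node handles the full generation.
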
@@ -55,6 +56,7 @@ Our design diagram is shown below, illustrating the pull and push schemes respec
### 3. Interface Design
Taking MooncakeConnector as an example, the system is organized into three primary classes:
- **MooncakeConnector**: Base class that provides core interfaces.
- **MooncakeConnectorScheduler**: Interface for scheduling the connectors within the engine core, responsible for managing KV cache transfer requirements and completion.
- **MooncakeConnectorWorker**: Interface for managing KV cache registration and transfer in worker processes.
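
The split of responsibilities between the three classes can be pictured with the simplified sketch below. All class and method names here (`SketchConnector`, `request_transfer`, `register_kv_cache`, and so on) are hypothetical stand-ins rather than the real Mooncake classes, and the idea that the base class creates one helper depending on the process role is an assumption about the design.

```python
from dataclasses import dataclass, field
from enum import Enum


class Role(Enum):
    SCHEDULER = "scheduler"
    WORKER = "worker"


@dataclass
class SketchConnectorScheduler:
    """Engine-core side: tracks which requests still need a KV cache transfer."""

    pending: set[str] = field(default_factory=set)

    def request_transfer(self, request_id: str) -> None:
        self.pending.add(request_id)

    def mark_finished(self, request_id: str) -> None:
        self.pending.discard(request_id)


@dataclass
class SketchConnectorWorker:
    """Worker side: registers KV cache regions and performs transfers."""

    registered: dict[str, int] = field(default_factory=dict)
    transferred: list[str] = field(default_factory=list)

    def register_kv_cache(self, layer_name: str, num_blocks: int) -> None:
        self.registered[layer_name] = num_blocks

    def transfer(self, request_id: str) -> None:
        # A real worker would move KV blocks here; the sketch only records the call.
        self.transferred.append(request_id)


class SketchConnector:
    """Entry point that holds exactly one helper, chosen by process role."""

    def __init__(self, role: Role) -> None:
        self.scheduler = SketchConnectorScheduler() if role is Role.SCHEDULER else None
        self.worker = SketchConnectorWorker() if role is Role.WORKER else None
```

Keeping the scheduler and worker halves in separate objects means each process only instantiates the state it actually uses.
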