[Lint]Style: reformat markdown files via markdownlint (#5884)

### What this PR does / why we need it?
Reformat markdown files via markdownlint.

- vLLM version: v0.13.0
- vLLM main: bde38c11df

---------

Signed-off-by: root <root@LAPTOP-VQKDDVMG.localdomain>
Signed-off-by: MrZ20 <2609716663@qq.com>
Co-authored-by: root <root@LAPTOP-VQKDDVMG.localdomain>
Commit 4811ba62e0 (parent 96edd4673f)
Author: SILONG ZENG, committed via GitHub on 2026-01-15 09:06:01 +08:00
75 changed files with 711 additions and 308 deletions

@@ -9,11 +9,11 @@ This guide shows how to run **prefill-decode (PD) disaggregation** on Huawei Ascend NPUs
 Large language model inference naturally splits into two phases:
 - **Prefill**
-    - Processes input tokens and builds the key-value (KV) cache.
-    - Batch-friendly, high throughput, well suited to parallel NPU execution.
+  - Processes input tokens and builds the key-value (KV) cache.
+  - Batch-friendly, high throughput, well suited to parallel NPU execution.
 - **Decode**
-    - Consumes the KV cache to generate output tokens.
-    - Latency-sensitive, memory-intensive, more sequential.
+  - Consumes the KV cache to generate output tokens.
+  - Latency-sensitive, memory-intensive, more sequential.
 From the client's perspective, this still looks like a single Chat / Completions endpoint.
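
For readers unfamiliar with how the two roles map onto separate servers, the following is a minimal sketch based on vLLM's generic disaggregated-prefill example, not the configuration this guide itself deploys: the connector name, ports, and rank values are illustrative assumptions, and the Ascend setup may use a different KV-transfer connector.

```bash
# Illustrative only: two vLLM servers sharing the KV cache through a
# transfer connector. Connector, ports, and ranks are assumptions.

# Prefill instance: processes prompts and produces the KV cache.
vllm serve deepseek-ai/DeepSeek-V2-Lite --port 8100 \
  --kv-transfer-config \
  '{"kv_connector":"PyNcclConnector","kv_role":"kv_producer","kv_rank":0,"kv_parallel_size":2}' &

# Decode instance: consumes the transferred KV cache and generates tokens.
vllm serve deepseek-ai/DeepSeek-V2-Lite --port 8200 \
  --kv-transfer-config \
  '{"kv_connector":"PyNcclConnector","kv_role":"kv_consumer","kv_rank":1,"kv_parallel_size":2}' &
```

A router or proxy in front of the two servers is what presents the single Chat / Completions endpoint mentioned above.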
@@ -43,7 +43,7 @@ This section uses the `deepseek-ai/DeepSeek-V2-Lite` example, but you can swap i
 ### 2.2 Deploy Prefill-Decode Disaggregated DeepSeek-V2-Lite on Kubernetes
-A concrete example is provided in Kthena as https://github.com/volcano-sh/kthena/blob/main/examples/model-serving/prefill-decode-disaggregation.yaml
+A concrete example is provided in Kthena as <https://github.com/volcano-sh/kthena/blob/main/examples/model-serving/prefill-decode-disaggregation.yaml>
 Deploy it with the command below:
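
The command itself falls outside this hunk; a minimal sketch, assuming the manifest is applied directly from the Kthena repository (raw URL derived from the link above):

```bash
# Apply the prefill/decode disaggregation example manifest
# (URL assumed from the link above), then watch the pods come up.
kubectl apply -f https://raw.githubusercontent.com/volcano-sh/kthena/main/examples/model-serving/prefill-decode-disaggregation.yaml
kubectl get pods -w
```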