[Doc] fix the nit in docs (#6826)

Refresh the docs and fix nits.

- vLLM version: v0.15.0
- vLLM main: 83b47f67b1

Signed-off-by: wangxiyuan <wangxiyuan1007@gmail.com>
This commit is contained in:
wangxiyuan
2026-02-27 11:50:27 +08:00
committed by GitHub
parent 981d803cb7
commit a95c0b8b82
30 changed files with 145 additions and 118 deletions


@@ -79,9 +79,9 @@ After computing the results with the local KV cache, the results are updated via
 **Tokens Partition in Head-Tail Style**
-PCP requires splitting the input sequence and ensure balanced computational load across devices during the prefill phase.
+PCP requires splitting the input sequence and ensuring balanced computational load across devices during the prefill phase.
 We employ a head-tail style for splitting and concatenation: specifically, the sequence is first padded to a length of `2*pcp_size`, then divided into `2*pcp_size` equal parts.
-The first part is merged with the last part, the second part with the second last part, and so on, thereby assigning computationally balanced chunks to each devices.
+The first part is merged with the last part, the second part with the second last part, and so on, thereby assigning computationally balanced chunks to each device.
 Additionally, since allgather aggregation of KV or Q results in interleaved chunks from different requests, we compute `pcp_allgather_restore_idx` to quickly restore the original order.
 These logics are implemented in the function `_update_tokens_for_pcp`.
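The head-tail pairing described in the changed paragraph can be sketched as follows. This is a minimal illustrative helper, not vLLM's `_update_tokens_for_pcp`; the function name, the pad value `0`, and the list-of-lists return shape are all assumptions made for the example.

```python
def head_tail_split(tokens, pcp_size):
    """Hypothetical sketch of head-tail token partitioning for PCP prefill.

    Pads the sequence so its length is divisible by 2*pcp_size, cuts it
    into 2*pcp_size equal chunks, and gives rank r chunk r (the "head")
    concatenated with chunk 2*pcp_size-1-r (the "tail"), so every rank
    receives the same number of tokens.
    """
    n_chunks = 2 * pcp_size
    pad = (-len(tokens)) % n_chunks       # tokens needed to reach a multiple
    padded = tokens + [0] * pad           # 0 stands in for a pad token id
    chunk = len(padded) // n_chunks
    chunks = [padded[i * chunk:(i + 1) * chunk] for i in range(n_chunks)]
    # Pair the first chunk with the last, the second with the second last, ...
    return [chunks[r] + chunks[n_chunks - 1 - r] for r in range(pcp_size)]
```

For example, with 8 tokens and `pcp_size=2` the four chunks are `[0,1]`, `[2,3]`, `[4,5]`, `[6,7]`; rank 0 gets `[0,1,6,7]` and rank 1 gets `[2,3,4,5]`, which balances the attention cost because each rank holds one early (cheap) and one late (expensive) causal chunk.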