[Doc] fix the nit in docs (#6826)
Refresh the doc and fix nits in the docs
- vLLM version: v0.15.0
- vLLM main: 83b47f67b1
Signed-off-by: wangxiyuan <wangxiyuan1007@gmail.com>
@@ -79,9 +79,9 @@ After computing the results with the local KV cache, the results are updated via
 
 **Tokens Partition in Head-Tail Style**
 
-PCP requires splitting the input sequence and ensure balanced computational load across devices during the prefill phase.
+PCP requires splitting the input sequence and ensuring balanced computational load across devices during the prefill phase.
 We employ a head-tail style for splitting and concatenation: specifically, the sequence is first padded to a length of `2*pcp_size`, then divided into `2*pcp_size` equal parts.
-The first part is merged with the last part, the second part with the second last part, and so on, thereby assigning computationally balanced chunks to each devices.
+The first part is merged with the last part, the second part with the second last part, and so on, thereby assigning computationally balanced chunks to each device.
 Additionally, since allgather aggregation of KV or Q results in interleaved chunks from different requests, we compute `pcp_allgather_restore_idx` to quickly restore the original order.
 
 These logics are implemented in the function `_update_tokens_for_pcp`.
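The commit only touches the prose, but the head-tail partition it describes is easy to illustrate. Below is a minimal Python sketch, assuming "padded to a length of `2*pcp_size`" means padded up to a multiple of `2*pcp_size`, and assuming one flat list of token IDs per request. The helper names `head_tail_split` and `build_pcp_restore_idx` are invented here for illustration; the repository's actual logic lives in `_update_tokens_for_pcp`, whose signature and layout may differ.

```python
import torch


def head_tail_split(token_ids: list[int], pcp_size: int) -> list[list[int]]:
    """Pad to a multiple of 2*pcp_size, cut into 2*pcp_size equal parts,
    and pair part i with part (2*pcp_size - 1 - i), so every rank gets the
    same token count with a balanced causal-attention load."""
    num_parts = 2 * pcp_size
    pad = (-len(token_ids)) % num_parts      # pad up to a multiple of 2*pcp_size
    padded = token_ids + [0] * pad           # 0 used as a placeholder pad token
    part_len = len(padded) // num_parts
    parts = [padded[i * part_len:(i + 1) * part_len] for i in range(num_parts)]
    # Rank r holds head part r followed by tail part (num_parts - 1 - r).
    return [parts[r] + parts[num_parts - 1 - r] for r in range(pcp_size)]


def build_pcp_restore_idx(num_tokens: list[int], pcp_size: int) -> torch.Tensor:
    """Index tensor that reorders an allgather of per-rank chunks
    ([rank0: req0, req1, ...; rank1: req0, ...]) back into the original
    (padded) per-request token order."""
    num_parts = 2 * pcp_size
    padded = [-(-n // num_parts) * num_parts for n in num_tokens]
    chunk = [p // pcp_size for p in padded]      # tokens each rank holds per request
    rank_stride = sum(chunk)                     # tokens each rank contributes in total
    req_off = [sum(chunk[:j]) for j in range(len(num_tokens))]

    idx: list[int] = []
    for j, p in enumerate(padded):
        part_len = p // num_parts
        for i in range(num_parts):               # walk the parts in original order
            if i < pcp_size:
                rank, half = i, 0                # head part i: first half on rank i
            else:
                rank, half = num_parts - 1 - i, 1  # tail part: second half on its rank
            base = rank * rank_stride + req_off[j] + half * part_len
            idx.extend(range(base, base + part_len))
    return torch.tensor(idx, dtype=torch.long)
```

As a quick check of the sketch: with `pcp_size=2`, a 7-token request is padded to 8 tokens and cut into 4 parts, so rank 0 receives parts (0, 3) and rank 1 parts (1, 2). Concatenating the two chunks in rank order and indexing with `build_pcp_restore_idx([7], 2)` yields the padded sequence in its original order; the pad slots would then be dropped.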