[PD] Raise error for incompatible mooncake version and some minor fixes (#7527)

Signed-off-by: Shangming Cai <caishangming@linux.alibaba.com>
Shangming Cai
2025-06-26 09:55:24 +08:00
committed by GitHub
parent b8df43ab9c
commit 5c2142579a
3 changed files with 55 additions and 49 deletions
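The default for `SGLANG_DISAGGREGATION_THREAD_POOL_SIZE` documented in the diff below can be worked through in a few lines of Python. This is a minimal sketch, not SGLang's code. The helper names `default_thread_pool_size` and `resolve_thread_pool_size` are hypothetical. Two readings are also assumptions: that the documented bounds of 4 and 12 act as an inclusive clamp, and that an explicitly set value is used as-is.

```python
import os
from typing import Optional


def default_thread_pool_size(cpu_count: Optional[int] = None) -> int:
    """Per-TP-rank worker-thread default, following the documented formula.

    Hypothetical helper for illustration; not SGLang's actual function.
    """
    if cpu_count is None:
        # os.cpu_count() may return None on some platforms; fall back to 1.
        cpu_count = os.cpu_count() or 1
    raw = int(0.75 * cpu_count) // 8
    # Assumption: the documented bounds (4 and 12) are an inclusive clamp.
    return max(4, min(12, raw))


def resolve_thread_pool_size() -> int:
    """Use the environment variable if set, otherwise the computed default.

    Assumption: an explicitly set value is used as-is, without clamping.
    """
    value = os.environ.get("SGLANG_DISAGGREGATION_THREAD_POOL_SIZE")
    return int(value) if value is not None else default_thread_pool_size()


if __name__ == "__main__":
    for cpus in (16, 64, 96, 256):
        print(cpus, "CPUs ->", default_thread_pool_size(cpus), "threads")
```

Under these assumptions, hosts with fewer than 43 CPUs fall back to the floor of 4, and hosts with 128 or more CPUs reach the cap of 12.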


@@ -56,7 +56,7 @@ PD Disaggregation with Mooncake supports the following environment variables for
|:--------:|:-----------:|:--------:
| **`SGLANG_DISAGGREGATION_THREAD_POOL_SIZE`** | Controls the total number of worker threads for KVCache transfer operations per TP rank | A dynamic value calculated by `int(0.75 * os.cpu_count()) // 8`, clamped to the range [4, 12] to ensure efficiency and prevent thread race conditions |
| **`SGLANG_DISAGGREGATION_QUEUE_SIZE`** | Sets the number of parallel transfer queues. KVCache transfer requests from multiple decode instances are sharded into these queues so that they can share the threads and the transfer bandwidth at the same time. If it is set to `1`, requests are transferred one by one in first-come, first-served (FCFS) order | `4` |
-| **`SGLANG_DISAGGREGATION_BOOTSTRAP_TIMEOUT`** | Timeout (seconds) for receiving destination KV indices during request initialization | `30` |
+| **`SGLANG_DISAGGREGATION_BOOTSTRAP_TIMEOUT`** | Timeout (seconds) for receiving destination KV indices during request initialization | `120` |
#### Decode Server Configuration
| Variable | Description | Default |