25 Commits

Author SHA1 Message Date
Byron Hsu
8233cc10fd [PD] Support logprob & Add failure test (#6558) 2025-05-23 14:29:20 -07:00
Byron Hsu
d2e0881a34 [PD] support spec decode (#6507)
Co-authored-by: SangBin Cho <rkooo567@gmail.com>
2025-05-23 12:03:05 -07:00
shangmingc
58f10679e1 Fix missing http status import for PD failure handler (#6520)
Signed-off-by: Shangming Cai <caishangming@linux.alibaba.com>
2025-05-22 15:23:54 +08:00
Byron Hsu
3bde101099 [PD] Abort request if transfer fails (#6504) 2025-05-21 21:44:25 -07:00
Byron Hsu
7513558074 [PD] Add doc and simplify sender.send (#6019) 2025-05-21 21:22:21 -07:00
shangmingc
f1c896007a [PD] Add support for different TP sizes per DP rank (#5922)
Signed-off-by: Shangming Cai <caishangming@linux.alibaba.com>
2025-05-12 13:55:42 -07:00
Lianmin Zheng
fba8eccd7e Log if cuda graph is used & extend cuda graph capture to cuda-graph-max-bs (#6201)
Co-authored-by: SangBin Cho <rkooo567@gmail.com>
2025-05-12 00:17:33 -07:00
ybyang
c6c6264073 [PD] support pd fake transfer for warmup (#5726) 2025-04-29 00:33:20 +08:00
Liangsheng Yin
40d9b8acce Improve overlap scheduling (#5788) 2025-04-28 11:19:16 +08:00
Liangsheng Yin
c55550cbf0 [PD] Better logs (#5715) 2025-04-25 17:25:45 +08:00
Cheng Wan
711efe7814 Integrating PD disaggregation with DP attention and DeepEP (#5435)
Co-authored-by: Byron Hsu <byronhsu1230@gmail.com>
2025-04-23 01:46:01 -07:00
Byron Hsu
bf98d2e377 [PD] Support prefill overlap + Ensure no race condition (#5609) 2025-04-21 12:12:56 -07:00
Byron Hsu
deded17f38 [PD] Fix edge case and simplify large page size + chunked prefill (#5589) 2025-04-21 10:27:02 -07:00
Byron Hsu
c951d312ed [PD] Fix large page size + chunk prefill (#5588) 2025-04-20 17:21:54 -07:00
Byron Hsu
ab4b5606e4 [PD] Support page size > 1 (#5561) 2025-04-19 21:54:27 -07:00
shangmingc
dca90f1db8 [PD] Remove the requirement of config file for mooncake backend (#5460) 2025-04-19 19:31:00 +08:00
Cheng Wan
6aca583420 Fix several minor issues in PD disaggregation (#5444) 2025-04-15 23:04:41 -07:00
Liangsheng Yin
33b16ad178 Distinguish bootstrap key only in decode server (#5422) 2025-04-15 20:59:28 +08:00
shangmingc
ffde65a094 [PD] Fix dynamic port support and MLA buffer for Mooncake (#5415)
Signed-off-by: Shangming Cai <caishangming@linux.alibaba.com>
Co-authored-by: ybyang <ybyang7@iflytek.com>
2025-04-15 19:29:31 +08:00
Liangsheng Yin
44afde82d7 Fix PD disaggregation bugs (#5326) 2025-04-14 19:27:30 +08:00
Yongtong Wu
14e8bd889f Free metadata_buffer_index after transfer finished (#5364) 2025-04-14 16:04:46 +08:00
Byron Hsu
a9499885e9 [PD] Add transfer backend abstraction (#5328) 2025-04-14 01:39:39 +08:00
Liangsheng Yin
f765579046 Fix typo: infight -> inflight (#5357) 2025-04-14 01:25:30 +08:00
Teng Ma
4c31ae9f6d [PD] Support KV transfer with mooncake (#4880)
Signed-off-by: Shangming Cai <caishangming@linux.alibaba.com>
Co-authored-by: Shangming Cai <caishangming@linux.alibaba.com>
Co-authored-by: Xuchun Shang <xuchun.shang@linux.alibaba.com>
Co-authored-by: shangmingc <csmthu@gmail.com>
2025-04-10 14:23:23 +08:00
Byron Hsu
c7c7dbebbe [PD] Release initial code (#4654)
Co-authored-by: SangBin Cho <rkooo567@gmail.com>
Co-authored-by: Ying1123 <sqy1415@gmail.com>
Co-authored-by: merrymercy <lianminzheng@gmail.com>
Co-authored-by: makro
Co-authored-by: dhou-xai
2025-03-21 14:47:47 -07:00