Byron Hsu | 8233cc10fd | [PD] Support logprob & Add failure test (#6558) | 2025-05-23 14:29:20 -07:00
Byron Hsu | d2e0881a34 | [PD] support spec decode (#6507); Co-authored-by: SangBin Cho <rkooo567@gmail.com> | 2025-05-23 12:03:05 -07:00
shangmingc | 58f10679e1 | Fix missing http status import for PD failure handler (#6520); Signed-off-by: Shangming Cai <caishangming@linux.alibaba.com> | 2025-05-22 15:23:54 +08:00
Byron Hsu | 3bde101099 | [PD] Abort request if transfer fails (#6504) | 2025-05-21 21:44:25 -07:00
Byron Hsu | 7513558074 | [PD] Add doc and simplify sender.send (#6019) | 2025-05-21 21:22:21 -07:00
shangmingc | f1c896007a | [PD] Add support for different TP sizes per DP rank (#5922); Signed-off-by: Shangming Cai <caishangming@linux.alibaba.com> | 2025-05-12 13:55:42 -07:00
Lianmin Zheng | fba8eccd7e | Log if cuda graph is used & extend cuda graph capture to cuda-graph-max-bs (#6201); Co-authored-by: SangBin Cho <rkooo567@gmail.com> | 2025-05-12 00:17:33 -07:00
ybyang | c6c6264073 | [PD] support pd fake transfer for warmup (#5726) | 2025-04-29 00:33:20 +08:00
Liangsheng Yin | 40d9b8acce | Improve overlap scheduling (#5788) | 2025-04-28 11:19:16 +08:00
Liangsheng Yin | c55550cbf0 | [PD] Better logs (#5715) | 2025-04-25 17:25:45 +08:00
Cheng Wan | 711efe7814 | Integrating PD disaggregation with DP attention and DeepEP (#5435); Co-authored-by: Byron Hsu <byronhsu1230@gmail.com> | 2025-04-23 01:46:01 -07:00
Byron Hsu | bf98d2e377 | [PD] Support prefill overlap + Ensure no race condition (#5609) | 2025-04-21 12:12:56 -07:00
Byron Hsu | deded17f38 | [PD] Fix edge case and simplify large page size + chunked prefill (#5589) | 2025-04-21 10:27:02 -07:00
Byron Hsu | c951d312ed | [PD] Fix large page size + chunk prefill (#5588) | 2025-04-20 17:21:54 -07:00
Byron Hsu | ab4b5606e4 | [PD] Support page size > 1 (#5561) | 2025-04-19 21:54:27 -07:00
shangmingc | dca90f1db8 | [PD] Remove the requirement of config file for mooncake backend (#5460) | 2025-04-19 19:31:00 +08:00
Cheng Wan | 6aca583420 | Fix several minor issues in PD disaggregation (#5444) | 2025-04-15 23:04:41 -07:00
Liangsheng Yin | 33b16ad178 | Distinguish bootstrap key only in decode server (#5422) | 2025-04-15 20:59:28 +08:00
shangmingc | ffde65a094 | [PD] Fix dynamic port support and MLA buffer for Mooncake (#5415); Signed-off-by: Shangming Cai <caishangming@linux.alibaba.com>; Co-authored-by: ybyang <ybyang7@iflytek.com> | 2025-04-15 19:29:31 +08:00
Liangsheng Yin | 44afde82d7 | Fix PD disaggregation bugs (#5326) | 2025-04-14 19:27:30 +08:00
Yongtong Wu | 14e8bd889f | Free metadata_buffer_index after transfer finished (#5364) | 2025-04-14 16:04:46 +08:00
Byron Hsu | a9499885e9 | [PD] Add transfer backend abstraction (#5328) | 2025-04-14 01:39:39 +08:00
Liangsheng Yin | f765579046 | Fix typo: infight -> inflight (#5357) | 2025-04-14 01:25:30 +08:00
Teng Ma | 4c31ae9f6d | [PD] Support KV transfer with mooncake (#4880); Signed-off-by: Shangming Cai <caishangming@linux.alibaba.com>; Co-authored-by: Shangming Cai <caishangming@linux.alibaba.com>; Co-authored-by: Xuchun Shang <xuchun.shang@linux.alibaba.com>; Co-authored-by: shangmingc <csmthu@gmail.com> | 2025-04-10 14:23:23 +08:00
Byron Hsu | c7c7dbebbe | [PD] Release initial code (#4654); Co-authored-by: SangBin Cho <rkooo567@gmail.com>; Co-authored-by: Ying1123 <sqy1415@gmail.com>; Co-authored-by: merrymercy <lianminzheng@gmail.com>; Co-authored-by: makro; Co-authored-by: dhou-xai | 2025-03-21 14:47:47 -07:00