Commit Graph

51 Commits

Author SHA1 Message Date
Byron Hsu
d2e0881a34 [PD] support spec decode (#6507) 2025-05-23 12:03:05 -07:00
Co-authored-by: SangBin Cho <rkooo567@gmail.com>
Byron Hsu
0a4fc73b48 [PD] Fix failure abort (#6535) 2025-05-22 20:32:03 -07:00
shangmingc
58f10679e1 Fix missing http status import for PD failure handler (#6520) 2025-05-22 15:23:54 +08:00
Signed-off-by: Shangming Cai <caishangming@linux.alibaba.com>
Byron Hsu
3bde101099 [PD] Abort request if transfer fails (#6504) 2025-05-21 21:44:25 -07:00
Byron Hsu
7513558074 [PD] Add doc and simplify sender.send (#6019) 2025-05-21 21:22:21 -07:00
Yuan Luo
30ca18f423 Refactor group_concurrent_contiguous in NIXL (#6214) 2025-05-21 11:55:04 +08:00
Co-authored-by: luoyuan.luo <luoyuan.luo@antgroup.com>
Trevor Morris
7adf245ba2 [Metrics] Add KV events publishing (#6098) 2025-05-19 14:19:54 -07:00
Lianmin Zheng
e07a6977e7 Minor improvements of TokenizerManager / health check (#6327) 2025-05-15 15:29:25 -07:00
shangmingc
f1c896007a [PD] Add support for different TP sizes per DP rank (#5922) 2025-05-12 13:55:42 -07:00
Signed-off-by: Shangming Cai <caishangming@linux.alibaba.com>
Lianmin Zheng
fba8eccd7e Log if cuda graph is used & extend cuda graph capture to cuda-graph-max-bs (#6201) 2025-05-12 00:17:33 -07:00
Co-authored-by: SangBin Cho <rkooo567@gmail.com>
applesaucethebun
2ce8793519 Add typo checker in pre-commit (#6179) 2025-05-11 12:55:00 +08:00
Co-authored-by: Brayden Zhong <b8zhong@uwaterloo.ca>
Liangsheng Yin
a3e4e9bf9e Better PD initialization (#5751) 2025-05-07 01:12:57 +08:00
Liangsheng Yin
6d4d3bc81d Fix not "import os" (#6057) 2025-05-06 22:06:41 +08:00
fzyzcjy
3008db9c1a [PD] Allow customizing reserved tokens to avoid KV cache waste (#6002) 2025-05-05 11:23:15 +08:00
Yongtong Wu
97ac42b634 [PD] NIXL backend Prefill TP & Decode TP+DP (#5681) 2025-05-02 22:14:03 +08:00
Yuan Luo
67b7d5b1df [PD] Vectorise group_concurrent_contiguous in NumPy (#5834) 2025-05-01 22:42:37 +08:00
Co-authored-by: luoyuan.luo <luoyuan.luo@antgroup.com>
ybyang
c6c6264073 [PD] support pd fake transfer for warmup (#5726) 2025-04-29 00:33:20 +08:00
Liangsheng Yin
40d9b8acce Improve overlap scheduling (#5788) 2025-04-28 11:19:16 +08:00
Liangsheng Yin
beb65c7433 [PD]Reduce kv transfer threads (#5791) 2025-04-27 23:03:30 +08:00
IAN
11e27d0926 [PD]: Support Muti Prefill in one node (#5704) 2025-04-26 00:30:47 +08:00
Co-authored-by: shuaills <shishuaiuoe@gmail.com>
shangmingc
50eda8398e [PD] Add kvargs table and thread pool for kvcache sender of mooncake (#5738) 2025-04-25 18:15:01 +08:00
Signed-off-by: Shangming Cai <caishangming@linux.alibaba.com>
Liangsheng Yin
c55550cbf0 [PD] Better logs (#5715) 2025-04-25 17:25:45 +08:00
shangmingc
e0673969b9 [PD] Add support for dp attention with mooncake (#5530) 2025-04-23 17:20:27 +08:00
Signed-off-by: Shangming Cai <caishangming@linux.alibaba.com>
Cheng Wan
711efe7814 Integrating PD disaggregation with DP attention and DeepEP (#5435) 2025-04-23 01:46:01 -07:00
Co-authored-by: Byron Hsu <byronhsu1230@gmail.com>
Byron Hsu
bf98d2e377 [PD] Support prefill overlap + Ensure no race condition (#5609) 2025-04-21 12:12:56 -07:00
Byron Hsu
e65b9f21e3 [PD] Support decode overlap schedule (#5608) 2025-04-21 12:06:16 -07:00
Trevor Morris
4dce1cc608 [PD] Add NIXL transfer backend (#5477) 2025-04-22 01:36:12 +08:00
Byron Hsu
deded17f38 [PD] Fix edge case and simplify large page size + chunked prefill (#5589) 2025-04-21 10:27:02 -07:00
shangmingc
f29a718f63 [PD] Fix generate endpoint of min_lb for PD (#5598) 2025-04-21 21:39:18 +08:00
Signed-off-by: Shangming Cai <caishangming@linux.alibaba.com>
Yongtong Wu
3f57b00a59 Support PD bootstrap fields on /v1/chat/completions endpoint (#5488) 2025-04-21 01:10:58 -07:00
Byron Hsu
c951d312ed [PD] Fix large page size + chunk prefill (#5588) 2025-04-20 17:21:54 -07:00
fzyzcjy
475e2e378a [PD] Fix server crash when using batch requests (#5531) 2025-04-20 16:02:23 -07:00
Byron Hsu
ab4b5606e4 [PD] Support page size > 1 (#5561) 2025-04-19 21:54:27 -07:00
shangmingc
dca90f1db8 [PD] Remove the requirement of config file for mooncake backend (#5460) 2025-04-19 19:31:00 +08:00
ybyang
59dd090f1c [PD] Fix no cache connect for recevier (#5534) 2025-04-19 14:55:28 +08:00
fzyzcjy
569b032c58 [PD] Tiny fix timeout error when generate (#5545) 2025-04-19 14:42:57 +08:00
Cheng Wan
6aca583420 Fix several minor issues in PD disaggregation (#5444) 2025-04-15 23:04:41 -07:00
shangmingc
f1b3b75fc6 [PD] Remove unused bootstrap param and fix port table type (#5423) 2025-04-15 21:21:20 +08:00
Liangsheng Yin
33b16ad178 Distinguish bootstrap key only in decode server (#5422) 2025-04-15 20:59:28 +08:00
shangmingc
ffde65a094 [PD] Fix dynamic port support and MLA buffer for Mooncake (#5415) 2025-04-15 19:29:31 +08:00
Signed-off-by: Shangming Cai <caishangming@linux.alibaba.com>
Co-authored-by: ybyang <ybyang7@iflytek.com>
lambert0312
471650dee0 Fix broadcast use cuda device lead to memory capacity unbalanced (#5416) 2025-04-15 02:47:26 -07:00
Yuan Luo
d06a83fb01 Support dynamic connection and TP 16 (#5351) 2025-04-15 17:08:07 +08:00
Co-authored-by: luoyuan.luo <luoyuan.luo@antgroup.com>
Liangsheng Yin
44afde82d7 Fix PD disaggregation bugs (#5326) 2025-04-14 19:27:30 +08:00
Yongtong Wu
14e8bd889f Free metadata_buffer_index after transfer finished (#5364) 2025-04-14 16:04:46 +08:00
Byron Hsu
a9499885e9 [PD] Add transfer backend abstraction (#5328) 2025-04-14 01:39:39 +08:00
Liangsheng Yin
f765579046 Fix typo: infight -> inflight (#5357) 2025-04-14 01:25:30 +08:00
Teng Ma
4c31ae9f6d [PD] Support KV transfer with mooncake (#4880) 2025-04-10 14:23:23 +08:00
Signed-off-by: Shangming Cai <caishangming@linux.alibaba.com>
Co-authored-by: Shangming Cai <caishangming@linux.alibaba.com>
Co-authored-by: Xuchun Shang <xuchun.shang@linux.alibaba.com>
Co-authored-by: shangmingc <csmthu@gmail.com>
Byron Hsu
6d3b35fae9 [PD] Simplify mini LB (#4911) 2025-04-08 09:42:34 -07:00
Co-authored-by: Liangsheng Yin <hnyls2002@gmail.com>
shangmingc
89a554181f [PD] Fix unclosed prefill connection warning of mini_lb (#5155) 2025-04-08 09:15:06 -07:00
Signed-off-by: Shangming Cai <caishangming@linux.alibaba.com>
Xuchun Shang
8154de5a32 [PD] Remove invalid parameter (#4721) 2025-03-24 13:14:16 -07:00
Signed-off-by: Xuchun Shang <xuchun.shang@linux.alibaba.com>