Byron Hsu
|
d2e0881a34
|
[PD] support spec decode (#6507)
Co-authored-by: SangBin Cho <rkooo567@gmail.com>
|
2025-05-23 12:03:05 -07:00 |
|
Byron Hsu
|
0a4fc73b48
|
[PD] Fix failure abort (#6535)
|
2025-05-22 20:32:03 -07:00 |
|
shangmingc
|
58f10679e1
|
Fix missing http status import for PD failure handler (#6520)
Signed-off-by: Shangming Cai <caishangming@linux.alibaba.com>
|
2025-05-22 15:23:54 +08:00 |
|
Byron Hsu
|
3bde101099
|
[PD] Abort request if transfer fails (#6504)
|
2025-05-21 21:44:25 -07:00 |
|
Byron Hsu
|
7513558074
|
[PD] Add doc and simplify sender.send (#6019)
|
2025-05-21 21:22:21 -07:00 |
|
Yuan Luo
|
30ca18f423
|
Refactor group_concurrent_contiguous in NIXL (#6214)
Co-authored-by: luoyuan.luo <luoyuan.luo@antgroup.com>
|
2025-05-21 11:55:04 +08:00 |
|
Trevor Morris
|
7adf245ba2
|
[Metrics] Add KV events publishing (#6098)
|
2025-05-19 14:19:54 -07:00 |
|
Lianmin Zheng
|
e07a6977e7
|
Minor improvements of TokenizerManager / health check (#6327)
|
2025-05-15 15:29:25 -07:00 |
|
shangmingc
|
f1c896007a
|
[PD] Add support for different TP sizes per DP rank (#5922)
Signed-off-by: Shangming Cai <caishangming@linux.alibaba.com>
|
2025-05-12 13:55:42 -07:00 |
|
Lianmin Zheng
|
fba8eccd7e
|
Log if cuda graph is used & extend cuda graph capture to cuda-graph-max-bs (#6201)
Co-authored-by: SangBin Cho <rkooo567@gmail.com>
|
2025-05-12 00:17:33 -07:00 |
|
applesaucethebun
|
2ce8793519
|
Add typo checker in pre-commit (#6179)
Co-authored-by: Brayden Zhong <b8zhong@uwaterloo.ca>
|
2025-05-11 12:55:00 +08:00 |
|
Liangsheng Yin
|
a3e4e9bf9e
|
Better PD initialization (#5751)
|
2025-05-07 01:12:57 +08:00 |
|
Liangsheng Yin
|
6d4d3bc81d
|
Fix not "import os" (#6057)
|
2025-05-06 22:06:41 +08:00 |
|
fzyzcjy
|
3008db9c1a
|
[PD] Allow customizing reserved tokens to avoid KV cache waste (#6002)
|
2025-05-05 11:23:15 +08:00 |
|
Yongtong Wu
|
97ac42b634
|
[PD] NIXL backend Prefill TP & Decode TP+DP (#5681)
|
2025-05-02 22:14:03 +08:00 |
|
Yuan Luo
|
67b7d5b1df
|
[PD] Vectorise group_concurrent_contiguous in NumPy (#5834)
Co-authored-by: luoyuan.luo <luoyuan.luo@antgroup.com>
|
2025-05-01 22:42:37 +08:00 |
|
ybyang
|
c6c6264073
|
[PD] support pd fake transfer for warmup (#5726)
|
2025-04-29 00:33:20 +08:00 |
|
Liangsheng Yin
|
40d9b8acce
|
Improve overlap scheduling (#5788)
|
2025-04-28 11:19:16 +08:00 |
|
Liangsheng Yin
|
beb65c7433
|
[PD]Reduce kv transfer threads (#5791)
|
2025-04-27 23:03:30 +08:00 |
|
IAN
|
11e27d0926
|
[PD]: Support Multi Prefill in one node (#5704)
Co-authored-by: shuaills <shishuaiuoe@gmail.com>
|
2025-04-26 00:30:47 +08:00 |
|
shangmingc
|
50eda8398e
|
[PD] Add kvargs table and thread pool for kvcache sender of mooncake (#5738)
Signed-off-by: Shangming Cai <caishangming@linux.alibaba.com>
|
2025-04-25 18:15:01 +08:00 |
|
Liangsheng Yin
|
c55550cbf0
|
[PD] Better logs (#5715)
|
2025-04-25 17:25:45 +08:00 |
|
shangmingc
|
e0673969b9
|
[PD] Add support for dp attention with mooncake (#5530)
Signed-off-by: Shangming Cai <caishangming@linux.alibaba.com>
|
2025-04-23 17:20:27 +08:00 |
|
Cheng Wan
|
711efe7814
|
Integrating PD disaggregation with DP attention and DeepEP (#5435)
Co-authored-by: Byron Hsu <byronhsu1230@gmail.com>
|
2025-04-23 01:46:01 -07:00 |
|
Byron Hsu
|
bf98d2e377
|
[PD] Support prefill overlap + Ensure no race condition (#5609)
|
2025-04-21 12:12:56 -07:00 |
|
Byron Hsu
|
e65b9f21e3
|
[PD] Support decode overlap schedule (#5608)
|
2025-04-21 12:06:16 -07:00 |
|
Trevor Morris
|
4dce1cc608
|
[PD] Add NIXL transfer backend (#5477)
|
2025-04-22 01:36:12 +08:00 |
|
Byron Hsu
|
deded17f38
|
[PD] Fix edge case and simplify large page size + chunked prefill (#5589)
|
2025-04-21 10:27:02 -07:00 |
|
shangmingc
|
f29a718f63
|
[PD] Fix generate endpoint of min_lb for PD (#5598)
Signed-off-by: Shangming Cai <caishangming@linux.alibaba.com>
|
2025-04-21 21:39:18 +08:00 |
|
Yongtong Wu
|
3f57b00a59
|
Support PD bootstrap fields on /v1/chat/completions endpoint (#5488)
|
2025-04-21 01:10:58 -07:00 |
|
Byron Hsu
|
c951d312ed
|
[PD] Fix large page size + chunk prefill (#5588)
|
2025-04-20 17:21:54 -07:00 |
|
fzyzcjy
|
475e2e378a
|
[PD] Fix server crash when using batch requests (#5531)
|
2025-04-20 16:02:23 -07:00 |
|
Byron Hsu
|
ab4b5606e4
|
[PD] Support page size > 1 (#5561)
|
2025-04-19 21:54:27 -07:00 |
|
shangmingc
|
dca90f1db8
|
[PD] Remove the requirement of config file for mooncake backend (#5460)
|
2025-04-19 19:31:00 +08:00 |
|
ybyang
|
59dd090f1c
|
[PD] Fix no cache connect for receiver (#5534)
|
2025-04-19 14:55:28 +08:00 |
|
fzyzcjy
|
569b032c58
|
[PD] Tiny fix timeout error when generate (#5545)
|
2025-04-19 14:42:57 +08:00 |
|
Cheng Wan
|
6aca583420
|
Fix several minor issues in PD disaggregation (#5444)
|
2025-04-15 23:04:41 -07:00 |
|
shangmingc
|
f1b3b75fc6
|
[PD] Remove unused bootstrap param and fix port table type (#5423)
|
2025-04-15 21:21:20 +08:00 |
|
Liangsheng Yin
|
33b16ad178
|
Distinguish bootstrap key only in decode server (#5422)
|
2025-04-15 20:59:28 +08:00 |
|
shangmingc
|
ffde65a094
|
[PD] Fix dynamic port support and MLA buffer for Mooncake (#5415)
Signed-off-by: Shangming Cai <caishangming@linux.alibaba.com>
Co-authored-by: ybyang <ybyang7@iflytek.com>
|
2025-04-15 19:29:31 +08:00 |
|
lambert0312
|
471650dee0
|
Fix broadcast use cuda device lead to memory capacity unbalanced (#5416)
|
2025-04-15 02:47:26 -07:00 |
|
Yuan Luo
|
d06a83fb01
|
Support dynamic connection and TP 16 (#5351)
Co-authored-by: luoyuan.luo <luoyuan.luo@antgroup.com>
|
2025-04-15 17:08:07 +08:00 |
|
Liangsheng Yin
|
44afde82d7
|
Fix PD disaggregation bugs (#5326)
|
2025-04-14 19:27:30 +08:00 |
|
Yongtong Wu
|
14e8bd889f
|
Free metadata_buffer_index after transfer finished (#5364)
|
2025-04-14 16:04:46 +08:00 |
|
Byron Hsu
|
a9499885e9
|
[PD] Add transfer backend abstraction (#5328)
|
2025-04-14 01:39:39 +08:00 |
|
Liangsheng Yin
|
f765579046
|
Fix typo: infight -> inflight (#5357)
|
2025-04-14 01:25:30 +08:00 |
|
Teng Ma
|
4c31ae9f6d
|
[PD] Support KV transfer with mooncake (#4880)
Signed-off-by: Shangming Cai <caishangming@linux.alibaba.com>
Co-authored-by: Shangming Cai <caishangming@linux.alibaba.com>
Co-authored-by: Xuchun Shang <xuchun.shang@linux.alibaba.com>
Co-authored-by: shangmingc <csmthu@gmail.com>
|
2025-04-10 14:23:23 +08:00 |
|
Byron Hsu
|
6d3b35fae9
|
[PD] Simplify mini LB (#4911)
Co-authored-by: Liangsheng Yin <hnyls2002@gmail.com>
|
2025-04-08 09:42:34 -07:00 |
|
shangmingc
|
89a554181f
|
[PD] Fix unclosed prefill connection warning of mini_lb (#5155)
Signed-off-by: Shangming Cai <caishangming@linux.alibaba.com>
|
2025-04-08 09:15:06 -07:00 |
|
Xuchun Shang
|
8154de5a32
|
[PD] Remove invalid parameter (#4721)
Signed-off-by: Xuchun Shang <xuchun.shang@linux.alibaba.com>
|
2025-03-24 13:14:16 -07:00 |
|