Stefan He
|
3774f07825
|
Multi-Stage Awake: Support Resume and Pause KV Cache and Weights separately (#7099)
|
2025-06-19 00:56:37 -07:00 |
|
Simo Lin
|
09ae5b20f3
|
Merge PDLB (Prefill-Decode Load Balancer) into SGLang Router (#7096)
|
2025-06-19 02:28:15 +08:00 |
|
ch-tiger1
|
2ae809c5c1
|
Fix mini_lb for PD with long output: limit chunk size of decode response (#7301)
Signed-off-by: ch-tiger1 <xyz@ch-tech.ip-ddns.com>
Co-authored-by: ch-tiger1 <xyz@ch-tech.ip-ddns.com>
|
2025-06-18 10:46:46 -07:00 |
|
shangmingc
|
c26d7349d3
|
[PD] Add custom memory pool option to support Mooncake PD with NVLink (#7264)
Signed-off-by: Shangming Cai <caishangming@linux.alibaba.com>
|
2025-06-17 17:21:37 -07:00 |
|
shangmingc
|
ceaa85c9e6
|
[PD] Support get local ip from NIC for PD disaggregation (#7237)
Signed-off-by: Shangming Cai <caishangming@linux.alibaba.com>
|
2025-06-17 17:19:26 -07:00 |
|
Byron Hsu
|
96be97bfff
|
Minor PD style fix (#7215)
|
2025-06-15 16:12:12 -07:00 |
|
Byron Hsu
|
88f9c347b2
|
[PD] use int32 for kv indices & get num_reserved_decode_tokens from server_args (#7214)
|
2025-06-15 11:51:03 -07:00 |
|
Byron Hsu
|
db0cc57e75
|
[PD] Support decode retract and update decode.py (#7196)
|
2025-06-14 19:48:05 -07:00 |
|
Byron Hsu
|
7d316991b2
|
[PD] Update prefill.py (#7190)
|
2025-06-14 15:59:54 -07:00 |
|
Povilas Kanapickas
|
bd7cfbd2f8
|
[Fix] Reduce busy polling when scheduler is idle (#6026)
|
2025-06-12 14:58:22 -07:00 |
|
Byron Hsu
|
c2b16795b5
|
Add decode req pool (#6980)
|
2025-06-09 21:23:36 -07:00 |
|
ishandhanani
|
f1569876d5
|
feat: add direct routing strategy to DP worker (#6884)
|
2025-06-09 11:44:05 -07:00 |
|
shangmingc
|
132dad874d
|
[PD] Optimize transfer queue forward logic for dummy rank (#6922)
Signed-off-by: Shangming Cai <caishangming@linux.alibaba.com>
|
2025-06-06 18:26:14 -07:00 |
|
shangmingc
|
dd1012fcbe
|
[PD] Fix potential perf spike caused by tracker gc and optimize doc (#6764)
Signed-off-by: Shangming Cai <caishangming@linux.alibaba.com>
|
2025-06-05 10:56:02 -07:00 |
|
ishandhanani
|
f0f84975f4
|
feat: add dp-rank to KV events (#6852)
|
2025-06-04 15:29:34 -07:00 |
|
shangmingc
|
6cb00c6398
|
[PD] Optimize time out logic and add env var doc for mooncake (#6761)
Signed-off-by: Shangming Cai <caishangming@linux.alibaba.com>
|
2025-05-30 00:45:02 -07:00 |
|
Liangsheng Yin
|
78689d3393
|
PD Rust LB (PO2) (#6437)
Co-authored-by: fzyzcjy <5236035+fzyzcjy@users.noreply.github.com>
|
2025-05-29 20:50:10 +08:00 |
|
shangmingc
|
1dc6864f17
|
[PD] Support completion endpoint (#6729)
Signed-off-by: Shangming Cai <caishangming@linux.alibaba.com>
|
2025-05-29 16:26:18 +08:00 |
|
dongmao zhang
|
c459536b0f
|
[PD] bug fix: Update status if nixl receiver sends a dummy req. (#6720)
|
2025-05-29 00:01:56 -07:00 |
|
Hongbo Xu
|
5170b010a6
|
[PD] Remove Unnecessary Exception Handling for FastQueue.get() (#6712)
|
2025-05-28 11:18:24 -07:00 |
|
shangmingc
|
e9fd11c0d1
|
[Bugfix] Fix ChatCompletion endpoint of mini_lb when stream is set (#6703)
Signed-off-by: Shangming Cai <caishangming@linux.alibaba.com>
|
2025-05-28 21:33:36 +08:00 |
|
shangmingc
|
c7588d593e
|
[Bugfix] Fix slice operation when chunk size mismatch (#6697)
Signed-off-by: Shangming Cai <caishangming@linux.alibaba.com>
|
2025-05-28 21:15:00 +08:00 |
|
ybyang
|
6b231325b9
|
[PD Perf] replace Queue to FastQueue (#6649)
Signed-off-by: Shangming Cai <caishangming@linux.alibaba.com>
Co-authored-by: Shangming Cai <caishangming@linux.alibaba.com>
|
2025-05-28 01:37:51 -07:00 |
|
shangmingc
|
b1c8d4e9f3
|
[PD] Abort unbootstrapped prefill requests through timeout (#6685)
Signed-off-by: Shangming Cai <caishangming@linux.alibaba.com>
|
2025-05-28 00:40:54 -07:00 |
|
Trevor Morris
|
e806f708c9
|
[PD] Make bootstrap code common between NIXL and Mooncake (#6473)
|
2025-05-27 12:47:38 -07:00 |
|
fzyzcjy
|
1a8f5f6836
|
Super tiny rename environment variable (#6648)
|
2025-05-26 21:01:16 -07:00 |
|
shangmingc
|
3ce94f71f9
|
[PD] Handle P/D failure and reconnect without affecting other instances (#6263)
Signed-off-by: Shangming Cai <caishangming@linux.alibaba.com>
|
2025-05-26 19:21:01 -07:00 |
|
wangxiyu191
|
8b33d8df90
|
[PD] Fix prefill_servers in mini_lb (#6527)
|
2025-05-26 10:38:41 +08:00 |
|
Byron Hsu
|
2d831c6ef9
|
[PD] Support structured output (#6560)
|
2025-05-23 21:49:00 -07:00 |
|
Byron Hsu
|
8233cc10fd
|
[PD] Support logprob & Add failure test (#6558)
|
2025-05-23 14:29:20 -07:00 |
|
Byron Hsu
|
d2e0881a34
|
[PD] support spec decode (#6507)
Co-authored-by: SangBin Cho <rkooo567@gmail.com>
|
2025-05-23 12:03:05 -07:00 |
|
Byron Hsu
|
0a4fc73b48
|
[PD] Fix failure abort (#6535)
|
2025-05-22 20:32:03 -07:00 |
|
shangmingc
|
58f10679e1
|
Fix missing http status import for PD failure handler (#6520)
Signed-off-by: Shangming Cai <caishangming@linux.alibaba.com>
|
2025-05-22 15:23:54 +08:00 |
|
Byron Hsu
|
3bde101099
|
[PD] Abort request if transfer fails (#6504)
|
2025-05-21 21:44:25 -07:00 |
|
Byron Hsu
|
7513558074
|
[PD] Add doc and simplify sender.send (#6019)
|
2025-05-21 21:22:21 -07:00 |
|
Yuan Luo
|
30ca18f423
|
Refactor group_concurrent_contiguous in NIXL (#6214)
Co-authored-by: luoyuan.luo <luoyuan.luo@antgroup.com>
|
2025-05-21 11:55:04 +08:00 |
|
Trevor Morris
|
7adf245ba2
|
[Metrics] Add KV events publishing (#6098)
|
2025-05-19 14:19:54 -07:00 |
|
Lianmin Zheng
|
e07a6977e7
|
Minor improvements of TokenizerManager / health check (#6327)
|
2025-05-15 15:29:25 -07:00 |
|
shangmingc
|
f1c896007a
|
[PD] Add support for different TP sizes per DP rank (#5922)
Signed-off-by: Shangming Cai <caishangming@linux.alibaba.com>
|
2025-05-12 13:55:42 -07:00 |
|
Lianmin Zheng
|
fba8eccd7e
|
Log if cuda graph is used & extend cuda graph capture to cuda-graph-max-bs (#6201)
Co-authored-by: SangBin Cho <rkooo567@gmail.com>
|
2025-05-12 00:17:33 -07:00 |
|
applesaucethebun
|
2ce8793519
|
Add typo checker in pre-commit (#6179)
Co-authored-by: Brayden Zhong <b8zhong@uwaterloo.ca>
|
2025-05-11 12:55:00 +08:00 |
|
Liangsheng Yin
|
a3e4e9bf9e
|
Better PD initialization (#5751)
|
2025-05-07 01:12:57 +08:00 |
|
Liangsheng Yin
|
6d4d3bc81d
|
Fix not "import os" (#6057)
|
2025-05-06 22:06:41 +08:00 |
|
fzyzcjy
|
3008db9c1a
|
[PD] Allow customizing reserved tokens to avoid KV cache waste (#6002)
|
2025-05-05 11:23:15 +08:00 |
|
Yongtong Wu
|
97ac42b634
|
[PD] NIXL backend Prefill TP & Decode TP+DP (#5681)
|
2025-05-02 22:14:03 +08:00 |
|
Yuan Luo
|
67b7d5b1df
|
[PD] Vectorise group_concurrent_contiguous in NumPy (#5834)
Co-authored-by: luoyuan.luo <luoyuan.luo@antgroup.com>
|
2025-05-01 22:42:37 +08:00 |
|
ybyang
|
c6c6264073
|
[PD] support pd fake transfer for warmup (#5726)
|
2025-04-29 00:33:20 +08:00 |
|
Liangsheng Yin
|
40d9b8acce
|
Improve overlap scheduling (#5788)
|
2025-04-28 11:19:16 +08:00 |
|
Liangsheng Yin
|
beb65c7433
|
[PD]Reduce kv transfer threads (#5791)
|
2025-04-27 23:03:30 +08:00 |
|
IAN
|
11e27d0926
|
[PD]: Support Multi Prefill in one node (#5704)
Co-authored-by: shuaills <shishuaiuoe@gmail.com>
|
2025-04-26 00:30:47 +08:00 |
|