303 Commits

Author SHA1 Message Date
Stefan He
00fbd8a484 Fix typo of flash_cache (#7513) 2025-06-25 02:04:41 -07:00
zixuanzhang226
f3cbd24541 feat: send kvmetrics from sglang scheduler (#6721) 2025-06-25 01:57:49 -07:00
DangKai
bc2e5645c4 fix: force synchronization between TP workers when update_weights (#6626)
Co-authored-by: dangkai.dk <dangkai.dk@alibaba-inc.com>
2025-06-25 01:35:59 -07:00
u4lr451
ed0a0b692c Performance: Enable cuda graph for dp idle batch (#7269)
Co-authored-by: austindeng <austindeng@tencent.com>
Co-authored-by: Cheng Wan <54331508+ch-wan@users.noreply.github.com>
Co-authored-by: ch-wan <cwan39@gatech.edu>
2025-06-23 17:34:13 -07:00
Lianmin Zheng
55e03b10c4 Fix a bug in BatchTokenIDOut & Misc style and dependency updates (#7457) 2025-06-23 06:20:39 -07:00
fzyzcjy
edc21cc8ae Tiny add logging for GC (#7406) 2025-06-22 12:40:02 +08:00
Liangsheng Yin
05c9bc8956 [minor] simplify the TokenToKVPoolAllocator (#7414) 2025-06-22 12:37:18 +08:00
Cheng Wan
5041df2d01 Fix 7285 Merge Conflicts (#7403) 2025-06-20 16:02:50 -07:00
Cheng Wan
73b13e69b4 Optimize DP attn scheduling for speculative decoding (#7285) 2025-06-20 15:06:41 -07:00
Cheng Wan
e879d8b7a8 [Feature] Comprehensive Hybrid Parallelism Support (#6389) 2025-06-20 14:43:11 -07:00
strgrb
ceba0ce4f6 support return logprobs for pipeline (#7356)
Co-authored-by: Zhang Kaihong <zhangkaihong.zkh@alibaba-inc.com>
2025-06-19 23:50:45 -07:00
Huang Long
1d6515ef2a [Bugfix]Fix hang bug using dp attention with HiRadixCache (#7159)
Signed-off-by: huanglong <huanglong@linux.alibaba.com>
2025-06-19 20:34:36 -07:00
Atream
4f838c09cd [PD] Transfer hidden states for mtp when disaggregation (#7242) 2025-06-19 11:22:47 -07:00
DarkSharpness
47367b768d [Refactor] Clean up radix cache related API (#7303)
Co-authored-by: Zhiqiang Xie <xiezhq@stanford.edu>
2025-06-20 00:58:48 +08:00
Stefan He
3774f07825 Multi-Stage Awake: Support Resume and Pause KV Cache and Weights separately (#7099) 2025-06-19 00:56:37 -07:00
fzyzcjy
9c6a0656a3 Fix profiler error when there are idle passes (#7003) 2025-06-18 10:55:01 -07:00
Zhiqiang Xie
e56685ac1b Upstreaming hicache bug fixes (#7267) 2025-06-17 17:44:57 -07:00
shangmingc
c26d7349d3 [PD] Add custom memory pool option to support Mooncake PD with NVLink (#7264)
Signed-off-by: Shangming Cai <caishangming@linux.alibaba.com>
2025-06-17 17:21:37 -07:00
u4lr451
10d60cd41b feat: mtp support dp-attention (#6081)
Co-authored-by: austindeng <austindeng@tencent.com>
Co-authored-by: tianqilin.99 <tianqilin.99@bytedance.com>
Co-authored-by: Qiaolin Yu <liin1211@outlook.com>
Co-authored-by: ch-wan <cwan39@gatech.edu>
2025-06-17 00:33:28 -07:00
woodx
e30ef368ab Feat/support rerank (#6058) 2025-06-16 10:50:01 -07:00
Liangsheng Yin
c494386728 minor fix (#7245) 2025-06-16 23:30:26 +08:00
Byron Hsu
88f9c347b2 [PD] use int32 for kv indices & get num_reserved_decode_tokens from server_args (#7214) 2025-06-15 11:51:03 -07:00
Lianmin Zheng
38af4f68a9 Fix grammar abort & Minor style fixes (#7204) 2025-06-14 22:49:41 -07:00
Byron Hsu
db0cc57e75 [PD] Support decode retract and update decode.py (#7196) 2025-06-14 19:48:05 -07:00
Byron Hsu
7d316991b2 [PD] Update prefill.py (#7190) 2025-06-14 15:59:54 -07:00
Povilas Kanapickas
bd7cfbd2f8 [Fix] Reduce busy polling when scheduler is idle (#6026) 2025-06-12 14:58:22 -07:00
Liangsheng Yin
930746d93c Improve log status (#7115) 2025-06-12 14:38:24 +08:00
Lianmin Zheng
dc0705a504 Simplify prepare_extend_after_decode (#6987) 2025-06-09 16:39:21 -07:00
ishandhanani
f1569876d5 feat: add direct routing strategy to DP worker (#6884) 2025-06-09 11:44:05 -07:00
fzyzcjy
d5c097a2f9 Tiny re-introduce profile id logging (#6912) 2025-06-07 02:32:50 -07:00
Lianmin Zheng
e6b7053b60 Fix a bug in abort & Improve docstrings for abort (#6931) 2025-06-06 14:35:45 -07:00
fzyzcjy
bcf66ef3e1 Tiny allow profiler API to auto create directory (#6865) 2025-06-05 00:07:03 -07:00
ishandhanani
f0f84975f4 feat: add dp-rank to KV events (#6852) 2025-06-04 15:29:34 -07:00
fzyzcjy
ef21729c1d Fix profiles do not have consistent names (#6811) 2025-06-02 11:17:22 -07:00
fzyzcjy
6376b632eb Tiny log prefill time (#6780) 2025-06-02 10:28:27 -07:00
Lianmin Zheng
20fd53b8f6 Correctly abort the failed grammar requests & Improve the handling of abort (#6803) 2025-06-01 19:00:07 -07:00
Lianmin Zheng
2d72fc47cf Improve profiler and integrate profiler in bench_one_batch_server (#6787) 2025-05-31 15:53:55 -07:00
Liangsheng Yin
78689d3393 PD Rust LB (PO2) (#6437)
Co-authored-by: fzyzcjy <5236035+fzyzcjy@users.noreply.github.com>
2025-05-29 20:50:10 +08:00
fzyzcjy
8c7279c24e Fix profiling will crash the server when using num_steps (#6586) 2025-05-25 22:36:02 -07:00
fzyzcjy
0d47788025 Support overlapping two batches (#4068) 2025-05-24 17:39:07 -07:00
Byron Hsu
2d831c6ef9 [PD] Support structured output (#6560) 2025-05-23 21:49:00 -07:00
Byron Hsu
8233cc10fd [PD] Support logprob & Add failure test (#6558) 2025-05-23 14:29:20 -07:00
Byron Hsu
d2e0881a34 [PD] support spec decode (#6507)
Co-authored-by: SangBin Cho <rkooo567@gmail.com>
2025-05-23 12:03:05 -07:00
Chang Su
4685fbb888 [VLM] Support chunk prefill for VLM (#6355)
Co-authored-by: yizhang2077 <1109276519@qq.com>
2025-05-22 20:32:41 -07:00
Byron Hsu
0a4fc73b48 [PD] Fix failure abort (#6535) 2025-05-22 20:32:03 -07:00
Byron Hsu
3bde101099 [PD] Abort request if transfer fails (#6504) 2025-05-21 21:44:25 -07:00
fzyzcjy
f0653886a5 Expert distribution recording without overhead for EPLB (#4957) 2025-05-19 20:07:43 -07:00
Trevor Morris
7adf245ba2 [Metrics] Add KV events publishing (#6098) 2025-05-19 14:19:54 -07:00
fzyzcjy
01d2838c0f Fix stop_profile does not wait for finishing (#4741) 2025-05-17 17:06:15 -07:00
Lifu Huang
3cf1473a09 Use monotonic clock for interval measurement (#6211)
Signed-off-by: Lifu Huang <lifu.hlf@gmail.com>
2025-05-17 16:49:18 -07:00