Stefan He
|
00fbd8a484
|
Fix typo of flash_cache (#7513)
|
2025-06-25 02:04:41 -07:00 |
|
zixuanzhang226
|
f3cbd24541
|
feat: send kvmetrics from sglang scheduler (#6721)
|
2025-06-25 01:57:49 -07:00 |
|
DangKai
|
bc2e5645c4
|
fix: force synchronization between TP workers when update_weights (#6626)
Co-authored-by: dangkai.dk <dangkai.dk@alibaba-inc.com>
|
2025-06-25 01:35:59 -07:00 |
|
u4lr451
|
ed0a0b692c
|
Performance: Enable cuda graph for dp idle batch (#7269)
Co-authored-by: austindeng <austindeng@tencent.com>
Co-authored-by: Cheng Wan <54331508+ch-wan@users.noreply.github.com>
Co-authored-by: ch-wan <cwan39@gatech.edu>
|
2025-06-23 17:34:13 -07:00 |
|
Lianmin Zheng
|
55e03b10c4
|
Fix a bug in BatchTokenIDOut & Misc style and dependency updates (#7457)
|
2025-06-23 06:20:39 -07:00 |
|
fzyzcjy
|
edc21cc8ae
|
Tiny add logging for GC (#7406)
|
2025-06-22 12:40:02 +08:00 |
|
Liangsheng Yin
|
05c9bc8956
|
[minor] simplify the TokenToKVPoolAllocator (#7414)
|
2025-06-22 12:37:18 +08:00 |
|
Cheng Wan
|
5041df2d01
|
Fix 7285 Merge Conflicts (#7403)
|
2025-06-20 16:02:50 -07:00 |
|
Cheng Wan
|
73b13e69b4
|
Optimize DP attn scheduling for speculative decoding (#7285)
|
2025-06-20 15:06:41 -07:00 |
|
Cheng Wan
|
e879d8b7a8
|
[Feature] Comprehensive Hybrid Parallelism Support (#6389)
|
2025-06-20 14:43:11 -07:00 |
|
strgrb
|
ceba0ce4f6
|
support return logprobs for pipeline (#7356)
Co-authored-by: Zhang Kaihong <zhangkaihong.zkh@alibaba-inc.com>
|
2025-06-19 23:50:45 -07:00 |
|
Huang Long
|
1d6515ef2a
|
[Bugfix] Fix hang bug using dp attention with HiRadixCache (#7159)
Signed-off-by: huanglong <huanglong@linux.alibaba.com>
|
2025-06-19 20:34:36 -07:00 |
|
Atream
|
4f838c09cd
|
[PD] Transfer hidden states for mtp when disaggregation (#7242)
|
2025-06-19 11:22:47 -07:00 |
|
DarkSharpness
|
47367b768d
|
[Refactor] Clean up radix cache related API (#7303)
Co-authored-by: Zhiqiang Xie <xiezhq@stanford.edu>
|
2025-06-20 00:58:48 +08:00 |
|
Stefan He
|
3774f07825
|
Multi-Stage Awake: Support Resume and Pause KV Cache and Weights separately (#7099)
|
2025-06-19 00:56:37 -07:00 |
|
fzyzcjy
|
9c6a0656a3
|
Fix profiler error when there are idle passes (#7003)
|
2025-06-18 10:55:01 -07:00 |
|
Zhiqiang Xie
|
e56685ac1b
|
Upstreaming hicache bug fixes (#7267)
|
2025-06-17 17:44:57 -07:00 |
|
shangmingc
|
c26d7349d3
|
[PD] Add custom memory pool option to support Mooncake PD with NVLink (#7264)
Signed-off-by: Shangming Cai <caishangming@linux.alibaba.com>
|
2025-06-17 17:21:37 -07:00 |
|
u4lr451
|
10d60cd41b
|
feat: mtp support dp-attention (#6081)
Co-authored-by: austindeng <austindeng@tencent.com>
Co-authored-by: tianqilin.99 <tianqilin.99@bytedance.com>
Co-authored-by: Qiaolin Yu <liin1211@outlook.com>
Co-authored-by: ch-wan <cwan39@gatech.edu>
|
2025-06-17 00:33:28 -07:00 |
|
woodx
|
e30ef368ab
|
Feat/support rerank (#6058)
|
2025-06-16 10:50:01 -07:00 |
|
Liangsheng Yin
|
c494386728
|
minor fix (#7245)
|
2025-06-16 23:30:26 +08:00 |
|
Byron Hsu
|
88f9c347b2
|
[PD] use int32 for kv indices & get num_reserved_decode_tokens from server_args (#7214)
|
2025-06-15 11:51:03 -07:00 |
|
Lianmin Zheng
|
38af4f68a9
|
Fix grammar abort & Minor style fixes (#7204)
|
2025-06-14 22:49:41 -07:00 |
|
Byron Hsu
|
db0cc57e75
|
[PD] Support decode retract and update decode.py (#7196)
|
2025-06-14 19:48:05 -07:00 |
|
Byron Hsu
|
7d316991b2
|
[PD] Update prefill.py (#7190)
|
2025-06-14 15:59:54 -07:00 |
|
Povilas Kanapickas
|
bd7cfbd2f8
|
[Fix] Reduce busy polling when scheduler is idle (#6026)
|
2025-06-12 14:58:22 -07:00 |
|
Liangsheng Yin
|
930746d93c
|
Improve log status (#7115)
|
2025-06-12 14:38:24 +08:00 |
|
Lianmin Zheng
|
dc0705a504
|
Simplify prepare_extend_after_decode (#6987)
|
2025-06-09 16:39:21 -07:00 |
|
ishandhanani
|
f1569876d5
|
feat: add direct routing strategy to DP worker (#6884)
|
2025-06-09 11:44:05 -07:00 |
|
fzyzcjy
|
d5c097a2f9
|
Tiny re-introduce profile id logging (#6912)
|
2025-06-07 02:32:50 -07:00 |
|
Lianmin Zheng
|
e6b7053b60
|
Fix a bug in abort & Improve docstrings for abort (#6931)
|
2025-06-06 14:35:45 -07:00 |
|
fzyzcjy
|
bcf66ef3e1
|
Tiny allow profiler API to auto create directory (#6865)
|
2025-06-05 00:07:03 -07:00 |
|
ishandhanani
|
f0f84975f4
|
feat: add dp-rank to KV events (#6852)
|
2025-06-04 15:29:34 -07:00 |
|
fzyzcjy
|
ef21729c1d
|
Fix profiles do not have consistent names (#6811)
|
2025-06-02 11:17:22 -07:00 |
|
fzyzcjy
|
6376b632eb
|
Tiny log prefill time (#6780)
|
2025-06-02 10:28:27 -07:00 |
|
Lianmin Zheng
|
20fd53b8f6
|
Correctly abort the failed grammar requests & Improve the handling of abort (#6803)
|
2025-06-01 19:00:07 -07:00 |
|
Lianmin Zheng
|
2d72fc47cf
|
Improve profiler and integrate profiler in bench_one_batch_server (#6787)
|
2025-05-31 15:53:55 -07:00 |
|
Liangsheng Yin
|
78689d3393
|
PD Rust LB (PO2) (#6437)
Co-authored-by: fzyzcjy <5236035+fzyzcjy@users.noreply.github.com>
|
2025-05-29 20:50:10 +08:00 |
|
fzyzcjy
|
8c7279c24e
|
Fix profiling will crash the server when using num_steps (#6586)
|
2025-05-25 22:36:02 -07:00 |
|
fzyzcjy
|
0d47788025
|
Support overlapping two batches (#4068)
|
2025-05-24 17:39:07 -07:00 |
|
Byron Hsu
|
2d831c6ef9
|
[PD] Support structured output (#6560)
|
2025-05-23 21:49:00 -07:00 |
|
Byron Hsu
|
8233cc10fd
|
[PD] Support logprob & Add failure test (#6558)
|
2025-05-23 14:29:20 -07:00 |
|
Byron Hsu
|
d2e0881a34
|
[PD] support spec decode (#6507)
Co-authored-by: SangBin Cho <rkooo567@gmail.com>
|
2025-05-23 12:03:05 -07:00 |
|
Chang Su
|
4685fbb888
|
[VLM] Support chunk prefill for VLM (#6355)
Co-authored-by: yizhang2077 <1109276519@qq.com>
|
2025-05-22 20:32:41 -07:00 |
|
Byron Hsu
|
0a4fc73b48
|
[PD] Fix failure abort (#6535)
|
2025-05-22 20:32:03 -07:00 |
|
Byron Hsu
|
3bde101099
|
[PD] Abort request if transfer fails (#6504)
|
2025-05-21 21:44:25 -07:00 |
|
fzyzcjy
|
f0653886a5
|
Expert distribution recording without overhead for EPLB (#4957)
|
2025-05-19 20:07:43 -07:00 |
|
Trevor Morris
|
7adf245ba2
|
[Metrics] Add KV events publishing (#6098)
|
2025-05-19 14:19:54 -07:00 |
|
fzyzcjy
|
01d2838c0f
|
Fix stop_profile does not wait for finishing (#4741)
|
2025-05-17 17:06:15 -07:00 |
|
Lifu Huang
|
3cf1473a09
|
Use monotonic clock for interval measurement (#6211)
Signed-off-by: Lifu Huang <lifu.hlf@gmail.com>
|
2025-05-17 16:49:18 -07:00 |
|