Commit Graph

276 Commits

Author SHA1 Message Date
Lianmin Zheng
dc0705a504 Simplify prepare_extend_after_decode (#6987) 2025-06-09 16:39:21 -07:00
ishandhanani
f1569876d5 feat: add direct routing strategy to DP worker (#6884) 2025-06-09 11:44:05 -07:00
fzyzcjy
d5c097a2f9 Tiny re-introduce profile id logging (#6912) 2025-06-07 02:32:50 -07:00
Lianmin Zheng
e6b7053b60 Fix a bug in abort & Improve docstrings for abort (#6931) 2025-06-06 14:35:45 -07:00
fzyzcjy
bcf66ef3e1 Tiny allow profiler API to auto create directory (#6865) 2025-06-05 00:07:03 -07:00
ishandhanani
f0f84975f4 feat: add dp-rank to KV events (#6852) 2025-06-04 15:29:34 -07:00
fzyzcjy
ef21729c1d Fix profiles do not have consistent names (#6811) 2025-06-02 11:17:22 -07:00
fzyzcjy
6376b632eb Tiny log prefill time (#6780) 2025-06-02 10:28:27 -07:00
Lianmin Zheng
20fd53b8f6 Correctly abort the failed grammar requests & Improve the handling of abort (#6803) 2025-06-01 19:00:07 -07:00
Lianmin Zheng
2d72fc47cf Improve profiler and integrate profiler in bench_one_batch_server (#6787) 2025-05-31 15:53:55 -07:00
Liangsheng Yin
78689d3393 PD Rust LB (PO2) (#6437)
Co-authored-by: fzyzcjy <5236035+fzyzcjy@users.noreply.github.com>
2025-05-29 20:50:10 +08:00
fzyzcjy
8c7279c24e Fix profiling will crash the server when using num_steps (#6586) 2025-05-25 22:36:02 -07:00
fzyzcjy
0d47788025 Support overlapping two batches (#4068) 2025-05-24 17:39:07 -07:00
Byron Hsu
2d831c6ef9 [PD] Support structured output (#6560) 2025-05-23 21:49:00 -07:00
Byron Hsu
8233cc10fd [PD] Support logprob & Add failure test (#6558) 2025-05-23 14:29:20 -07:00
Byron Hsu
d2e0881a34 [PD] support spec decode (#6507)
Co-authored-by: SangBin Cho <rkooo567@gmail.com>
2025-05-23 12:03:05 -07:00
Chang Su
4685fbb888 [VLM] Support chunk prefill for VLM (#6355)
Co-authored-by: yizhang2077 <1109276519@qq.com>
2025-05-22 20:32:41 -07:00
Byron Hsu
0a4fc73b48 [PD] Fix failure abort (#6535) 2025-05-22 20:32:03 -07:00
Byron Hsu
3bde101099 [PD] Abort request if transfer fails (#6504) 2025-05-21 21:44:25 -07:00
fzyzcjy
f0653886a5 Expert distribution recording without overhead for EPLB (#4957) 2025-05-19 20:07:43 -07:00
Trevor Morris
7adf245ba2 [Metrics] Add KV events publishing (#6098) 2025-05-19 14:19:54 -07:00
fzyzcjy
01d2838c0f Fix stop_profile does not wait for finishing (#4741) 2025-05-17 17:06:15 -07:00
Lifu Huang
3cf1473a09 Use monotonic clock for interval measurement (#6211)
Signed-off-by: Lifu Huang <lifu.hlf@gmail.com>
2025-05-17 16:49:18 -07:00
Lianmin Zheng
e07a6977e7 Minor improvements of TokenizerManager / health check (#6327) 2025-05-15 15:29:25 -07:00
Cheng Wan
b2e95f62b4 Fix two issues related to --moe-dense-tp-size=1 (#5657)
Co-authored-by: liusy58 <liusy58@linux.alibaba.com>
Co-authored-by: 颉沆 <xiehang.lsy@alibaba-inc.com>
2025-05-12 23:51:39 -07:00
Lianmin Zheng
d18c6b3358 Support incremental streaming of logprob/token_ids between scheduler and detokenizer (#6225)
Co-authored-by: SangBin Cho <rkooo567@gmail.com>
2025-05-12 14:33:38 -07:00
Lianmin Zheng
e8e18dcdcc Revert "fix some typos" (#6244) 2025-05-12 12:53:26 -07:00
Ying Sheng
bad7c26fdc [PP] Fix init_memory_pool desync & add PP for mixtral (#6223) 2025-05-12 12:38:09 -07:00
applesaucethebun
d738ab52f8 fix some typos (#6209)
Co-authored-by: Brayden Zhong <b8zhong@uwaterloo.ca>
2025-05-13 01:42:38 +08:00
Lianmin Zheng
fba8eccd7e Log if cuda graph is used & extend cuda graph capture to cuda-graph-max-bs (#6201)
Co-authored-by: SangBin Cho <rkooo567@gmail.com>
2025-05-12 00:17:33 -07:00
Lianmin Zheng
01bdbf7f80 Improve structured outputs: fix race condition, server crash, metrics and style (#6188) 2025-05-11 08:36:16 -07:00
applesaucethebun
2ce8793519 Add typo checker in pre-commit (#6179)
Co-authored-by: Brayden Zhong <b8zhong@uwaterloo.ca>
2025-05-11 12:55:00 +08:00
Lianmin Zheng
de167cf5fa Fix request abortion (#6184) 2025-05-10 21:54:46 -07:00
fzyzcjy
cef91b1ed7 [PD] Add control to slow down a server (#5572) 2025-05-08 01:03:08 -07:00
fzyzcjy
b6cf3532b5 Tiny refactor ModelConfig.from_server_args (#5219) 2025-05-08 01:02:43 -07:00
Liangsheng Yin
a3e4e9bf9e Better PD initialization (#5751) 2025-05-07 01:12:57 +08:00
Zhiqiang Xie
f8e460930a Fix prefill OOM error in the case of large page size (#5081) 2025-05-05 16:02:55 -07:00
xm:D
3409aaab32 Support InternVL3 (#5350)
Co-authored-by: Mick <mickjagger19@icloud.com>
Co-authored-by: Chayenne <zhaochen20@outlook.com>
2025-05-01 22:38:59 -07:00
Ying Sheng
11383cec3c [PP] Add pipeline parallelism (#5724) 2025-04-30 18:18:07 -07:00
Chang Su
28b26dbf48 [Bugfix]: fix missing queue_time_start for requests from grammar_queue (#5696) 2025-04-29 17:31:44 -07:00
Lianmin Zheng
3029889cb4 Turn on overlap scheduler for multimodal models (#5771) 2025-04-27 23:45:09 -07:00
Liangsheng Yin
40d9b8acce Improve overlap scheduling (#5788) 2025-04-28 11:19:16 +08:00
IAN
11e27d0926 [PD]: Support Muti Prefill in one node (#5704)
Co-authored-by: shuaills <shishuaiuoe@gmail.com>
2025-04-26 00:30:47 +08:00
Liangsheng Yin
c55550cbf0 [PD] Better logs (#5715) 2025-04-25 17:25:45 +08:00
Byron Hsu
bf98d2e377 [PD] Support prefill overlap + Ensure no race condition (#5609) 2025-04-21 12:12:56 -07:00
Byron Hsu
e65b9f21e3 [PD] Support decode overlap schedule (#5608) 2025-04-21 12:06:16 -07:00
Zhiqiang Xie
70645f4d7d upstream hicache fixes (#5570) 2025-04-20 23:08:30 -07:00
fzyzcjy
1195182040 Tiny add Engine.flush_cache API (#5241) 2025-04-20 18:15:03 -07:00
fzyzcjy
f6a71139a8 Make profiler output file names consistent (#5548) 2025-04-18 22:57:11 -07:00
Cheng Wan
6aca583420 Fix several minor issues in PD disaggregation (#5444) 2025-04-15 23:04:41 -07:00