Commit Graph

3891 Commits

Author SHA1 Message Date
ybyang
7349717e4b [doc] update lws doc for pd (#7318) 2025-07-01 10:39:04 +08:00
Yineng Zhang
392e441ad1 chore: upgrade flashinfer v0.2.7 jit (#7663) 2025-06-30 13:26:26 -07:00
Baizhou Zhang
7248272ccc Add dsv3 router gemm kernel (#7627) 2025-06-29 23:31:55 -07:00
Lianmin Zheng
22352d47a9 Improve streaming, log_level, memory report, weight loading, and benchmark script (#7632)
Co-authored-by: Kan Wu <wukanustc@gmail.com>
2025-06-29 23:16:19 -07:00
Chunyuan WU
c5131f7a2f [CPU] add c++ kernel to bind CPU cores and memory node (#7524) 2025-06-29 19:45:25 -07:00
Lianmin Zheng
78700893ee [EAGLE] remove a wrong adjustment for page_size > 1 & topk > 1 in server_args.py (#7643) 2025-06-29 19:25:28 -07:00
Lianmin Zheng
663c04f76e Update CODEOWNERS (#7640) 2025-06-29 16:58:43 -07:00
Hubert Lu
3b3f1e3aeb [AMD] Add unit-test-sgl-kernel-amd to AMD CI (#7539) 2025-06-29 15:50:09 -07:00
JieXin Liang
b691dcc490 [misc] reduce weird rope_scaling_factor warning (#7176) 2025-06-29 15:42:45 -07:00
fzyzcjy
0c9c6c75a8 Move files related to EPLB (#7580) 2025-06-29 15:39:38 -07:00
Simo Lin
e3f9b54819 [bugfix] fix runtime dropping panic in editable (#7628) 2025-06-29 15:38:28 -07:00
finetune
b3cff3651e Fix sgl-router startup crash (#7619) 2025-06-29 14:41:34 -07:00
Xinyuan Tong
8f335b5bd6 Fix stream reasoning parser and Adds Kimi reasoning parser (#7432)
Signed-off-by: Xinyuan Tong <justinning0323@outlook.com>
2025-06-29 14:39:05 -07:00
Lianmin Zheng
b2264076dc Add @mickqian as the CODEOWNERS of multimodal (#7636) 2025-06-29 09:27:33 -07:00
Ke Bao
04b35190e2 Add dsv3 fused a gemm to sgl-kernel (#7630) 2025-06-29 02:52:24 -07:00
Lianmin Zheng
071a1f51ae [Minor] clean up multimodal processor and tokenizer manager (#7624) 2025-06-29 02:50:14 -07:00
Simo Lin
7c0db3a6c5 [bugfix] Remove PR comment posting from Rust benchmark workflow (#7625) 2025-06-28 22:10:01 -07:00
Xinyuan Tong
c45e49d817 oai: Adds support for OpenAI chat completions API in bench_serving (#7036)
Signed-off-by: Xinyuan Tong <justinning0323@outlook.com>
Co-authored-by: yhyang201 <47235274+yhyang201@users.noreply.github.com>
Co-authored-by: Mick <mickjagger19@icloud.com>
2025-06-28 22:59:20 +00:00
Yineng Zhang
d80539291b docs: add gb200 nvl72 and a16z grant (#7620) 2025-06-28 02:08:09 -07:00
fzyzcjy
00c7b1ad07 Let EP prefill support new DeepGEMM (#7310) 2025-06-28 01:45:30 -07:00
fzyzcjy
82eccae44e Let ep_scatter support arbitrary strides / ue8m0 format (#7309) 2025-06-28 01:38:33 -07:00
Yineng Zhang
a8c10aeeee fix unit tests (#7618) 2025-06-28 00:32:41 -07:00
Shangming Cai
eb429b88a4 [PD] Respect sampling_params.max_new_tokens when PD disaggregation is activated (#7598)
Signed-off-by: Shangming Cai <caishangming@linux.alibaba.com>
2025-06-27 22:22:01 -07:00
Lifu Huang
49538d111b Support dynamic LoRA loading / unloading in engine/server API (#7446) 2025-06-27 21:00:27 -07:00
Sheng Qi
cfe2edac38 [BUG] fix local_rank in initialize_dp_attention (#7584) 2025-06-27 20:01:01 -07:00
Lifu Huang
2373faa317 Fix flakiness in LoRA batch test. (#7552) 2025-06-27 19:51:43 -07:00
fzyzcjy
9efb2993da Tiny add logs for expert location updater (#7308) 2025-06-27 19:12:33 -07:00
Chunyuan WU
a5317b2fd3 [CPU] add optimizations for INT8 and FP8 DeepSeek (#6769)
Co-authored-by: Zheng, Beilei <beilei.zheng@intel.com>
2025-06-27 19:04:29 -07:00
tarinkk
eb6c2c1663 Hybrid kv cache for LLaMA4 (#6563)
Co-authored-by: Cheng Wan <54331508+ch-wan@users.noreply.github.com>
Co-authored-by: tarinkk <rt572@physics.rutger.edu>
Co-authored-by: tarinkk <rt572@rutgers.physics.edu>
Co-authored-by: Hanming Lu <69857889+hanming-lu@users.noreply.github.com>
2025-06-27 18:58:55 -07:00
Xinyuan Tong
357921aa51 Fix: Minicpm (#7612)
Signed-off-by: Xinyuan Tong <justinning0323@outlook.com>
2025-06-27 17:32:29 -07:00
Simo Lin
c071198c1d [router] add centralized configuration module for sgl-router (#7588) 2025-06-27 15:42:02 -07:00
Lifu Huang
d7374d7467 Fix broken CI TestVILAServer (#7610) 2025-06-27 15:01:03 -07:00
Lianmin Zheng
ce3a3e8783 Move multimodal processors into a separate folder (#7581) 2025-06-27 11:58:24 -07:00
Qiaolin Yu
41650b0d70 feat: support compatibility between MTP and two-batch-overlap (#7225)
Co-authored-by: Cheng Wan <54331508+ch-wan@users.noreply.github.com>
2025-06-27 01:10:27 -07:00
Xinyuan Tong
1b95162008 Updates transformers and timm dependencies (#7577)
Signed-off-by: Xinyuan Tong <justinning0323@outlook.com>
2025-06-27 00:30:17 -07:00
Keyang Ru
29bd4c8135 [CI] Add CI Testing for Prefill-Decode Disaggregation with Router (#7540) 2025-06-27 00:18:56 -07:00
Ata Fatahi
031f64aa1b Add e2e test for multi instance multi stage memory release/resume occupation (#7208)
Signed-off-by: Ata Fatahi <immrata@gmail.com>
2025-06-26 17:40:38 -07:00
fzyzcjy
3d7cdb2ebd Fix MTP error when enabling two-batch overlap (#7569) 2025-06-26 15:40:54 -07:00
Xinyuan Tong
604efe07e1 Updates Gemma3n MLP layer to adapt latest transformers version (#7573)
Signed-off-by: Xinyuan Tong <justinning0323@outlook.com>
2025-06-26 15:07:22 -07:00
Cheng Wan
1b8cf77b01 [Fix] incorrect assert in EPLB (#7575) 2025-06-26 14:59:20 -07:00
Trevor Morris
bb9b608c86 [PD][NIXL] Set is_sorted=False to fix NIXL_ERR_NOT_FOUND (#7330) 2025-06-26 10:39:39 -07:00
Yineng Zhang
69183f8808 chore: bump v0.4.8.post1 (#7559) 2025-06-26 02:21:12 -07:00
Xinyuan Tong
9b00990bea chore: remove vlm unnecessary import (#7541)
Signed-off-by: Xinyuan Tong <justinning0323@outlook.com>
Co-authored-by: yhyang201 <yhyang201@gmail.com>
Co-authored-by: Mick <mickjagger19@icloud.com>
2025-06-26 01:38:15 -07:00
Mick
4d67025a1d chore: improve ci bug reporting (#7542) 2025-06-26 01:32:44 -07:00
YanbingJiang
0e05fe8cf4 Update seed in CPU UTs to avoid flaky failure with single test (#7544) 2025-06-25 21:25:50 -07:00
Meng, Peng
2390a2bc8d Add Tencent HunYuanMoEV1 model support (#7549) 2025-06-25 20:59:53 -07:00
Ruihang Lai
16d76b9f23 [CMake] Fix sgl-kernel CMakeLists for Blackwell (#7543) 2025-06-25 19:00:46 -07:00
Shangming Cai
5c2142579a [PD] Raise error for incompatible mooncake version and some minor fixes (#7527)
Signed-off-by: Shangming Cai <caishangming@linux.alibaba.com>
2025-06-25 18:55:24 -07:00
Qiaolin Yu
b8df43ab9c Fix gathered_buffer issues in tbo (#7531) 2025-06-25 14:42:21 -07:00
Yuhong Guo
a1c1ebe935 Fix FP8 KV Cache Support in FA3 Backend (#7148) 2025-06-25 02:14:40 -07:00