Commit Graph

4725 Commits

Author SHA1 Message Date
JieXin Liang
6cdcbcc674 [fix] fix enable_pdl for blackwell (#9011) 2025-08-19 01:16:08 +08:00
Lianmin Zheng
c480a3f6ea Minor style fixes for sgl-kernel (#9289) 2025-08-18 09:38:35 -07:00
Simo Lin
6e316588f8 [router] add reasoning parser base structure (#9310) 2025-08-18 09:26:09 -07:00
Co-authored-by: Chang Su <chang.s.su@oracle.com>
Simo Lin
24247b4168 [router] add tokenizer metrics (#9307) 2025-08-18 09:25:51 -07:00
Co-authored-by: Chang Su <chang.s.su@oracle.com>
fzyzcjy
4c0bb411e5 Further fix memory pool leak error (#9298) 2025-08-18 00:58:06 -07:00
Yuan Luo
968e181826 Fix triton_fused_moe unit test and benchmark (#9276) 2025-08-18 00:54:33 -07:00
Co-authored-by: luoyuan.luo <luoyuan.luo@antgroup.com>
Simo Lin
d08663eec1 [router] tokenizer factory, hf tokenizer, and stop sequence detector (#9293) 2025-08-17 22:38:38 -07:00
Co-authored-by: Chang Su <chang.s.su@oracle.com>
b8zhong
716e682721 [Fix] Add undefined update_tensor_inplace function (#6307) 2025-08-18 11:11:00 +08:00
zifeitong
84b30d9e00 Set the default attention backend for GLM-4.5v to fa3 (#9245) 2025-08-17 16:34:19 -07:00
Simo Lin
ff0cf51c8e [router] introducing tokenizer trait (#9287) 2025-08-17 16:30:01 -07:00
Yineng Zhang
a1c7f742f9 chore: bump sgl-kernel v0.3.6.post1 (#9286) 2025-08-17 16:26:17 -07:00
blzheng
ebbb75e917 [CPU] Fix TP padding issue on Phi-4 (#8289) 2025-08-17 16:25:26 -07:00
Simo Lin
b341b7dbce [router] introduce prefill response draining for http compliance (#9281) 2025-08-17 14:23:04 -07:00
fzyzcjy
b498cd21d7 Tiny make fp4 moe method parameters more static (#8520) 2025-08-17 13:26:02 -07:00
kousakawang
0fc54b971e [fix]: fix cutlass moe ut and and Opt H20 cutlass groupGemm performance (#9272) 2025-08-17 13:09:49 -07:00
Co-authored-by: wanghanpei <wanghanpei@bytedance.com>
fzyzcjy
b3c1f2e4f2 Fix memory pool leak error (#9271) 2025-08-17 12:53:34 -07:00
Ke Bao
be1a3cd9b4 Fix swa eagle verify accuracy for Triton backend (#9279) 2025-08-17 12:52:02 -07:00
Lifu Huang
4b74c3fcca [chore] Clean up redundant lora_weight_names concept to simplify code (#9131) 2025-08-17 12:36:58 -07:00
Jeff Nettleton
ce3ca9b02f [router] add cargo clippy in CI and fix-up linting errors (#9242) 2025-08-17 11:03:56 -07:00
Liangsheng Yin
4d98e48649 Revert "[Misc] feat: Deepgemm update for sgl-kernel (#8790)" to fix kernel CI (#9260) 2025-08-17 22:59:50 +08:00
Netanel Haber
3d77a31885 from python.sglang.srt -> from sglang.srt (#9268) 2025-08-17 02:45:45 -07:00
Netanel Haber
845d12a979 model: support nvidia/Llama-3_3-Nemotron-Super-49B-v1 (#9067) 2025-08-17 01:48:15 -07:00
Co-authored-by: Kyle Huang <kylhuang@nvidia.com>
Stefan He
e47800e176 Quick Fix GLM (#9264) 2025-08-16 23:43:41 -07:00
Simo Lin
bb10e3a1c3 [router] fix pd prefill http request complinace issue (#9237) 2025-08-16 22:36:45 -07:00
Even Zhou
fda762a27d [Bugfix] Change vLLM install order & Add A2 support (#9232) 2025-08-16 22:36:14 -07:00
Mick
1df84ff414 ci: simplify multi-modality tests by using mixins (#9006) 2025-08-16 22:25:02 -07:00
Binyao Jiang
66d6be0874 Bug fix: use correct mm_items in embed_mm_inputs (#8893) 2025-08-16 19:55:56 -07:00
kk
1c1f8a118e Combine fp4.py and mxfp4.py into one file and support dynamic mxfp4 quantization in mxfp4.py (#9049) 2025-08-16 19:01:54 -07:00
Co-authored-by: wunhuang <wunhuang@amd.com>
Shangming Cai
384f8ab5ce [PD] Support PD disaggregation with Prefill PP (#8846) 2025-08-16 18:31:31 -07:00
Signed-off-by: Shangming Cai <caishangming@linux.alibaba.com>
Signed-off-by: Shangming Cai <csmthu@gmail.com>
Co-authored-by: root <huzhiyuan@xiaohongshu.com>
Co-authored-by: Ying Sheng <sqy1415@gmail.com>
Co-authored-by: Francis <38564764+ssssnow@users.noreply.github.com>
Co-authored-by: zitto <zhjc1124@gmail.com>
zyksir
6a9d6ca33c fix unexcepted answer in EAGLE mode (#9252) 2025-08-16 17:45:36 -07:00
VDV1985
94371dbbd6 [feature] Ascend NPU graph support (#8027) 2025-08-16 17:25:17 -07:00
Co-authored-by: ronnie_zheng <zl19940307@163.com>
Co-authored-by: yezhifeng (D) <y00897525@china.huawei.com>
Co-authored-by: anon189Ty <Stari_Falcon@outlook.com>
Co-authored-by: Maksim <makcum888e@mail.ru>
Co-authored-by: ssshinigami <44640852+ssshinigami@users.noreply.github.com>
Sai Enduri
740f063035 Fix Custom All Reduce CI job. (#9258) 2025-08-16 16:29:43 -07:00
Hank Han
81da16f6d3 [CI] add deepseek w4a8 test on h20 ci (#7758) 2025-08-16 01:54:13 -07:00
Brayden Zhong
bc938ea13f Fix DP load for embedding (#9165) 2025-08-15 23:58:44 -07:00
Trevor Morris
eff4eb3fdd Add fp4 quantize before all-gather for Flashinfer cutlass MoE DP (max throughput) (#7667) 2025-08-15 22:08:11 -07:00
Yineng Zhang
87dab54824 Revert "chore: bump sgl-kernel v0.3.6 (#9220)" (#9247) 2025-08-15 17:24:36 -07:00
Yineng Zhang
5121af4627 Revert "chore(docker): update sgl_kernel version to 0.3.6 in Dockerfi… (#9246) 2025-08-15 17:19:38 -07:00
kk
983aa4967b Fix nan value generated after custom all reduce (#8663) 2025-08-15 12:33:54 -07:00
Co-authored-by: wunhuang <wunhuang@amd.com>
Hubert Lu
9c3e95d98b [AMD] Expand test coverage for AMD CI and enable apply_token_bitmask_inplace_cuda in sgl-kernel (#8268) 2025-08-15 12:32:51 -07:00
ishandhanani
e52c3866eb chore(docker): update sgl_kernel version to 0.3.6 in Dockerfile.gb200 (#9243) 2025-08-15 12:06:52 -07:00
Simo Lin
da53e13cbb [router] preserve original worker response header in router (#9236) 2025-08-15 11:01:47 -07:00
Jeff Nettleton
d7e38b2f6d [router] clean up lint warnings with clippy execution (#9201) 2025-08-15 11:01:21 -07:00
Simo Lin
21b8846066 [router] allow more health check configuration (#9198) 2025-08-15 08:07:45 -07:00
Liangsheng Yin
0c8594e67d Optional extension for green context (#9231) 2025-08-15 21:33:52 +08:00
Yineng Zhang
c186feed7f chore: bump sgl-kernel v0.3.6 (#9220) 2025-08-15 02:50:50 -07:00
Cheng Wan
84b006b278 Cleanup MoE Refactor (#9223) 2025-08-15 02:28:33 -07:00
Shangming Cai
8ca07bd948 [CI] Fix sgl-router disaggregation test (#9222) 2025-08-15 02:24:44 -07:00
Signed-off-by: Shangming Cai <csmthu@gmail.com>
jy-song-hub
4fc09e0df0 Fp4 MOE quant kernel optimization (#8777) 2025-08-15 01:46:16 -07:00
Co-authored-by: Rain Jiang <96632942+rainj-me@users.noreply.github.com>
PGFLMG
a3d99d6dcd [Misc] feat: Deepgemm update for sgl-kernel (#8790) 2025-08-15 01:05:27 -07:00
Xuchun Shang
189af90896 [Eagle Warning fix] replace the deprecated 'and' with & (#9215) 2025-08-15 15:43:36 +08:00
Signed-off-by: Xuchun Shang <xuchun.shang@linux.alibaba.com>