321 Commits

Author SHA1 Message Date
Xiaoyu Zhang
11965b0daf Fix sgl-kernel benchmark dead code (#11022) 2025-09-29 15:06:40 +08:00
Lianmin Zheng
35ec2a45a8 [minor] Remove deprecated function get_ip (#10883) 2025-09-25 16:18:04 -07:00
Liangsheng Yin
4a762041d7 move environ into sglang.srt to avoid break SRT auto sync. (#10791) 2025-09-23 02:04:20 -07:00
Lifu Huang
9241f4fd20 Move cached kernel to srt.utils (#10776) 2025-09-22 23:00:36 -07:00
ronnie_zheng
e22f3a5ec9 [Ascend]optimize Qwen3 on Ascend (#10574)
Co-authored-by: c30031083 <chenxu140@huawei.com>
2025-09-22 17:18:36 -07:00
Even Zhou
d27a6f7092 [Feature] Add MLAProcess for DeepSeek MLA on NPU (#10130) 2025-09-22 17:17:48 -07:00
Jimmy
56b991b12d [Feature]feat(get_ip): unify get_ip_xxx (#10081) 2025-09-18 22:35:26 -07:00
Lianmin Zheng
f949ad5794 [Auto Sync] Update activation.py, chunk_cache.py, utils.py (20250917) (#10538)
Co-authored-by: github-actions[bot] <github-actions[bot]@users.noreply.github.com>
2025-09-16 17:06:43 -07:00
Lianmin Zheng
c49484a658 [Auto Sync] Update scheduler_profiler_mixin.py, rpd_utils.p... (20250916) (#10494)
Co-authored-by: github-actions[bot] <github-actions[bot]@users.noreply.github.com>
Co-authored-by: cctry <shiyang@x.ai>
2025-09-16 17:02:20 -07:00
b8zhong
b2435be682 Cache the result of is_blackwell platform check (#10498) 2025-09-15 22:30:28 -07:00
Liangsheng Yin
c3c26f76b3 [Env] minimal version for organizing envs (#10479) 2025-09-16 03:51:25 +08:00
Liangsheng Yin
305c9e8c2d [4/N]DP refactor: support watching mode get_load and shortest queue strategy (#10201) 2025-09-15 10:06:08 +08:00
amysaq2023
30d20ce84f Support loading weights from remote instance (#8215)
Signed-off-by: Anqi Shen <amy.saq@antgroup.com>
Co-authored-by: Chayenne <74843776+zhaochenyang20@users.noreply.github.com>
2025-09-12 17:40:22 +08:00
Zaili Wang
ef959d7b85 [CPU] fix OOM when mem-fraction is not set (#9090) 2025-09-10 23:52:22 -07:00
Cao E
7577f0e40f Add graph runner support with torch compile on CPU (#7843) 2025-09-07 21:33:58 -07:00
fzyzcjy
df97b31f37 Tiny support setting numa nodes for different ranks (#10006) 2025-09-05 19:01:27 +08:00
kk
e96973742c Optimized deepseek-v3/r1 model performance on mxfp4 run (#10008)
Co-authored-by: wunhuang <wunhuang@amd.com>
Co-authored-by: HAI <hixiao@gmail.com>
Co-authored-by: Hubert Lu <55214931+hubertlu-tw@users.noreply.github.com>
2025-09-04 15:11:22 -07:00
Yineng Zhang
1b2ff4fb7f Revert "Optimized deepseek-v3/r1 model performance on mxfp4 run (#9671)" (#9959) 2025-09-03 00:50:04 -07:00
kk
0dfd54d11d Optimized deepseek-v3/r1 model performance on mxfp4 run (#9671)
Co-authored-by: wunhuang <wunhuang@amd.com>
Co-authored-by: wghuang <wghuang@amd.com>
2025-09-02 22:26:28 -07:00
Shangming Cai
a25e8e42eb Move multi-tokenizer event loop to better place (#9902)
Signed-off-by: Shangming Cai <csmthu@gmail.com>
2025-09-01 23:12:21 -07:00
ybyang
5f77e1292d Support Multi Process Tokenizer Manager(#6555) (#8964)
Signed-off-by: ybyang <ybyang7@iflytek.com>
Signed-off-by: huanglong <huanglong@linux.alibaba.com>
Co-authored-by: Huang Long <121648372+LLLL114@users.noreply.github.com>
Co-authored-by: huanglong <huanglong@linux.alibaba.com>
Co-authored-by: Shangming Cai <csmthu@gmail.com>
2025-09-01 01:00:13 -07:00
ybyang
fd18995cf3 Fix get_ip when no external network (#9700) 2025-08-27 10:28:52 -07:00
Lianmin Zheng
fd71b11b1d move is_sm90_supported/is_sm100_supported to python/sglang/srt/utils.py (#9679) 2025-08-27 03:34:29 -07:00
Stefan He
a530b3ffdc [RL] fix register the same ops multiple times (#9564) 2025-08-26 16:24:44 -07:00
miter
a0b22f2f17 remove redundant rank0_log function. (#9560)
Co-authored-by: linhuang <linhuang@ruijie.com.cn>
2025-08-24 23:17:55 -07:00
fzyzcjy
2600fc0d47 Overlapped weight offload (#8034) 2025-08-23 02:06:46 -07:00
Chanh Nguyen
127d4b0d5e Support GC Freezing to improve latency & throughput (#9241)
Co-authored-by: Chanh Nguyen <cnguyen@linkedin.com>
Co-authored-by: Liangsheng Yin <hnyls2002@gmail.com>
2025-08-23 13:43:09 +08:00
fzyzcjy
55d336cb08 Refactor weight offloading logic (#8521) 2025-08-21 03:48:13 -07:00
zifeitong
84b30d9e00 Set the default attention backend for GLM-4.5v to fa3 (#9245) 2025-08-17 16:34:19 -07:00
Lifu Huang
4b74c3fcca [chore] Clean up redundant lora_weight_names concept to simplify code (#9131) 2025-08-17 12:36:58 -07:00
Netanel Haber
845d12a979 model: support nvidia/Llama-3_3-Nemotron-Super-49B-v1 (#9067)
Co-authored-by: Kyle Huang <kylhuang@nvidia.com>
2025-08-17 01:48:15 -07:00
Cheng Wan
295895120d [6/N] MoE Refactor: Cleanup MoE-related configs (#8849) 2025-08-14 21:14:53 -07:00
Lifu Huang
5ded39cab2 Fix race condition in async lora unload (#9084) 2025-08-11 22:59:29 -07:00
Binyao Jiang
f29aba8c6e Support glm4.1v and glm4.5v (#8798)
Signed-off-by: Xinyuan Tong <justinning0323@outlook.com>
Signed-off-by: Xinyuan Tong <xinyuantong.cs@gmail.com>
Co-authored-by: Xinyuan Tong <justinning0323@outlook.com>
Co-authored-by: Xinyuan Tong <115166877+JustinTong0323@users.noreply.github.com>
Co-authored-by: Xinyuan Tong <xinyuantong.cs@gmail.com>
Co-authored-by: zRzRzRzRzRzRzR <2448370773@qq.com>
Co-authored-by: Minglei Zhu <mingleizhu1122@gmail.com>
Co-authored-by: Chang Su <csu272@usc.edu>
2025-08-09 00:59:13 -07:00
Lianmin Zheng
a947154286 Revert "Support Multi Process Tokenizer Manager" (#8960) 2025-08-08 02:28:27 -07:00
ybyang
7490e3f67d Support Multi Process Tokenizer Manager (#6555)
Signed-off-by: ybyang <ybyang7@iflytek.com>
Signed-off-by: huanglong <huanglong@linux.alibaba.com>
Co-authored-by: lw9527 <952799980@qq.com>
Co-authored-by: huanglong <huanglong@linux.alibaba.com>
Co-authored-by: Huang Long <121648372+LLLL114@users.noreply.github.com>
2025-08-08 01:45:50 -07:00
Cheng Wan
1d24db8348 Expert Parallelism for GPT-OSS (#8944) 2025-08-08 00:46:42 -07:00
Chang Su
92cc32d9fc Support v1/responses and use harmony in serving_chat (#8837)
Signed-off-by: Xinyuan Tong <justinning0323@outlook.com>
Signed-off-by: Xinyuan Tong <xinyuantong.cs@gmail.com>
Co-authored-by: Xinyuan Tong <justinning0323@outlook.com>
Co-authored-by: Xinyuan Tong <xinyuantong.cs@gmail.com>
2025-08-06 16:20:34 -07:00
Ying Sheng
168033d5fb Support mxfp4 for GPT-OSS (#8843)
Co-authored-by: Co-author fzyzcjy <ch271828n@outlook.com>
Co-authored-by: fzyzcjy <5236035+fzyzcjy@users.noreply.github.com>
Co-authored-by: zhuofan1123 <zhuofanl@nvidia.com>
Co-authored-by: liz-badada <jinyanc@nvidia.com>
Co-authored-by: xutizhou <xutingz@nvidia.com>
Co-authored-by: linhu-nv <linhu@nvidia.com>
2025-08-06 00:05:25 -07:00
Yuhao Yao
873f384a51 [feat] Add detail in image_data (#8596) 2025-08-05 14:01:38 +08:00
kk
d4bf5a8524 Support OCP MXFP4 quantization on AMD GPUs (#8255)
Co-authored-by: wunhuang <wunhuang@amd.com>
Co-authored-by: Hubert Lu <Hubert.Lu@amd.com>
2025-08-04 18:14:52 -07:00
ybyang
6f9baf1002 [Improvements] Merge health check route (#8444)
Signed-off-by: ybyang <ybyang7@iflytek.com>
Co-authored-by: Lianmin Zheng <lianminzheng@gmail.com>
Co-authored-by: Kan Wu <wukanustc@gmail.com>
2025-08-03 01:59:06 -07:00
Cheng Wan
6c88f6c8d9 [5/N] MoE Refactor: Update MoE parallelism arguments (#8658) 2025-08-01 01:20:03 -07:00
Ke Bao
8fbcfd0723 Update step3v default config (#8626) 2025-08-01 00:49:26 +08:00
Yuxuan Zhang
6d6a8bc278 GLM-4.5 Model Support (#8224)
Co-authored-by: Lifu Huang <lifu.hlf@gmail.com>
Co-authored-by: Binyao Jiang <byjiang1996@gmail.com>
Co-authored-by: Stefan He <hebiaobuaa@gmail.com>
2025-07-27 22:54:07 -07:00
Lifu Huang
df90645525 Support overlapped lora updates (#8213) 2025-07-27 13:00:44 -07:00
Yingchun Lai
36d6f0ba5b fix: fix the missing metrics on non-rank0 nodes (#7720) 2025-07-27 00:55:25 -07:00
Lianmin Zheng
ed2e313eb6 Clean up server_args, triton cache manager (#8332) 2025-07-25 14:14:51 -07:00
Stepan Kargaltsev
1b9cea5ade [P/D] Support ipv6 in P/D scenario (#7858)
Co-authored-by: Shangming Cai <caishangming@linux.alibaba.com>
2025-07-25 08:53:30 -07:00
Ying Wang
7ad6b766c5 fix: Fix failed functional tests https://github.com/meta-llama/llama-stack-evals (#8266) 2025-07-24 23:11:32 -07:00