fzyzcjy
|
df97b31f37
|
Tiny support setting numa nodes for different ranks (#10006)
|
2025-09-05 19:01:27 +08:00 |
|
kk
|
e96973742c
|
Optimized deepseek-v3/r1 model performance on mxfp4 run (#10008)
Co-authored-by: wunhuang <wunhuang@amd.com>
Co-authored-by: HAI <hixiao@gmail.com>
Co-authored-by: Hubert Lu <55214931+hubertlu-tw@users.noreply.github.com>
|
2025-09-04 15:11:22 -07:00 |
|
Yineng Zhang
|
1b2ff4fb7f
|
Revert "Optimized deepseek-v3/r1 model performance on mxfp4 run (#9671)" (#9959)
|
2025-09-03 00:50:04 -07:00 |
|
kk
|
0dfd54d11d
|
Optimized deepseek-v3/r1 model performance on mxfp4 run (#9671)
Co-authored-by: wunhuang <wunhuang@amd.com>
Co-authored-by: wghuang <wghuang@amd.com>
|
2025-09-02 22:26:28 -07:00 |
|
Shangming Cai
|
a25e8e42eb
|
Move multi-tokenizer event loop to better place (#9902)
Signed-off-by: Shangming Cai <csmthu@gmail.com>
|
2025-09-01 23:12:21 -07:00 |
|
ybyang
|
5f77e1292d
|
Support Multi Process Tokenizer Manager(#6555) (#8964)
Signed-off-by: ybyang <ybyang7@iflytek.com>
Signed-off-by: huanglong <huanglong@linux.alibaba.com>
Co-authored-by: Huang Long <121648372+LLLL114@users.noreply.github.com>
Co-authored-by: huanglong <huanglong@linux.alibaba.com>
Co-authored-by: Shangming Cai <csmthu@gmail.com>
|
2025-09-01 01:00:13 -07:00 |
|
ybyang
|
fd18995cf3
|
Fix get_ip when no external network (#9700)
|
2025-08-27 10:28:52 -07:00 |
|
Lianmin Zheng
|
fd71b11b1d
|
move is_sm90_supported/is_sm100_supported to python/sglang/srt/utils.py (#9679)
|
2025-08-27 03:34:29 -07:00 |
|
Stefan He
|
a530b3ffdc
|
[RL] fix register the same ops multiple times (#9564)
|
2025-08-26 16:24:44 -07:00 |
|
miter
|
a0b22f2f17
|
remove redundant rank0_log function. (#9560)
Co-authored-by: linhuang <linhuang@ruijie.com.cn>
|
2025-08-24 23:17:55 -07:00 |
|
fzyzcjy
|
2600fc0d47
|
Overlapped weight offload (#8034)
|
2025-08-23 02:06:46 -07:00 |
|
Chanh Nguyen
|
127d4b0d5e
|
Support GC Freezing to improve latency & throughput (#9241)
Co-authored-by: Chanh Nguyen <cnguyen@linkedin.com>
Co-authored-by: Liangsheng Yin <hnyls2002@gmail.com>
|
2025-08-23 13:43:09 +08:00 |
|
fzyzcjy
|
55d336cb08
|
Refactor weight offloading logic (#8521)
|
2025-08-21 03:48:13 -07:00 |
|
zifeitong
|
84b30d9e00
|
Set the default attention backend for GLM-4.5v to fa3 (#9245)
|
2025-08-17 16:34:19 -07:00 |
|
Lifu Huang
|
4b74c3fcca
|
[chore] Clean up redundant lora_weight_names concept to simplify code (#9131)
|
2025-08-17 12:36:58 -07:00 |
|
Netanel Haber
|
845d12a979
|
model: support nvidia/Llama-3_3-Nemotron-Super-49B-v1 (#9067)
Co-authored-by: Kyle Huang <kylhuang@nvidia.com>
|
2025-08-17 01:48:15 -07:00 |
|
Cheng Wan
|
295895120d
|
[6/N] MoE Refactor: Cleanup MoE-related configs (#8849)
|
2025-08-14 21:14:53 -07:00 |
|
Lifu Huang
|
5ded39cab2
|
Fix race condition in async lora unload (#9084)
|
2025-08-11 22:59:29 -07:00 |
|
Binyao Jiang
|
f29aba8c6e
|
Support glm4.1v and glm4.5v (#8798)
Signed-off-by: Xinyuan Tong <justinning0323@outlook.com>
Signed-off-by: Xinyuan Tong <xinyuantong.cs@gmail.com>
Co-authored-by: Xinyuan Tong <justinning0323@outlook.com>
Co-authored-by: Xinyuan Tong <115166877+JustinTong0323@users.noreply.github.com>
Co-authored-by: Xinyuan Tong <xinyuantong.cs@gmail.com>
Co-authored-by: zRzRzRzRzRzRzR <2448370773@qq.com>
Co-authored-by: Minglei Zhu <mingleizhu1122@gmail.com>
Co-authored-by: Chang Su <csu272@usc.edu>
|
2025-08-09 00:59:13 -07:00 |
|
Lianmin Zheng
|
a947154286
|
Revert "Support Multi Process Tokenizer Manager" (#8960)
|
2025-08-08 02:28:27 -07:00 |
|
ybyang
|
7490e3f67d
|
Support Multi Process Tokenizer Manager (#6555)
Signed-off-by: ybyang <ybyang7@iflytek.com>
Signed-off-by: huanglong <huanglong@linux.alibaba.com>
Co-authored-by: lw9527 <952799980@qq.com>
Co-authored-by: huanglong <huanglong@linux.alibaba.com>
Co-authored-by: Huang Long <121648372+LLLL114@users.noreply.github.com>
|
2025-08-08 01:45:50 -07:00 |
|
Cheng Wan
|
1d24db8348
|
Expert Parallelism for GPT-OSS (#8944)
|
2025-08-08 00:46:42 -07:00 |
|
Chang Su
|
92cc32d9fc
|
Support v1/responses and use harmony in serving_chat (#8837)
Signed-off-by: Xinyuan Tong <justinning0323@outlook.com>
Signed-off-by: Xinyuan Tong <xinyuantong.cs@gmail.com>
Co-authored-by: Xinyuan Tong <justinning0323@outlook.com>
Co-authored-by: Xinyuan Tong <xinyuantong.cs@gmail.com>
|
2025-08-06 16:20:34 -07:00 |
|
Ying Sheng
|
168033d5fb
|
Support mxfp4 for GPT-OSS (#8843)
Co-authored-by: Co-author fzyzcjy <ch271828n@outlook.com>
Co-authored-by: fzyzcjy <5236035+fzyzcjy@users.noreply.github.com>
Co-authored-by: zhuofan1123 <zhuofanl@nvidia.com>
Co-authored-by: liz-badada <jinyanc@nvidia.com>
Co-authored-by: xutizhou <xutingz@nvidia.com>
Co-authored-by: linhu-nv <linhu@nvidia.com>
|
2025-08-06 00:05:25 -07:00 |
|
Yuhao Yao
|
873f384a51
|
[feat] Add detail in image_data (#8596)
|
2025-08-05 14:01:38 +08:00 |
|
kk
|
d4bf5a8524
|
Support OCP MXFP4 quantization on AMD GPUs (#8255)
Co-authored-by: wunhuang <wunhuang@amd.com>
Co-authored-by: Hubert Lu <Hubert.Lu@amd.com>
|
2025-08-04 18:14:52 -07:00 |
|
ybyang
|
6f9baf1002
|
[Improvements] Merge health check route (#8444)
Signed-off-by: ybyang <ybyang7@iflytek.com>
Co-authored-by: Lianmin Zheng <lianminzheng@gmail.com>
Co-authored-by: Kan Wu <wukanustc@gmail.com>
|
2025-08-03 01:59:06 -07:00 |
|
Cheng Wan
|
6c88f6c8d9
|
[5/N] MoE Refactor: Update MoE parallelism arguments (#8658)
|
2025-08-01 01:20:03 -07:00 |
|
Ke Bao
|
8fbcfd0723
|
Update step3v default config (#8626)
|
2025-08-01 00:49:26 +08:00 |
|
Yuxuan Zhang
|
6d6a8bc278
|
GLM-4.5 Model Support (#8224)
Co-authored-by: Lifu Huang <lifu.hlf@gmail.com>
Co-authored-by: Binyao Jiang <byjiang1996@gmail.com>
Co-authored-by: Stefan He <hebiaobuaa@gmail.com>
|
2025-07-27 22:54:07 -07:00 |
|
Lifu Huang
|
df90645525
|
Support overlapped lora updates (#8213)
|
2025-07-27 13:00:44 -07:00 |
|
Yingchun Lai
|
36d6f0ba5b
|
fix: fix the missing metrics on non-rank0 nodes (#7720)
|
2025-07-27 00:55:25 -07:00 |
|
Lianmin Zheng
|
ed2e313eb6
|
Clean up server_args, triton cache manager (#8332)
|
2025-07-25 14:14:51 -07:00 |
|
Stepan Kargaltsev
|
1b9cea5ade
|
[P/D] Support ipv6 in P/D scenario (#7858)
Co-authored-by: Shangming Cai <caishangming@linux.alibaba.com>
|
2025-07-25 08:53:30 -07:00 |
|
Ying Wang
|
7ad6b766c5
|
fix: Fix failed functional tests https://github.com/meta-llama/llama-stack-evals (#8266)
|
2025-07-24 23:11:32 -07:00 |
|
Lianmin Zheng
|
55381a46ac
|
Revert "[Feature] Simple Improve Health Check Mechanism for Production-Grade Stability" (#8181)
|
2025-07-19 22:41:30 -07:00 |
|
ybyang
|
4540a4666a
|
[Feature] Simple Improve Health Check Mechanism for Production-Grade Stability (#8115)
Signed-off-by: ybyang <ybyang7@iflytek.com>
|
2025-07-19 18:10:00 -07:00 |
|
Lifu Huang
|
4e3defe5a7
|
Support start up LoRA server without initial adapters (#8019)
|
2025-07-19 15:38:09 -07:00 |
|
Garry Fang
|
60468da4e2
|
bugfix: fix sglang crash in NVIDIA MIG container (#8167)
Signed-off-by: Garrybest <garrybest@foxmail.com>
|
2025-07-19 14:41:27 -07:00 |
|
Binyao Jiang
|
b7e951a6db
|
Feat: Support audio in Phi4-mm model (#8048)
|
2025-07-18 21:03:53 -07:00 |
|
Sai Enduri
|
d0510f08fe
|
Revert "Fix different device type adjustment in PP" (#8141)
|
2025-07-18 01:12:11 -07:00 |
|
Mick
|
497efe747d
|
Revert "feat: replace Decord with video_reader-rs" (#8077)
|
2025-07-15 20:04:56 -07:00 |
|
Qiaolin Yu
|
3bc43c683e
|
Fix different device type adjustment in PP (#7760)
|
2025-07-15 19:37:14 -07:00 |
|
kozo
|
ebff5fcb06
|
feat: replace Decord with video_reader-rs (#5163)
Signed-off-by: Xinyuan Tong <justinning0323@outlook.com>
Co-authored-by: Xinyuan Tong <justinning0323@outlook.com>
|
2025-07-15 18:17:34 -07:00 |
|
ronnie_zheng
|
766392c6bd
|
[feature]Ascend quantization support (#7791)
Co-authored-by: ichernob <ichernobnn@gmail.com>
Co-authored-by: liupeng <liupeng374@huawei.com>
|
2025-07-10 09:17:37 -07:00 |
|
Mick
|
b5e3d6031c
|
vlm: support video as an input modality (#5888)
|
2025-07-09 23:48:35 -07:00 |
|
Brayden Zhong
|
a37e1247c1
|
[Multimodal][Perf] Use pybase64 instead of base64 (#7724)
|
2025-07-08 14:00:58 -07:00 |
|
kk
|
653b873b91
|
Fix cache modules of triton import error (#7832)
|
2025-07-08 02:50:09 -07:00 |
|
Cheng Wan
|
8fc910db03
|
DP Attention with Auto DeepEP Dispatch (#7222)
|
2025-07-05 01:54:24 -07:00 |
|
Lianmin Zheng
|
14229ccf8f
|
Move mem_fraction_static adjustment for multimodal models to server_args.py & Fix session control & Other cleanups (#7748)
|
2025-07-04 16:33:33 -07:00 |
|