sglang

Author	SHA1	Message	Date
fzyzcjy	df97b31f37	Tiny support setting numa nodes for different ranks (#10006 )	2025-09-05 19:01:27 +08:00
kk	e96973742c	Optimized deepseek-v3/r1 model performance on mxfp4 run (#10008 ) Co-authored-by: wunhuang <wunhuang@amd.com> Co-authored-by: HAI <hixiao@gmail.com> Co-authored-by: Hubert Lu <55214931+hubertlu-tw@users.noreply.github.com>	2025-09-04 15:11:22 -07:00
Yineng Zhang	1b2ff4fb7f	Revert "Optimized deepseek-v3/r1 model performance on mxfp4 run (#9671 )" (#9959 )	2025-09-03 00:50:04 -07:00
kk	0dfd54d11d	Optimized deepseek-v3/r1 model performance on mxfp4 run (#9671 ) Co-authored-by: wunhuang <wunhuang@amd.com> Co-authored-by: wghuang <wghuang@amd.com>	2025-09-02 22:26:28 -07:00
Shangming Cai	a25e8e42eb	Move multi-tokenizer event loop to better place (#9902 ) Signed-off-by: Shangming Cai <csmthu@gmail.com>	2025-09-01 23:12:21 -07:00
ybyang	5f77e1292d	Support Multi Process Tokenizer Manager(#6555 ) (#8964 ) Signed-off-by: ybyang <ybyang7@iflytek.com> Signed-off-by: huanglong <huanglong@linux.alibaba.com> Co-authored-by: Huang Long <121648372+LLLL114@users.noreply.github.com> Co-authored-by: huanglong <huanglong@linux.alibaba.com> Co-authored-by: Shangming Cai <csmthu@gmail.com>	2025-09-01 01:00:13 -07:00
ybyang	fd18995cf3	Fix get_ip when no external network (#9700 )	2025-08-27 10:28:52 -07:00
Lianmin Zheng	fd71b11b1d	move is_sm90_supported/is_sm100_supported to python/sglang/srt/utils.py (#9679 )	2025-08-27 03:34:29 -07:00
Stefan He	a530b3ffdc	[RL] fix register the same ops multiple times (#9564 )	2025-08-26 16:24:44 -07:00
miter	a0b22f2f17	remove redundant rank0_log function. (#9560 ) Co-authored-by: linhuang <linhuang@ruijie.com.cn>	2025-08-24 23:17:55 -07:00
fzyzcjy	2600fc0d47	Overlapped weight offload (#8034 )	2025-08-23 02:06:46 -07:00
Chanh Nguyen	127d4b0d5e	Support GC Freezing to improve latency & throughput (#9241 ) Co-authored-by: Chanh Nguyen <cnguyen@linkedin.com> Co-authored-by: Liangsheng Yin <hnyls2002@gmail.com>	2025-08-23 13:43:09 +08:00
fzyzcjy	55d336cb08	Refactor weight offloading logic (#8521 )	2025-08-21 03:48:13 -07:00
zifeitong	84b30d9e00	Set the default attention backend for GLM-4.5v to fa3 (#9245 )	2025-08-17 16:34:19 -07:00
Lifu Huang	4b74c3fcca	[chore] Clean up redundant lora_weight_names concept to simplify code (#9131 )	2025-08-17 12:36:58 -07:00
Netanel Haber	845d12a979	model: support nvidia/Llama-3_3-Nemotron-Super-49B-v1 (#9067 ) Co-authored-by: Kyle Huang <kylhuang@nvidia.com>	2025-08-17 01:48:15 -07:00
Cheng Wan	295895120d	[6/N] MoE Refactor: Cleanup MoE-related configs (#8849 )	2025-08-14 21:14:53 -07:00
Lifu Huang	5ded39cab2	Fix race condition in async lora unload (#9084 )	2025-08-11 22:59:29 -07:00
Binyao Jiang	f29aba8c6e	Support glm4.1v and glm4.5v (#8798 ) Signed-off-by: Xinyuan Tong <justinning0323@outlook.com> Signed-off-by: Xinyuan Tong <xinyuantong.cs@gmail.com> Co-authored-by: Xinyuan Tong <justinning0323@outlook.com> Co-authored-by: Xinyuan Tong <115166877+JustinTong0323@users.noreply.github.com> Co-authored-by: Xinyuan Tong <xinyuantong.cs@gmail.com> Co-authored-by: zRzRzRzRzRzRzR <2448370773@qq.com> Co-authored-by: Minglei Zhu <mingleizhu1122@gmail.com> Co-authored-by: Chang Su <csu272@usc.edu>	2025-08-09 00:59:13 -07:00
Lianmin Zheng	a947154286	Revert "Support Multi Process Tokenizer Manager" (#8960 )	2025-08-08 02:28:27 -07:00
ybyang	7490e3f67d	Support Multi Process Tokenizer Manager (#6555 ) Signed-off-by: ybyang <ybyang7@iflytek.com> Signed-off-by: huanglong <huanglong@linux.alibaba.com> Co-authored-by: lw9527 <952799980@qq.com> Co-authored-by: huanglong <huanglong@linux.alibaba.com> Co-authored-by: Huang Long <121648372+LLLL114@users.noreply.github.com>	2025-08-08 01:45:50 -07:00
Cheng Wan	1d24db8348	Expert Parallelism for GPT-OSS (#8944 )	2025-08-08 00:46:42 -07:00
Chang Su	92cc32d9fc	Support v1/responses and use harmony in serving_chat (#8837 ) Signed-off-by: Xinyuan Tong <justinning0323@outlook.com> Signed-off-by: Xinyuan Tong <xinyuantong.cs@gmail.com> Co-authored-by: Xinyuan Tong <justinning0323@outlook.com> Co-authored-by: Xinyuan Tong <xinyuantong.cs@gmail.com>	2025-08-06 16:20:34 -07:00
Ying Sheng	168033d5fb	Support mxfp4 for GPT-OSS (#8843 ) Co-authored-by: Co-author fzyzcjy <ch271828n@outlook.com> Co-authored-by: fzyzcjy <5236035+fzyzcjy@users.noreply.github.com> Co-authored-by: zhuofan1123 <zhuofanl@nvidia.com> Co-authored-by: liz-badada <jinyanc@nvidia.com> Co-authored-by: xutizhou <xutingz@nvidia.com> Co-authored-by: linhu-nv <linhu@nvidia.com>	2025-08-06 00:05:25 -07:00
Yuhao Yao	873f384a51	[feat] Add detail in image_data (#8596 )	2025-08-05 14:01:38 +08:00
kk	d4bf5a8524	Support OCP MXFP4 quantization on AMD GPUs (#8255 ) Co-authored-by: wunhuang <wunhuang@amd.com> Co-authored-by: Hubert Lu <Hubert.Lu@amd.com>	2025-08-04 18:14:52 -07:00
ybyang	6f9baf1002	[Improvements] Merge health check route (#8444 ) Signed-off-by: ybyang <ybyang7@iflytek.com> Co-authored-by: Lianmin Zheng <lianminzheng@gmail.com> Co-authored-by: Kan Wu <wukanustc@gmail.com>	2025-08-03 01:59:06 -07:00
Cheng Wan	6c88f6c8d9	[5/N] MoE Refactor: Update MoE parallelism arguments (#8658 )	2025-08-01 01:20:03 -07:00
Ke Bao	8fbcfd0723	Update step3v default config (#8626 )	2025-08-01 00:49:26 +08:00
Yuxuan Zhang	6d6a8bc278	GLM-4.5 Model Support (#8224 ) Co-authored-by: Lifu Huang <lifu.hlf@gmail.com> Co-authored-by: Binyao Jiang <byjiang1996@gmail.com> Co-authored-by: Stefan He <hebiaobuaa@gmail.com>	2025-07-27 22:54:07 -07:00
Lifu Huang	df90645525	Support overlapped lora updates (#8213 )	2025-07-27 13:00:44 -07:00
Yingchun Lai	36d6f0ba5b	fix: fix the missing metrics on non-rank0 nodes (#7720 )	2025-07-27 00:55:25 -07:00
Lianmin Zheng	ed2e313eb6	Clean up server_args, triton cache manager (#8332 )	2025-07-25 14:14:51 -07:00
Stepan Kargaltsev	1b9cea5ade	[P/D] Support ipv6 in P/D scenario (#7858 ) Co-authored-by: Shangming Cai <caishangming@linux.alibaba.com>	2025-07-25 08:53:30 -07:00
Ying Wang	7ad6b766c5	fix: Fix failed functional tests https://github.com/meta-llama/llama-stack-evals (#8266 )	2025-07-24 23:11:32 -07:00
Lianmin Zheng	55381a46ac	Revert "[Feature] Simple Improve Health Check Mechanism for Production-Grade Stability" (#8181 )	2025-07-19 22:41:30 -07:00
ybyang	4540a4666a	[Feature] Simple Improve Health Check Mechanism for Production-Grade Stability (#8115 ) Signed-off-by: ybyang <ybyang7@iflytek.com>	2025-07-19 18:10:00 -07:00
Lifu Huang	4e3defe5a7	Support start up LoRA server without initial adapters (#8019 )	2025-07-19 15:38:09 -07:00
Garry Fang	60468da4e2	bugfix: fix sglang crash in NVIDIA MIG container (#8167 ) Signed-off-by: Garrybest <garrybest@foxmail.com>	2025-07-19 14:41:27 -07:00
Binyao Jiang	b7e951a6db	Feat: Support audio in Phi4-mm model (#8048 )	2025-07-18 21:03:53 -07:00
Sai Enduri	d0510f08fe	Revert "Fix different device type adjustment in PP" (#8141 )	2025-07-18 01:12:11 -07:00
Mick	497efe747d	Revert "feat: replace Decord with video_reader-rs" (#8077 )	2025-07-15 20:04:56 -07:00
Qiaolin Yu	3bc43c683e	Fix different device type adjustment in PP (#7760 )	2025-07-15 19:37:14 -07:00
kozo	ebff5fcb06	feat: replace Decord with video_reader-rs (#5163 ) Signed-off-by: Xinyuan Tong <justinning0323@outlook.com> Co-authored-by: Xinyuan Tong <justinning0323@outlook.com>	2025-07-15 18:17:34 -07:00
ronnie_zheng	766392c6bd	[feature]Ascend quantization support (#7791 ) Co-authored-by: ichernob <ichernobnn@gmail.com> Co-authored-by: liupeng <liupeng374@huawei.com>	2025-07-10 09:17:37 -07:00
Mick	b5e3d6031c	vlm: support video as an input modality (#5888 )	2025-07-09 23:48:35 -07:00
Brayden Zhong	a37e1247c1	[Multimodal][Perf] Use `pybase64` instead of `base64` (#7724 )	2025-07-08 14:00:58 -07:00
kk	653b873b91	Fix cache modules of triton import error (#7832 )	2025-07-08 02:50:09 -07:00
Cheng Wan	8fc910db03	DP Attention with Auto DeepEP Dispatch (#7222 )	2025-07-05 01:54:24 -07:00
Lianmin Zheng	14229ccf8f	Move mem_fraction_static adjustment for multimodal models to `server_args.py` & Fix session control & Other cleanups (#7748 )	2025-07-04 16:33:33 -07:00

306 Commits