Commit Graph

23 Commits

Author SHA1 Message Date
Liangsheng Yin
516738b096 Deprecate global_server_args_dict (#11528) 2025-10-13 19:34:43 +08:00
Cheng Wan
1bdd010291 Revert "Deprecate global_server_args_dict" (#11520) 2025-10-12 17:40:40 -07:00
Liangsheng Yin
1083e7e3df Deprecate global_server_args_dict (#11331) 2025-10-13 01:20:47 +08:00
kk
8ebf72fef3 [Fix] RuntimeError: get_cfg Unsupported input_type:Float4_e2m1fn_x2 in using aiter-mxfp4-moe (#10981) 2025-09-26 22:12:22 -07:00
  Co-authored-by: wunhuang <wunhuang@amd.com>
yhyang201
388c05d544 Fix bias handling in TritonMoeQuantInfo within quantization/mxfp4.py (#10579) 2025-09-18 11:44:43 -07:00
Cheng Wan
3fa62da78c [7/N] MoE Refactor: the implementation of new framework (#9269) 2025-09-05 21:09:09 -07:00
Hubert Lu
2c562fd2d0 Fix Llama 4 with MXFP4 dynamic quant on MI35x (#9993) 2025-09-04 00:48:58 -07:00
Lianmin Zheng
4aeba40d7b [Sync] Update mxfp4.py (20250827) (#9724) 2025-08-27 17:00:09 -07:00
  Co-authored-by: github-actions[bot] <github-actions[bot]@users.noreply.github.com>
  Co-authored-by: Shiyang Chen <shiyang@x.ai>
Lianmin Zheng
fd71b11b1d move is_sm90_supported/is_sm100_supported to python/sglang/srt/utils.py (#9679) 2025-08-27 03:34:29 -07:00
Stefan He
a530b3ffdc [RL] fix register the same ops multiple times (#9564) 2025-08-26 16:24:44 -07:00
hlu1
ccd3fb946e [fix] Fix mxfp4 triton MoE tp bug (#9473) 2025-08-23 01:48:40 -07:00
  Signed-off-by: Hao Lu <14827759+hlu1@users.noreply.github.com>
fzyzcjy
0374304a2c Add enable_flashinfer_mxfp4_bf16_moe for higher precision and slower moe backend (#9004) 2025-08-23 15:38:40 +08:00
kk
1c1f8a118e Combine fp4.py and mxfp4.py into one file and support dynamic mxfp4 quantization in mxfp4.py (#9049) 2025-08-16 19:01:54 -07:00
  Co-authored-by: wunhuang <wunhuang@amd.com>
Cheng Wan
84b006b278 Cleanup MoE Refactor (#9223) 2025-08-15 02:28:33 -07:00
Cheng Wan
295895120d [6/N] MoE Refactor: Cleanup MoE-related configs (#8849) 2025-08-14 21:14:53 -07:00
Xiaoyu Zhang
63d82a776a refine mxfp4 shuffling log (#9194) 2025-08-14 10:57:29 -07:00
fzyzcjy
5190ba7f42 Fuse two kernels of hidden states padding into quantization kernel (#9005) 2025-08-12 01:20:13 -07:00
  Co-authored-by: Qiaolin-Yu <liin1211@outlook.com>
Cheng Wan
1d24db8348 Expert Parallelism for GPT-OSS (#8944) 2025-08-08 00:46:42 -07:00
Xiaoyu Zhang
0d1e27a0c5 Better optimization log for gpt-oss model (#8953) 2025-08-08 00:11:48 -07:00
Xiaoyu Zhang
3ae33fcd0a Fix hopper launch gpt-oss model illegal memory (#8908) 2025-08-07 10:02:40 -07:00
Xiaoyu Zhang
47824c1488 [Perf] Auto enable best flashinfer mxfp4 kernel in b200 (#8898) 2025-08-07 01:08:41 -07:00
Xiaoyu Zhang
4373df5525 add flashinfer mxfp4 (#8847) 2025-08-06 16:23:41 -07:00
Ying Sheng
168033d5fb Support mxfp4 for GPT-OSS (#8843) 2025-08-06 00:05:25 -07:00
  Co-authored-by: fzyzcjy <ch271828n@outlook.com>
  Co-authored-by: fzyzcjy <5236035+fzyzcjy@users.noreply.github.com>
  Co-authored-by: zhuofan1123 <zhuofanl@nvidia.com>
  Co-authored-by: liz-badada <jinyanc@nvidia.com>
  Co-authored-by: xutizhou <xutingz@nvidia.com>
  Co-authored-by: linhu-nv <linhu@nvidia.com>