sglang

Author	SHA1	Message	Date
Cheng Wan	5b214b50b6	[Refactor] move `deep_gemm_wrapper` out of `quantization` (#11784 )	2025-10-17 18:57:54 -07:00
fzyzcjy	33e9bbec35	Make single-batch overlap compatible with offloading (#11614 )	2025-10-18 08:45:54 +08:00
fzyzcjy	dcb8f090ad	Super tiny fix CI (#11788 )	2025-10-17 17:41:58 -07:00
fzyzcjy	8af8491298	Support casting bf16 NextN moe to fp8 (#11613 )	2025-10-18 08:02:15 +08:00
fzyzcjy	505329cab0	Support shared experts overlap in cutlass moe (#11611 )	2025-10-18 07:59:40 +08:00
Chang Su	627974405d	[Lint] Add `python/sglang` to ruff F401 checks and remove unused imports in files (#11685 )	2025-10-17 16:49:46 -07:00
Even Zhou	3cceaa381a	[Bugfix] Fix Qwen3/DSV3/DSV3.2 model support (#11510 )	2025-10-16 15:14:09 +08:00
Xun Sun	a40229f6f8	[1/N] Introduce Mooncake Backend and Mooncake EP to Support Elastic EP (#10423 ) Co-authored-by: Hank Han <hanhan7630@outlook.com> Co-authored-by: Shangming Cai <csmthu@gmail.com>	2025-10-14 19:40:54 -07:00
Liangsheng Yin	516738b096	Deprecate `global_server_args_dict` (#11528 )	2025-10-13 19:34:43 +08:00
Cheng Wan	1bdd010291	Revert "Deprecate `global_server_args_dict`" (#11520 )	2025-10-12 17:40:40 -07:00
Liangsheng Yin	1083e7e3df	Deprecate `global_server_args_dict` (#11331 )	2025-10-13 01:20:47 +08:00
Liu-congo	c80a96dae9	[BugFix] test_mla_fp8.py fails on Cublas 12.9 (#11360 ) Signed-off-by: Liu-congo <1502632128@qq.com>	2025-10-10 21:14:24 -07:00
fzyzcjy	efbc687c28	Support DeepSeek V3.2 Exp (#11061 ) Co-authored-by: Stefan He <11166516+hebiao064@users.noreply.github.com> Co-authored-by: Liangsheng Yin <95566987+hnyls2002@users.noreply.github.com> Co-authored-by: Baizhou Zhang <56809903+fridge003@users.noreply.github.com> Co-authored-by: DarkSharpness <76582120+darksharpness@users.noreply.github.com> Co-authored-by: ZhengdQin <46387172+zhengdqin@users.noreply.github.com> Co-authored-by: DarkSharpness <2040703891@qq.com> Co-authored-by: hnyls2002 <lsyincs@gmail.com> Co-authored-by: Zhengda Qin <zhengdqin@gmail.com> Co-authored-by: Liangsheng Yin <hnyls2002@gmail.com> Co-authored-by: HAI <hixiao@gmail.com> Co-authored-by: Baizhou Zhang <sobereddiezhang@gmail.com>	2025-10-06 00:24:15 -07:00
fzyzcjy	b65db0287b	Tiny cleanup deepseek_v2.py (#11163 )	2025-10-02 21:54:52 +08:00
fzyzcjy	5e786cca3a	Support single batch overlap (#10422 )	2025-10-02 18:04:36 +08:00
fzyzcjy	0b9dfba787	Support dispatch low latency (#10263 ) Co-authored-by: Kaixi Hou <4001424+kaixih@users.noreply.github.com>	2025-10-02 18:02:19 +08:00
fzyzcjy	f35def8652	Fuse quantize and rope in trtllm_mla MTP (#10779 )	2025-10-02 17:59:37 +08:00
fzyzcjy	44b1fbe258	Fix DeepSeek chunked prefill memory issue (#11149 )	2025-10-01 23:56:59 -07:00
Even Zhou	d27a6f7092	[Feature] Add MLAProcess for DeepSeek MLA on NPU (#10130 )	2025-09-22 17:17:48 -07:00
Yineng Zhang	f67d1f45bc	[Auto Sync] Update deepseek_v2.py (20250922) (#10717 ) Co-authored-by: github-actions[bot] <github-actions[bot]@users.noreply.github.com> Co-authored-by: Michael Granado <mgranado@together.ai>	2025-09-21 17:43:50 -07:00
Yineng Zhang	7c876de7f5	fix: remove awq_dequantize deps (#10686 )	2025-09-20 01:47:01 -07:00
Yineng Zhang	b17e67df36	[Auto Sync] Update deepseek_v2.py (20250920) (#10683 ) Co-authored-by: github-actions[bot] <github-actions[bot]@users.noreply.github.com>	2025-09-19 23:43:31 -07:00
Shu Wang	124097fc5b	enable prefix cache with dp (#10459 )	2025-09-16 18:26:58 -07:00
cicirori	a2f7218a2e	support using fa4 on deepseek on blackwell (#9928 )	2025-09-16 16:16:06 -07:00
fzyzcjy	059c13de5c	Fix trtllm_moe wrong correction bias (#10440 )	2025-09-15 01:02:05 -07:00
Cheng Wan	4844fac91d	Refactor TopK to ensure readability and extensibility (#9338 )	2025-09-14 19:16:25 -07:00
fzyzcjy	258d02c86d	Fix correction bias undefined behavior for nvfp4 models (#10426 )	2025-09-14 18:41:09 -07:00
fzyzcjy	fa46e2bd40	Support offloading in fp8 (#9948 )	2025-09-14 01:14:28 -07:00
fzyzcjy	b047b553c2	[2/2] Speed up prefill mla attention concat (#10157 )	2025-09-14 01:12:04 -07:00
Shu Wang	36acd2ff16	Fix chunked prefix cache for nvfp4 (#10180 ) Co-authored-by: Elfie Guo <elfieg@nvidia.com>	2025-09-12 03:20:30 -07:00
Shu Wang	3df05f4d6a	[NVIDIA] [3/N] Nvfp4 Masked Gemm: Add flashinfer grouped_gemm_nt_masked (#9199 )	2025-09-11 20:18:43 -07:00
Hubert Lu	91b3555d2d	Add tests to AMD CI for MI35x (#9662 ) Co-authored-by: Sai Enduri <saimanas.enduri@amd.com>	2025-09-10 12:50:05 -07:00
Lianmin Zheng	4582931ac3	Revert "Revert the changes on NCCL symmetric memory" (#10238 )	2025-09-09 12:11:49 -07:00
Lianmin Zheng	d352c29aa0	Revert the changes on NCCL symmetric memory (#10210 ) Co-authored-by: Yineng Zhang <me@zhyncs.com>	2025-09-09 11:01:33 -07:00
Baizhou Zhang	8ad700f735	Cleaning codes for speculative attention mode (#10149 )	2025-09-08 17:38:06 -07:00
cicirori	8c5930f08a	Add speculator attention backend switch (#9981 )	2025-09-07 21:44:36 -07:00
kk	400d3b97ae	Fix run time error in dsv3-fp8 model on mi35x (#10104 ) Co-authored-by: wunhuang <wunhuang@amd.com> Co-authored-by: HaiShaw <hixiao@gmail.com> Co-authored-by: Lianmin Zheng <lianminzheng@gmail.com>	2025-09-07 20:45:17 -07:00
Jinyang Yuan	012584ecd5	perf: Avoid unnecessary data type conversions for DeepSeek-V3 on Blackwell (#9834 ) Signed-off-by: Jinyang Yuan <154768711+jinyangyuan-nvidia@users.noreply.github.com>	2025-09-06 14:06:46 +08:00
Elfie Guo	bebd0576e5	Integrate trtllm ragged attention for prefill self-attention (#9801 )	2025-09-05 17:18:00 +08:00
kk	918e3d4c27	Fix accuracy drop of dsv3 run in dp enablement (#8677 ) Co-authored-by: wunhuang <wunhuang@amd.com>	2025-09-04 16:51:16 -07:00
kk	e96973742c	Optimized deepseek-v3/r1 model performance on mxfp4 run (#10008 ) Co-authored-by: wunhuang <wunhuang@amd.com> Co-authored-by: HAI <hixiao@gmail.com> Co-authored-by: Hubert Lu <55214931+hubertlu-tw@users.noreply.github.com>	2025-09-04 15:11:22 -07:00
Elfie Guo	66d5d0425c	Minor update regarding issue #9704 (#9733 )	2025-09-03 16:52:07 -07:00
Yineng Zhang	1b2ff4fb7f	Revert "Optimized deepseek-v3/r1 model performance on mxfp4 run (#9671 )" (#9959 )	2025-09-03 00:50:04 -07:00
kk	0dfd54d11d	Optimized deepseek-v3/r1 model performance on mxfp4 run (#9671 ) Co-authored-by: wunhuang <wunhuang@amd.com> Co-authored-by: wghuang <wghuang@amd.com>	2025-09-02 22:26:28 -07:00
chenxj	d4a938417d	[feat] Support tp mode for DeepSeek-R1-W4AFP8 (#8118 ) Co-authored-by: yuhyao <827623970@qq.com>	2025-09-01 22:17:26 -07:00
Yineng Zhang	8abe8deae6	fix: dsv3 lite q_lora_rank none (#9815 )	2025-08-29 23:24:14 -07:00
chenxu140	74dd4249ac	[Feature] Support NPUGraph for DeepSeek on Ascend NPU (#9355 ) Co-authored-by: Even Zhou <even.y.zhou@outlook.com>	2025-08-28 16:06:24 -07:00
Lianmin Zheng	fd71b11b1d	move is_sm90_supported/is_sm100_supported to python/sglang/srt/utils.py (#9679 )	2025-08-27 03:34:29 -07:00
ZhengdQin	f92b729d52	[new feat] ascend backend support fia fusion kernel (#8328 ) Co-authored-by: Even Zhou <even.y.zhou@outlook.com>	2025-08-25 23:13:08 -07:00
fzyzcjy	2600fc0d47	Overlapped weight offload (#8034 )	2025-08-23 02:06:46 -07:00

303 Commits