sglang

Author	SHA1	Message	Date
Kaixi Hou	5c34b4f1c7	[NVIDIA] [2/N] Optimize `silu_and_mul_scaled_fp4_grouped_quant` perf (#9556 )	2025-08-29 17:17:03 -07:00
pansicheng	09a1df2231	add bench_mix.py (#9788 )	2025-08-28 23:44:26 -07:00
Xinyuan Tong	f84b57c80e	Move git clone command up from README (#9740 )	2025-08-28 00:27:00 -07:00
Liangsheng Yin	d0934a5192	gpt-oss blog reproduction document (#9728 )	2025-08-28 10:15:08 +08:00
Yineng Zhang	bc80dc4ce0	chore: bump v0.5.1.post3 (#9716 )	2025-08-27 15:42:42 -07:00
ehuaa	8f7b1c31e8	Add A100 fused MoE kernel configs for Dpsk (#9677 )	2025-08-26 20:49:48 -07:00
hzh0425	c04c17edfa	refactor(hicache): Introduce generic HiCacheStorageConfig for improved configuration management (#9555 ) Co-authored-by: Teng Ma <805522925@qq.com>	2025-08-26 17:55:20 -07:00
Yineng Zhang	e3e97a120b	chore: bump v0.5.1.post2 (#9592 )	2025-08-25 03:45:09 -07:00
Yineng Zhang	f8b757bcac	fix: resolve tuning fused moe issue (#9587 )	2025-08-25 01:41:15 -07:00
Yineng Zhang	e0ab167db0	chore: bump v0.5.1.post1 (#9558 )	2025-08-24 01:14:17 -07:00
Xiaotong Jiang	80425e59bb	[doc] deepseekv31 support (#9544 )	2025-08-23 16:54:58 -07:00
Lianmin Zheng	97a38ee85b	Release 0.5.1 (#9533 )	2025-08-23 07:09:26 -07:00
hzh0425	83871aa12d	feat(hicache): Supports 3fs-hicache compatibility with dp-attention (#9372 )	2025-08-23 02:08:32 -07:00
yuxingcyx	4edbe0d534	[benchmark] Add benchmark scripts for ceval and boolq (#8946 ) Co-authored-by: chenyuxing <2818499974@qq.com> Co-authored-by: hanqing <huang010706@126.com> Co-authored-by: Muggle <62579327+trawolf@users.noreply.github.com> Co-authored-by: ronnie_zheng <zl19940307@163.com>	2025-08-23 15:40:15 +08:00
pansicheng	70cf4abccc	3fs zerocopy (#9109 ) Co-authored-by: Zhiqiang Xie <xiezhq@stanford.edu>	2025-08-22 17:56:38 +08:00
Even Zhou	de2dd73831	Revert "[feature] Rework Ascend NPU graph support" (#9385 )	2025-08-20 00:35:10 -07:00
Even Zhou	3680d6f88b	[feature] Rework Ascend NPU graph support (#9350 ) Co-authored-by: ronnie_zheng <zl19940307@163.com> Co-authored-by: yezhifeng (D) <y00897525@china.huawei.com> Co-authored-by: anon189Ty <Stari_Falcon@outlook.com> Co-authored-by: Maksim <makcum888e@mail.ru> Co-authored-by: ssshinigami <44640852+ssshinigami@users.noreply.github.com>	2025-08-19 20:32:27 -07:00
Chang Su	46fe8b8cb2	[CI] Fix lint issues (#9361 )	2025-08-19 13:05:36 -07:00
mpashkovskiy	a3b810ebdb	fix: enable multi-GPU Triton fused MoE tuning (#6295 )	2025-08-19 10:16:58 -07:00
Even Zhou	f4fafacc5d	Revert "[feature] Ascend NPU graph support (#8027 )" (#9348 )	2025-08-19 10:11:23 -07:00
Binyao Jiang	c2fbf60f39	[GLM4.1V and GLM4.5V] Add vision transformer num_dummy_head support: max tp=4 -> max tp=8 (#9059 )	2025-08-18 14:40:13 -07:00
Yuan Luo	968e181826	Fix triton_fused_moe unit test and benchmark (#9276 ) Co-authored-by: luoyuan.luo <luoyuan.luo@antgroup.com>	2025-08-18 00:54:33 -07:00
VDV1985	94371dbbd6	[feature] Ascend NPU graph support (#8027 ) Co-authored-by: ronnie_zheng <zl19940307@163.com> Co-authored-by: yezhifeng (D) <y00897525@china.huawei.com> Co-authored-by: anon189Ty <Stari_Falcon@outlook.com> Co-authored-by: Maksim <makcum888e@mail.ru> Co-authored-by: ssshinigami <44640852+ssshinigami@users.noreply.github.com>	2025-08-16 17:25:17 -07:00
Cheng Wan	295895120d	[6/N] MoE Refactor: Cleanup MoE-related configs (#8849 )	2025-08-14 21:14:53 -07:00
Yineng Zhang	fab0f6e77d	chore: bump v0.5.0rc2 (#9203 )	2025-08-14 16:11:16 -07:00
Sundara Raman Ramachandran	a027a9b4b3	[Generative Score API] Optimization to Remove Decode. (#8840 )	2025-08-14 05:12:24 +08:00
Yineng Zhang	7b56e494be	chore: bump v0.5.0rc1 (#9069 )	2025-08-13 10:44:14 -07:00
Zhiqiang Xie	0eec4cb6cc	HiCache, add bench long context plus minor fixs (#9086 ) Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>	2025-08-11 16:54:52 -07:00
Lianmin Zheng	b58ae7a2a0	Simplify frontend language (#9029 )	2025-08-10 10:59:30 -07:00
Binyao Jiang	f29aba8c6e	Support glm4.1v and glm4.5v (#8798 ) Signed-off-by: Xinyuan Tong <justinning0323@outlook.com> Signed-off-by: Xinyuan Tong <xinyuantong.cs@gmail.com> Co-authored-by: Xinyuan Tong <justinning0323@outlook.com> Co-authored-by: Xinyuan Tong <115166877+JustinTong0323@users.noreply.github.com> Co-authored-by: Xinyuan Tong <xinyuantong.cs@gmail.com> Co-authored-by: zRzRzRzRzRzRzR <2448370773@qq.com> Co-authored-by: Minglei Zhu <mingleizhu1122@gmail.com> Co-authored-by: Chang Su <csu272@usc.edu>	2025-08-09 00:59:13 -07:00
Yineng Zhang	9020f7fc32	chore: bump v0.5.0rc0 (#8959 )	2025-08-08 09:16:18 -07:00
pansicheng	e2fd2b9c7e	Simple prefetch policy (#8692 )	2025-08-08 02:09:28 -07:00
eigen	9c7e392465	bench: add attention sink op benchmark, triton and trtllm-gen [B200] (#8932 ) Co-authored-by: averyhuang <averyh@nvidia.com>	2025-08-08 00:16:23 -07:00
Ke Bao	0475448ee3	Optimize triton swa kernel by skipping computation (#8860 )	2025-08-06 21:37:50 +08:00
Yineng Zhang	8cd344586e	chore: bump v0.4.10.post2 (#8727 )	2025-08-03 03:43:29 -07:00
Ke Bao	33f0de337d	chore: bump v0.4.10.post1 (#8652 )	2025-08-01 12:07:30 +08:00
Yineng Zhang	023288645b	chore: bump v0.4.10 (#8608 )	2025-07-31 20:50:17 +08:00
pansicheng	299803343d	Add hf3fs support for hicache storage (based on #7704 ) (#7280 ) Co-authored-by: Zhiqiang Xie <xiezhq@stanford.edu>	2025-07-30 17:42:41 -07:00
Yineng Zhang	6478831be9	chore: bump v0.4.9.post6 (#8517 )	2025-07-29 02:30:07 -07:00
Yineng Zhang	1466c1b896	feat: support glm4 tuning (#8473 )	2025-07-28 14:32:58 -07:00
Yineng Zhang	45bc170b36	chore: bump v0.4.9.post5 (#8458 )	2025-07-28 02:11:06 -07:00
Yuxuan Zhang	6d6a8bc278	GLM-4.5 Model Support (#8224 ) Co-authored-by: Lifu Huang <lifu.hlf@gmail.com> Co-authored-by: Binyao Jiang <byjiang1996@gmail.com> Co-authored-by: Stefan He <hebiaobuaa@gmail.com>	2025-07-27 22:54:07 -07:00
fzyzcjy	62222bd27e	Minor tool for comparison of benchmark results (#7974 )	2025-07-27 00:27:50 -07:00
Mick	4fa44d63c6	chore: improve mmmu benchmark (#7000 ) Signed-off-by: Xinyuan Tong <xinyuantong.cs@gmail.com> Co-authored-by: Xinyuan Tong <xinyuantong.cs@gmail.com>	2025-07-26 16:19:45 +08:00
Yineng Zhang	2272c2a5b5	chore: bump v0.4.9.post4 (#8305 )	2025-07-25 17:12:47 -07:00
Zhiqiang Xie	ce86e201df	bug fix and tag (#8282 )	2025-07-23 16:50:31 +08:00
Yineng Zhang	01c000043c	chore: bump v0.4.9.post3 (#8265 )	2025-07-22 15:55:48 -07:00
zhongwei	ff45ab7a5f	[Benchmark] add disable-auto-run param for hicache/bench_multiturn (#7822 ) Co-authored-by: zhongwei.ren <zhongwei.ren@bytedance.com> Co-authored-by: Zhiqiang Xie <xiezhq@stanford.edu>	2025-07-22 14:02:40 -07:00
Cheng Wan	abda2542d5	Fix tuning_fused_moe_triton.py (#8175 )	2025-07-19 17:33:50 -07:00
Hongbo Xu	1f76fc8747	[3/n] chore: decouple AWQ implementation from vLLM dependency (#8113 ) Co-authored-by: AniZpZ <zhuangsen.zp@antgroup.com>	2025-07-18 11:45:22 -07:00

309 Commits