sglang

Author	SHA1	Message	Date
yizhang2077	3900a94afe	Support twoshot kernel (#2688)	2025-01-06 00:47:16 +08:00
Xiaoyu Zhang	ded9fcd09a	improve moe_align_kernel for deepseek v3 (#2735)	2025-01-06 00:28:22 +08:00
Yineng Zhang	bc6ad367c2	fix lint (#2733)	2025-01-05 14:45:42 +08:00
Lianmin Zheng	3a22a303d1	Revert the GLOO_SOCKET_IFNAME change (#2731)	2025-01-04 20:13:16 -08:00
libra	bdb3929dbb	Refactor SchedulePolicy to improve code organization (#2571)	2025-01-04 00:05:16 +08:00
Ce Gao	f5d0865b25	feat: Support VLM in reference_hf (#2726) Signed-off-by: Ce Gao <gaocegege@hotmail.com>	2025-01-03 22:32:30 +08:00
Ce Gao	afdee7b1a9	[Docs] fix 404 - Contributor Guide, again (#2727) Signed-off-by: Ce Gao <gaocegege@hotmail.com>	2025-01-03 22:21:38 +08:00
Lianmin Zheng	cb34d848ac	Update README.md (#2722) Co-authored-by: Yangmin Li <2682000734@qq.com> Co-authored-by: Mingyuan Ma <mamingyuan2001@berkeley.edu> Co-authored-by: Zhiyu Cheng <zhiyuc@nvidia.com>	2025-01-03 00:32:20 -08:00
Lianmin Zheng	0f9cc6d8d3	Fix package loss for small models (#2717) Co-authored-by: sdli1995 < mmlmonkey@163.com>	2025-01-02 18:25:26 -08:00
yigex	c7ae474a49	[Feature, Hardware] Enable DeepseekV3 on AMD GPUs (#2601) Co-authored-by: root <root@banff-cyxtera-s83-5.amd.com> Co-authored-by: HAI <hixiao@gmail.com> Co-authored-by: Bruce Xue <yigex@xilinx.com> Co-authored-by: Yineng Zhang <me@zhyncs.com>	2025-01-02 16:23:19 -08:00
Lianmin Zheng	bdf946bf81	Support loading pre-sharded moe weights (#2716)	2025-01-02 15:07:37 -08:00
yukavio	8c8779cd05	[Fix] fix retract error in eagle speculative decoding (#2711) Co-authored-by: kavioyu <kavioyu@tencent.com>	2025-01-02 10:28:39 -08:00
Mick	1775b963db	[Fix] fix incorrectly overwriting the port specified in ServerArgs (#2714)	2025-01-02 10:28:22 -08:00
Shi Shuai	dd2e2d275f	Docs: Update documentation workflow and contribution guide (#2704) Co-authored-by: Chayenne <zhaochen20@outlook.com>	2025-01-02 09:18:31 -08:00
Rodrigo Garcia	a990daff9c	Included multi-node DeepSeekv3 example (#2707)	2025-01-02 22:17:03 +08:00
Yineng Zhang	ba5112ff69	feat: support moe_align_block_size_triton (#2712) Co-authored-by: WANDY666 <1060304770@qq.com>	2025-01-02 21:47:44 +08:00
yukavio	815dce0554	Eagle speculative decoding part 4: Add EAGLE2 worker (#2150) Co-authored-by: kavioyu <kavioyu@tencent.com> Co-authored-by: Lianmin Zheng <lianminzheng@gmail.com>	2025-01-02 03:22:34 -08:00
Lianmin Zheng	ad20b7957e	Eagle speculative decoding part 3: small modifications to the general scheduler (#2709) Co-authored-by: kavioyu <kavioyu@tencent.com>	2025-01-02 02:09:08 -08:00
fzyzcjy	9183c23eca	Speed up `update_weights_from_tensor` (#2695)	2025-01-02 02:05:19 -08:00
kk	148254d4db	Improve moe reduce sum kernel performance (#2705) Co-authored-by: wunhuang <wunhuang@amd.com>	2025-01-02 01:11:06 -08:00
Xiaotong Jiang	a4d6d6f1dd	[feat]: Add math eval to CI nightly run (#2663) Co-authored-by: Chayenne <zhaochen20@outlook.com>	2025-01-01 15:29:35 -08:00
Shi Shuai	062c48d2bd	[Docs] Add Support for Pydantic Structured Output Format (#2697)	2025-01-01 15:08:43 -08:00
kk	b6e0cfb5e1	ROCm base image update (#2692) Co-authored-by: wunhuang <wunhuang@amd.com>	2025-01-01 12:12:19 +08:00
Chayenne	0d8d97b8e6	Doc: Rename contribution_guide.md (#2691)	2024-12-31 14:35:48 -08:00
Shi Shuai	0a765bbccc	Docs: Refactor Contribution Guide (#2690)	2024-12-31 14:11:00 -08:00
Xiaoyu Zhang	286cad3ee3	h200 tuning fused_moe_triton config for Mixtral 8x7B/8x22B and Qwen2 57BA14B (#2689)	2024-12-31 23:17:36 +08:00
Ying Sheng	dc7eb01f19	[Fix] fix openai adapter (#2685)	2024-12-31 10:48:19 +00:00
Lianmin Zheng	b0524c3789	Eagle speculative decoding part 2: Fix cuda graph + DP attention hanging (#2684) Co-authored-by: yukavio <kavioyu@gmail.com>	2024-12-31 02:25:05 -08:00
Lianmin Zheng	6c42fa229d	Update README.md (#2683)	2024-12-31 00:13:10 -08:00
Yineng Zhang	d49b13c6f8	feat: use CUDA 12.4 by default (for FA3) (#2682)	2024-12-31 15:52:09 +08:00
Yineng Zhang	bedc4c7a50	misc: update CODEOWNERS (#2680)	2024-12-31 15:04:50 +08:00
Lianmin Zheng	f44d143949	Support target model verification in the attention backend (#2678) Co-authored-by: yukavio <kavioyu@gmail.com>	2024-12-30 22:58:55 -08:00
Yineng Zhang	b6b57fc200	minor: cleanup sgl-kernel (#2679)	2024-12-31 14:52:00 +08:00
Ke Bao	b4403985d0	Add cutlass submodule for sgl-kernel (#2676)	2024-12-31 14:28:29 +08:00
Lianmin Zheng	339c69a243	Improve the computation for time_per_output_token Prometheus metrics (#2674)	2024-12-30 21:40:14 -08:00
fzyzcjy	f707470019	CI: Update scripts to fail fast (#2672)	2024-12-30 19:04:01 -08:00
Lianmin Zheng	21ec66e59e	Minor follow-up fixes for the logprob refactor (#2670)	2024-12-30 05:42:08 -08:00
HAI	c5210dfa38	AMD DeepSeek_V3 FP8 Numerical fix (#2667)	2024-12-30 21:31:12 +08:00
mobicham	a29dd9501d	Add GemLite caching after each capture (#2669)	2024-12-30 05:27:29 -08:00
Lianmin Zheng	9c6ba2484f	Refactor logprob computation to return the real logprob used in sampling (#2664)	2024-12-30 04:51:38 -08:00
Ke Bao	b02da24a5b	Refactor sgl-kernel build (#2642)	2024-12-30 18:07:01 +08:00
Lianmin Zheng	bdd2827a80	Update structured_outputs.ipynb (#2666)	2024-12-30 00:46:41 -08:00
Lianmin Zheng	8c3b420eec	[Docs] clean up structured outputs docs (#2654)	2024-12-29 23:57:16 -08:00
HAI	e6f523b5f2	fix typo in python/sglang/srt/layers/quantization/fp8.py (#2655)	2024-12-29 23:45:02 -08:00
Lianmin Zheng	3231817861	Revert "[feat] Add math eval to CI" (#2656)	2024-12-30 15:05:50 +08:00
Xiaotong Jiang	a11f8d5f6a	[feat] Add math eval to CI (#2652)	2024-12-30 14:49:41 +08:00
Yineng Zhang	098d659c0e	docs: update README (#2651)	2024-12-30 13:33:29 +08:00
Lzhang-hub	76d14f8cb9	add 2*h20 node serving example for deepseek v3 (#2650) Co-authored-by: Yineng Zhang <me@zhyncs.com>	2024-12-30 13:04:38 +08:00
Lianmin Zheng	b08c308ebc	Update the timeout in nightly-test.yml (#2649)	2024-12-29 14:51:07 -08:00
Lianmin Zheng	03d5fbfd44	Release 0.4.1.post3 - upload the config.json to PyPI (#2647)	2024-12-29 14:25:53 -08:00

1643 Commits