sglang

Author	SHA1	Message	Date
Shi Shuai	0a765bbccc	Docs: Refactor Contribution Guide (#2690)	2024-12-31 14:11:00 -08:00
Xiaoyu Zhang	286cad3ee3	h200 tuning fused_moe_triton config for Mixtral 8x7B/8x22B and Qwen2 57BA14B (#2689)	2024-12-31 23:17:36 +08:00
Ying Sheng	dc7eb01f19	[Fix] fix openai adapter (#2685)	2024-12-31 10:48:19 +00:00
Lianmin Zheng	b0524c3789	Eagle speculative decoding part 2: Fix cuda graph + DP attention hanging (#2684) Co-authored-by: yukavio <kavioyu@gmail.com>	2024-12-31 02:25:05 -08:00
Lianmin Zheng	6c42fa229d	Update README.md (#2683)	2024-12-31 00:13:10 -08:00
Yineng Zhang	d49b13c6f8	feat: use CUDA 12.4 by default (for FA3) (#2682)	2024-12-31 15:52:09 +08:00
Yineng Zhang	bedc4c7a50	misc: update CODEOWNERS (#2680)	2024-12-31 15:04:50 +08:00
Lianmin Zheng	f44d143949	Support target model verification in the attention backend (#2678) Co-authored-by: yukavio <kavioyu@gmail.com>	2024-12-30 22:58:55 -08:00
Yineng Zhang	b6b57fc200	minor: cleanup sgl-kernel (#2679)	2024-12-31 14:52:00 +08:00
Ke Bao	b4403985d0	Add cutlass submodule for sgl-kernel (#2676)	2024-12-31 14:28:29 +08:00
Lianmin Zheng	339c69a243	Improve the computation for time_per_output_token Prometheus metrics (#2674)	2024-12-30 21:40:14 -08:00
fzyzcjy	f707470019	CI: Update scripts to fail fast (#2672)	2024-12-30 19:04:01 -08:00
Lianmin Zheng	21ec66e59e	Minor follow-up fixes for the logprob refactor (#2670)	2024-12-30 05:42:08 -08:00
HAI	c5210dfa38	AMD DeepSeek_V3 FP8 Numerical fix (#2667)	2024-12-30 21:31:12 +08:00
mobicham	a29dd9501d	Add GemLite caching after each capture (#2669)	2024-12-30 05:27:29 -08:00
Lianmin Zheng	9c6ba2484f	Refactor logprob computation to return the real logprob used in sampling (#2664)	2024-12-30 04:51:38 -08:00
Ke Bao	b02da24a5b	Refactor sgl-kernel build (#2642)	2024-12-30 18:07:01 +08:00
Lianmin Zheng	bdd2827a80	Update structured_outputs.ipynb (#2666)	2024-12-30 00:46:41 -08:00
Lianmin Zheng	8c3b420eec	[Docs] clean up structured outputs docs (#2654)	2024-12-29 23:57:16 -08:00
HAI	e6f523b5f2	fix typo in python/sglang/srt/layers/quantization/fp8.py (#2655)	2024-12-29 23:45:02 -08:00
Lianmin Zheng	3231817861	Revert "[feat] Add math eval to CI" (#2656)	2024-12-30 15:05:50 +08:00
Xiaotong Jiang	a11f8d5f6a	[feat] Add math eval to CI (#2652)	2024-12-30 14:49:41 +08:00
Yineng Zhang	098d659c0e	docs: update README (#2651)	2024-12-30 13:33:29 +08:00
Lzhang-hub	76d14f8cb9	add 2*h20 node serving example for deepseek v3 (#2650) Co-authored-by: Yineng Zhang <me@zhyncs.com>	2024-12-30 13:04:38 +08:00
Lianmin Zheng	b08c308ebc	Update the timeout in nightly-test.yml (#2649)	2024-12-29 14:51:07 -08:00
Lianmin Zheng	03d5fbfd44	Release 0.4.1.post3 - upload the config.json to PyPI (#2647)	2024-12-29 14:25:53 -08:00
Chayenne	1703d766d8	CI: skip special token for engine token ids unit test (#2648)	2024-12-29 13:52:50 -08:00
zhaochenyang20	09e6e2aa33	Merge branch 'main' of github.com:sgl-project/sglang	2024-12-29 21:48:21 +00:00
Shi Shuai	fad29f7f52	CI: Fix unittest for engine input token ids and output token ids (#2646)	2024-12-29 13:28:59 -08:00
Shi Shuai	35bdb48557	[Feature] Get Token IDs with Engine.generate() (#2636) Co-authored-by: Chayenne <zhaochen20@outlook.com>	2024-12-29 12:28:27 -08:00
Yineng Zhang	b085e06b01	docs: add development guide using docker (#2645)	2024-12-30 02:22:54 +08:00
Yineng Zhang	763dd55d17	docs: update README (#2644)	2024-12-30 01:24:06 +08:00
Yineng Zhang	3ccf566b0d	chore: bump v0.4.1.post2 (#2643)	2024-12-30 00:11:46 +08:00
HandH1998	afa0341e57	Update Triton configs for block fp8 kernels (#2641)	2024-12-29 22:53:47 +08:00
HAI	30828e7192	AMD: set weights and scaling numbers properly for block FP8 (#2637)	2024-12-29 03:23:39 -08:00
Ying Sheng	e0e09fceeb	[Session] Update session control interface (#2635)	2024-12-29 02:10:27 -08:00
Lianmin Zheng	9c05c6898e	Add llama_eagle.py (#2640) Co-authored-by: kavioyu <kavioyu@tencent.com>	2024-12-29 01:45:35 -08:00
Yineng Zhang	3464e57b62	minor: add nsys cli for docker dev (#2639)	2024-12-29 17:28:11 +08:00
Lianmin Zheng	3815b23ccb	Clean up wrapper in flashinfer backend (#2638)	2024-12-29 00:45:57 -08:00
Adarsh Shirawalmath	fd34f2da35	[Docs] Add EBNF to sampling params docs (#2609)	2024-12-29 00:05:00 -08:00
Tanjiro	8ee9a8501a	[Feature] Function Calling (#2544) Co-authored-by: Haoyu Wang <120358163+HaoyuWang4188@users.noreply.github.com>	2024-12-28 21:58:52 -08:00
fzyzcjy	fd28640dc5	Add `update_weights_from_tensor` (#2631)	2024-12-28 13:30:27 -08:00
Yineng Zhang	7863e4368a	add configs for block fp8 related kernels (#2628) Co-authored-by: HandH1998 <1335248067@qq.com>	2024-12-28 23:12:04 +08:00
Shi Shuai	333e3bfde5	[docs]Refactor constrained decoding tutorial (#2633)	2024-12-28 07:00:38 -08:00
Shi Shuai	239c9d4d3a	Docs: Add constrained decoding tutorial (#2614) Co-authored-by: Chayenne <zhaochen20@outlook.com>	2024-12-27 23:54:28 -08:00
Lianmin Zheng	855d0ba381	[CI] Fix nightly test and raise better error message (#2626) Co-authored-by: Sangbin <rkooo567@gmail.com>	2024-12-27 22:16:39 -08:00
Xiaoyu Zhang	9254a33ad4	avoid fused_moe_triton `padding` circular import (#2624)	2024-12-28 14:01:35 +08:00
Ke Bao	8a2681e26a	Update readme (#2625)	2024-12-28 13:39:56 +08:00
Lianmin Zheng	5276a675f5	Add more supporting organizations (#2623)	2024-12-27 13:41:41 -08:00
Lianmin Zheng	751e5ca273	[minor] clean up docs and eos id (#2622)	2024-12-27 11:23:46 -08:00

1619 Commits