sglang

Author	SHA1	Message	Date
libra	bdb3929dbb	Refactor SchedulePolicy to improve code organization (#2571)	2025-01-04 00:05:16 +08:00
Lianmin Zheng	0f9cc6d8d3	Fix package loss for small models (#2717) Co-authored-by: sdli1995 <mmlmonkey@163.com>	2025-01-02 18:25:26 -08:00
Shi Shuai	dd2e2d275f	Docs: Update documentation workflow and contribution guide (#2704) Co-authored-by: Chayenne <zhaochen20@outlook.com>	2025-01-02 09:18:31 -08:00
yukavio	815dce0554	Eagle speculative decoding part 4: Add EAGLE2 worker (#2150) Co-authored-by: kavioyu <kavioyu@tencent.com> Co-authored-by: Lianmin Zheng <lianminzheng@gmail.com>	2025-01-02 03:22:34 -08:00
fzyzcjy	9183c23eca	Speed up `update_weights_from_tensor` (#2695)	2025-01-02 02:05:19 -08:00
Xiaotong Jiang	a4d6d6f1dd	[feat]: Add math eval to CI nightly run (#2663) Co-authored-by: Chayenne <zhaochen20@outlook.com>	2025-01-01 15:29:35 -08:00
Lianmin Zheng	21ec66e59e	Minor follow-up fixes for the logprob refactor (#2670)	2024-12-30 05:42:08 -08:00
Lianmin Zheng	9c6ba2484f	Refactor logprob computation to return the real logprob used in sampling (#2664)	2024-12-30 04:51:38 -08:00
Lianmin Zheng	3231817861	Revert "[feat] Add math eval to CI" (#2656)	2024-12-30 15:05:50 +08:00
Xiaotong Jiang	a11f8d5f6a	[feat] Add math eval to CI (#2652)	2024-12-30 14:49:41 +08:00
Chayenne	1703d766d8	CI: skip special token for engine token ids unit test (#2648)	2024-12-29 13:52:50 -08:00
Shi Shuai	fad29f7f52	CI: Fix unittest for engine input token ids and output token ids (#2646)	2024-12-29 13:28:59 -08:00
Shi Shuai	35bdb48557	[Feature] Get Token IDs with Engine.generate() (#2636) Co-authored-by: Chayenne <zhaochen20@outlook.com>	2024-12-29 12:28:27 -08:00
Ying Sheng	e0e09fceeb	[Session] Update session control interface (#2635)	2024-12-29 02:10:27 -08:00
Tanjiro	8ee9a8501a	[Feature] Function Calling (#2544) Co-authored-by: Haoyu Wang <120358163+HaoyuWang4188@users.noreply.github.com>	2024-12-28 21:58:52 -08:00
fzyzcjy	fd28640dc5	Add `update_weights_from_tensor` (#2631)	2024-12-28 13:30:27 -08:00
Lianmin Zheng	855d0ba381	[CI] Fix nightly test and raise better error message (#2626) Co-authored-by: Sangbin <rkooo567@gmail.com>	2024-12-27 22:16:39 -08:00
Lianmin Zheng	dc3bee4815	Fix test and benchmark scripts (#2598)	2024-12-26 07:56:26 -08:00
Zhizhou Sha	a74d194146	[unittest] add unit test to test quant args of srt engine (#2574)	2024-12-26 06:54:43 -08:00
Adarsh Shirawalmath	acb340728c	[Feature] Support new parameter - EBNF in xgrammar (#2526)	2024-12-26 05:12:41 -08:00
Ke Bao	e835a50021	Reorg moe code (#2563)	2024-12-24 01:10:22 +08:00
Lianmin Zheng	8496701934	[Misc] Fix metrics, weight update lock, request logging (#2543)	2024-12-22 06:27:22 -08:00
Lianmin Zheng	9cd9dc83b3	Temporarily disable unit test of torch native attention backend (#2492)	2024-12-16 14:17:27 -08:00
Ke Bao	2f9bd0fafd	Fix correctness issue for triton decoding kernel (#2479)	2024-12-14 16:50:54 +08:00
Fred Reiss	993956c6b1	Add support for IBM Granite 3.x models (#2437)	2024-12-11 06:30:23 -08:00
Ying Sheng	8586b72da0	[feat] Enable chunked prefill for llava-onevision (#2412)	2024-12-09 09:52:38 -08:00
Lianmin Zheng	641b7d0ae0	[Minor] Improve code style (#2422)	2024-12-09 06:30:35 -08:00
Xiaoyu Zhang	3844feb9bb	Add a unittest for fused_moe (#2416)	2024-12-08 22:46:10 -08:00
Lianmin Zheng	a6ca736c8e	Simplify stream_output (#2398)	2024-12-08 12:27:13 -08:00
Yineng Zhang	f62055b528	minor: add random flashinfer vs triton use case (#2409)	2024-12-09 04:15:21 +08:00
Yineng Zhang	74bc9184c3	minor: add random use case (#2408)	2024-12-09 03:21:35 +08:00
Yineng Zhang	0f8eb15323	feat: support custom task runner (#2407)	2024-12-09 02:29:55 +08:00
Yineng Zhang	67470bbb28	minor: update correct measurement unit (#2406)	2024-12-08 20:55:04 +08:00
Ke Bao	61dec545b0	Remove unused vars in the triton backend (#2401)	2024-12-08 03:37:03 -08:00
Ke Bao	7dc66fcb40	Optimize Triton decoding kernel for long context (#2394)	2024-12-08 01:17:37 -08:00
Lianmin Zheng	0e7409adb6	Fix the overlap for xgrammar (#2377)	2024-12-06 05:49:29 -08:00
xiaobochen	3d32e4a32c	Resubmit MoE-EP (#2371)	2024-12-06 15:05:21 +08:00
Lianmin Zheng	07ec07ad1f	Improve torch compile for fused moe (#2327)	2024-12-03 01:58:25 -08:00
Ying Sheng	aa47f64223	Revert "[feat] Enable chunked prefill for llava-onevision" (#2329)	2024-12-02 23:11:13 -08:00
Ying Sheng	480e38a733	[feat] Enable chunked prefill for llava-onevision (#2281)	2024-12-02 20:19:02 -08:00
Lianmin Zheng	18108abe5d	[Minor] Fix code style (#2311)	2024-12-02 02:27:36 -08:00
Chayenne	983bfcf386	Online weight updates from torch.distributed (#2279)	2024-12-01 23:23:18 -08:00
Qun Yang	62c516ac45	Add a simple torch native attention backend (#2241)	2024-12-01 03:01:25 -08:00
Lianmin Zheng	9449a95431	[CI] Balance CI tests (#2293)	2024-12-01 01:47:30 -08:00
Lianmin Zheng	0303ca918f	[CI] Fix missing files in run_suite.py (#2288)	2024-11-30 23:53:34 -08:00
Lianmin Zheng	4936be8acc	Revert "Revert "[FEAT] Support GGUF format"" (#2287)	2024-11-30 22:14:48 -08:00
Lianmin Zheng	1bfa511b95	[CI] Fix ci tests (#2284)	2024-11-30 21:12:03 -08:00
Lianmin Zheng	7e4c6dd8da	Revert "[FEAT] Support GGUF format" (#2285)	2024-11-30 19:03:26 -08:00
Yang Zheng	883c955489	[FEAT] Support GGUF format (#2215) Co-authored-by: Yang Zheng(SW)(Alex) <you@example.com>	2024-11-30 00:44:48 -08:00
Lianmin Zheng	ccaf1f997c	[CI] Print summary on github actions (#2274)	2024-11-29 23:48:54 -08:00

338 Commits