sglang

Author	SHA1	Message	Date
Mick	9f635ea50d	[Fix] Address remaining issues of supporting MiniCPMV (#2977)	2025-01-28 00:22:13 -08:00
Byron Hsu	27aeb4b7d8	[test] deduplicate test_session_control (#3183)	2025-01-28 13:17:06 +08:00
yizhang2077	1e3e521544	add unit test for block wise fp8 (#3156)	2025-01-27 15:32:04 +08:00
Lianmin Zheng	d1a0863251	Add a test case for cached_tokens (#3145)	2025-01-26 01:39:28 -08:00
Lianmin Zheng	cd493b5afc	Improve metrics, logging, and importing orders (#2992)	2025-01-19 18:36:59 -08:00
Enrique Shockwave	3bcf5ecea7	support regex in xgrammar backend (#2983)	2025-01-20 04:34:41 +08:00
bjmsong	d3024f4fc8	support e4m3 kvcache in qwen2 & add kv scaling facotr json (#2894) Co-authored-by: bjmsong <bjmsong@126.com>	2025-01-18 11:43:22 +08:00
Ke Bao	d47c5101f1	Add ut for qwen model (#2947)	2025-01-18 00:03:54 +08:00
Chang Su	a8ccacc8b8	[Frontend] Fix request length check and add option to disallow auto truncation in scheduler (#2876)	2025-01-16 14:51:19 -08:00
yizhang2077	767c9dec03	adapt custom allreduce for tensorrt llm (#2511)	2025-01-16 04:57:35 +08:00
Ke Bao	bfbda62c8b	Add ut for w8a8 int8 quantization (#2897)	2025-01-15 18:29:14 +08:00
fzyzcjy	923f518337	CUDA-graph-compatible releasing and resuming KV cache and model weight memory (#2630)	2025-01-13 11:38:51 -08:00
Lianmin Zheng	67008f4b32	Use only one GPU for MLA CI tests (#2858)	2025-01-13 03:55:33 -08:00
Shi Shuai	c4f9707e16	Improve: Token-In Token-Out Usage for RLHF (#2843)	2025-01-11 15:14:26 -08:00
Lianmin Zheng	b22f3f6475	Fix nightly accuracy tests (#2780)	2025-01-07 21:02:35 -08:00
Lianmin Zheng	6fb5768372	Disable math eval on nightly CI temporarily (#2779)	2025-01-07 18:17:34 -08:00
yukavio	815dce0554	Eagle speculative decoding part 4: Add EAGLE2 worker (#2150) Co-authored-by: kavioyu <kavioyu@tencent.com> Co-authored-by: Lianmin Zheng <lianminzheng@gmail.com>	2025-01-02 03:22:34 -08:00
Xiaotong Jiang	a4d6d6f1dd	[feat]: Add math eval to CI nightly run (#2663) Co-authored-by: Chayenne <zhaochen20@outlook.com>	2025-01-01 15:29:35 -08:00
Shi Shuai	35bdb48557	[Feature] Get Token IDs with Engine.generate() (#2636) Co-authored-by: Chayenne <zhaochen20@outlook.com>	2024-12-29 12:28:27 -08:00
fzyzcjy	fd28640dc5	Add `update_weights_from_tensor` (#2631)	2024-12-28 13:30:27 -08:00
Lianmin Zheng	855d0ba381	[CI] Fix nightly test and raise better error message (#2626) Co-authored-by: Sangbin <rkooo567@gmail.com>	2024-12-27 22:16:39 -08:00
Lianmin Zheng	dc3bee4815	Fix test and benchmark scripts (#2598)	2024-12-26 07:56:26 -08:00
Lianmin Zheng	9cd9dc83b3	Temporarily disable unit test of torch native attention backend (#2492)	2024-12-16 14:17:27 -08:00
Ying Sheng	8586b72da0	[feat] Enable chunked prefill for llava-onevision (#2412)	2024-12-09 09:52:38 -08:00
Lianmin Zheng	641b7d0ae0	[Minor] Improve code style (#2422)	2024-12-09 06:30:35 -08:00
Xiaoyu Zhang	3844feb9bb	Add a unittest for fused_moe (#2416)	2024-12-08 22:46:10 -08:00
Ying Sheng	aa47f64223	Revert "[feat] Enable chunked prefill for llava-onevision" (#2329)	2024-12-02 23:11:13 -08:00
Ying Sheng	480e38a733	[feat] Enable chunked prefill for llava-onevision (#2281)	2024-12-02 20:19:02 -08:00
Qun Yang	62c516ac45	Add a simple torch native attention backend (#2241)	2024-12-01 03:01:25 -08:00
Lianmin Zheng	9449a95431	[CI] Balance CI tests (#2293)	2024-12-01 01:47:30 -08:00
Lianmin Zheng	0303ca918f	[CI] Fix missing files in run_suite.py (#2288)	2024-11-30 23:53:34 -08:00
Lianmin Zheng	4936be8acc	Revert "Revert "[FEAT] Support GGUF format"" (#2287)	2024-11-30 22:14:48 -08:00
Lianmin Zheng	7e4c6dd8da	Revert "[FEAT] Support GGUF format" (#2285)	2024-11-30 19:03:26 -08:00
Yang Zheng	883c955489	[FEAT] Support GGUF format (#2215) Co-authored-by: Yang Zheng(SW)(Alex) <you@example.com>	2024-11-30 00:44:48 -08:00
Chayenne	7d1485d376	Add get weights by parameter name for llama (#2266)	2024-11-29 23:36:38 -08:00
Lianmin Zheng	b2ccf36d4d	Fix memory leak during abort (#2238)	2024-11-28 02:22:15 -08:00
Ying Sheng	37c8a5761f	[feat] Support session control for vision language models (#2210)	2024-11-27 00:03:29 -08:00
Lianmin Zheng	1605ae121e	[CI] Minor fix for CI (#2187)	2024-11-25 16:38:43 -08:00
Rin Intachuen	1aea19f64b	Input_embeds support (#2052)	2024-11-25 16:35:04 -08:00
Ying Sheng	e1e595d702	[feat] Refactor session control interface and add CI (#2173)	2024-11-25 12:32:51 -08:00
Lianmin Zheng	254fd130e2	[CI] Split test cases in CI for better load balancing (#2180)	2024-11-25 04:58:16 -08:00
Lianmin Zheng	7d671e4ad2	Enable overlap by default (#2067)	2024-11-19 22:07:58 -08:00
Lianmin Zheng	c3eac1b010	Fix torch.compile for MoE (#2033)	2024-11-14 01:30:24 -08:00
Lianmin Zheng	9c939a3d8b	Clean up metrics code (#1972)	2024-11-09 15:43:20 -08:00
Lianmin Zheng	2ce32db6fb	Let reward model take text inputs instead of message lists (#1907) Co-authored-by: Kyle Corbitt <kyle@corbt.com>	2024-11-03 13:27:12 -08:00
Lianmin Zheng	c17c578108	Simplify tokenizer manager (#1904)	2024-11-03 08:38:26 -08:00
Lianmin Zheng	a2e0424abf	Fix memory leak for chunked prefill 2 (#1858) Co-authored-by: Liangsheng Yin <hnyls2002@gmail.com>	2024-10-31 14:51:51 -07:00
Ke Bao	c77762d57f	Fix Triton decode kernel & ut (#1819)	2024-10-27 10:54:38 -07:00
Lianmin Zheng	e646c5901e	Fix logprob in the overlapped mode (#1795)	2024-10-25 11:06:57 -07:00
Lianmin Zheng	40900baea7	[Fix] Fix the log parsing in chunked prefill uni tests (#1794)	2024-10-25 08:31:08 -07:00

84 Commits