sglang

Author	SHA1	Message	Date
Lianmin Zheng	9cd9dc83b3	Temporarily disable unit test of torch native attention backend (#2492 )	2024-12-16 14:17:27 -08:00
Ying Sheng	8586b72da0	[feat] Enable chunked prefill for llava-onevision (#2412 )	2024-12-09 09:52:38 -08:00
Lianmin Zheng	641b7d0ae0	[Minor] Improve code style (#2422 )	2024-12-09 06:30:35 -08:00
Xiaoyu Zhang	3844feb9bb	Add a unittest for fused_moe (#2416 )	2024-12-08 22:46:10 -08:00
Ying Sheng	aa47f64223	Revert "[feat] Enable chunked prefill for llava-onevision" (#2329 )	2024-12-02 23:11:13 -08:00
Ying Sheng	480e38a733	[feat] Enable chunked prefill for llava-onevision (#2281 )	2024-12-02 20:19:02 -08:00
Qun Yang	62c516ac45	Add a simple torch native attention backend (#2241 )	2024-12-01 03:01:25 -08:00
Lianmin Zheng	9449a95431	[CI] Balance CI tests (#2293 )	2024-12-01 01:47:30 -08:00
Lianmin Zheng	0303ca918f	[CI] Fix missing files in run_suite.py (#2288 )	2024-11-30 23:53:34 -08:00
Lianmin Zheng	4936be8acc	Revert "Revert "[FEAT] Support GGUF format"" (#2287 )	2024-11-30 22:14:48 -08:00
Lianmin Zheng	7e4c6dd8da	Revert "[FEAT] Support GGUF format" (#2285 )	2024-11-30 19:03:26 -08:00
Yang Zheng	883c955489	[FEAT] Support GGUF format (#2215 ) Co-authored-by: Yang Zheng(SW)(Alex) <you@example.com>	2024-11-30 00:44:48 -08:00
Chayenne	7d1485d376	Add get weights by parameter name for llama (#2266 )	2024-11-29 23:36:38 -08:00
Lianmin Zheng	b2ccf36d4d	Fix memory leak during abort (#2238 )	2024-11-28 02:22:15 -08:00
Ying Sheng	37c8a5761f	[feat] Support session control for vision language models (#2210 )	2024-11-27 00:03:29 -08:00
Lianmin Zheng	1605ae121e	[CI] Minor fix for CI (#2187 )	2024-11-25 16:38:43 -08:00
Rin Intachuen	1aea19f64b	Input_embeds support (#2052 )	2024-11-25 16:35:04 -08:00
Ying Sheng	e1e595d702	[feat] Refactor session control interface and add CI (#2173 )	2024-11-25 12:32:51 -08:00
Lianmin Zheng	254fd130e2	[CI] Split test cases in CI for better load balancing (#2180 )	2024-11-25 04:58:16 -08:00
Lianmin Zheng	7d671e4ad2	Enable overlap by default (#2067 )	2024-11-19 22:07:58 -08:00
Lianmin Zheng	c3eac1b010	Fix torch.compile for MoE (#2033 )	2024-11-14 01:30:24 -08:00
Lianmin Zheng	9c939a3d8b	Clean up metrics code (#1972 )	2024-11-09 15:43:20 -08:00
Lianmin Zheng	2ce32db6fb	Let reward model take text inputs instead of message lists (#1907 ) Co-authored-by: Kyle Corbitt <kyle@corbt.com>	2024-11-03 13:27:12 -08:00
Lianmin Zheng	c17c578108	Simplify tokenizer manager (#1904 )	2024-11-03 08:38:26 -08:00
Lianmin Zheng	a2e0424abf	Fix memory leak for chunked prefill 2 (#1858 ) Co-authored-by: Liangsheng Yin <hnyls2002@gmail.com>	2024-10-31 14:51:51 -07:00
Ke Bao	c77762d57f	Fix Triton decode kernel & ut (#1819 )	2024-10-27 10:54:38 -07:00
Lianmin Zheng	e646c5901e	Fix logprob in the overlapped mode (#1795 )	2024-10-25 11:06:57 -07:00
Lianmin Zheng	40900baea7	[Fix] Fix the log parsing in chunked prefill uni tests (#1794 )	2024-10-25 08:31:08 -07:00
Liangsheng Yin	a2f5e7555f	Fix memory leak when doing chunked prefill (#1787 )	2024-10-25 08:01:17 -07:00
Lianmin Zheng	1701b0db31	Enhance the test case for chunked prefill (#1785 )	2024-10-24 21:23:09 -07:00
Lianmin Zheng	00611286a1	Fix sliding window attention and gemma-2 unit tests in CI (#1746 )	2024-10-21 13:47:12 -07:00
Lianmin Zheng	9116b2896f	Add a new event loop (#1677 )	2024-10-16 01:33:20 -07:00
Shuo Yang	061e546313	Support double sparsity (#1459 )	2024-10-14 02:00:41 -07:00
Lianmin Zheng	869f1c02c4	Add a test case to test retract (#1662 )	2024-10-13 20:32:37 -07:00
Byron Hsu	862cd265e5	[engine] support async and streaming (#1614 )	2024-10-11 15:26:25 -07:00
Byron Hsu	551a3a9d38	Provide an offline engine API (#1567 )	2024-10-06 20:27:03 -07:00
Ying Sheng	0f4fb19bc8	[Fix, LoRA] fix LoRA with updates in main (#1545 )	2024-09-30 10:06:08 -07:00
Lianmin Zheng	3f0fe08d37	Let ModelRunner take InputMetadata as input, instead of ScheduleBatch (#1541 )	2024-09-29 20:28:45 -07:00
Ying Sheng	9aa6553d2a	[Feature] Support reward model LxzGordon/URM-LLaMa-3.1-8B (#1525 )	2024-09-27 23:32:11 -07:00
Lianmin Zheng	9ba1f09760	[Fix] Fix logprob and normalized_logprob (#1428 )	2024-09-15 06:36:06 -07:00
Ke Bao	33b54e7c40	Add pytorch sampling backend ut (#1425 )	2024-09-15 01:15:30 +10:00
Ying Sheng	eb02c1618a	[Minor, CI] remove lora test from minimal suite (#1406 )	2024-09-12 16:49:50 -07:00
Ying Sheng	712216928f	[Feature] Initial support for multi-LoRA serving (#1307 )	2024-09-12 16:46:14 -07:00
Kai-Hsun Chen	c9b75917d5	[server] Passing `model_override_args` to `launch_server` via the CLI. (#1298 ) Signed-off-by: Kai-Hsun Chen <kaihsun@anyscale.com>	2024-09-09 02:14:25 -07:00
havetc	9935f97b3e	[FEAT] JSON constrained support (#1125 ) Co-authored-by: Yineng Zhang <me@zhyncs.com>	2024-08-26 09:37:26 -07:00
Mingyi	97589a60a2	[CI] Parallelize unit tests in CI (#1219 )	2024-08-26 04:54:02 +00:00
Mingyi	7514b9f8d3	[CI] Fix CI (#1217 )	2024-08-26 02:56:42 +00:00
Ying Sheng	308d024092	[CI] Fix the issue of unit test hanging (#1211 )	2024-08-25 16:21:37 -07:00
Ying Sheng	ab4990e4bf	[Minor] Temporarily skip flaky test (#1209 )	2024-08-25 14:49:23 -07:00
Chayenne	30b4f771b0	Support Alibaba-NLP/gte-Qwen2-7B-instruct embedding Model (#1186 ) Co-authored-by: Ying Sheng <sqy1415@gmail.com>	2024-08-25 10:29:12 -07:00

1 2

62 Commits