sglang

Author	SHA1	Message	Date
Liangsheng Yin	94cde10920	Llama3.2 vision model support (#1551)	2024-10-21 15:01:21 -07:00
Lianmin Zheng	00611286a1	Fix sliding window attention and gemma-2 unit tests in CI (#1746)	2024-10-21 13:47:12 -07:00
Lianmin Zheng	7ce3606891	Faster overlap mode scheduler (#1738)	2024-10-21 04:30:52 -07:00
Liangsheng Yin	efb099cdee	Fix prefill oom (#1743)	2024-10-21 03:54:35 -07:00
Lianmin Zheng	09603c6dc9	Maintain seq_lens_sum to make more FlashInfer operations non-blocking (#1741)	2024-10-21 01:43:16 -07:00
Lianmin Zheng	cf470fea32	Make token mapping non-blocking in the overlapped mode (#1740)	2024-10-20 23:25:14 -07:00
sixgod	45d5af2416	Add GLM-4 TextGeneration Model support for SGLang (#1736)	2024-10-21 04:08:30 +00:00
Lianmin Zheng	b121bc03a3	Simplify batch result resolution (#1735)	2024-10-20 19:47:14 -07:00
Lianmin Zheng	e12358dc91	Simplify the usage of device (#1734)	2024-10-20 18:17:41 -07:00
yizhang2077	554fbf93cd	[Bugfix] qwen2vl forward_extend (#1727)	2024-10-20 02:38:35 -07:00
Lianmin Zheng	b48edff67f	Split the overlapped version of TpModelWorkerClient into a separate file (#1726)	2024-10-20 00:29:29 -07:00
Lianmin Zheng	59cbf47626	Unify the memory pool api and tp worker API (#1724)	2024-10-19 23:19:26 -07:00
Yineng Zhang	cbbc82b7b8	Support qwen2 vl model (#1721) Co-authored-by: yizhang2077 <1109276519@qq.com> Co-authored-by: ispobock <ISPObaoke@163.com>	2024-10-19 21:44:38 -07:00
Yineng Zhang	8bee20f80b	Update vllm to 0.6.3 (#1711) (#1720) Co-authored-by: Ke Bao <ISPObaoke@163.com>	2024-10-19 20:45:41 -07:00
Lianmin Zheng	12cad0feae	Simplify the interface of tp_worker (#1718)	2024-10-19 17:39:38 -07:00
Lianmin Zheng	b6cd903604	Update readme and workflow (#1716)	2024-10-19 13:01:44 -07:00
Lianmin Zheng	087257ea03	Release v0.3.4 (#1714)	2024-10-19 08:17:41 -07:00
Lianmin Zheng	769bf11c05	Fix the race condition in overlap mode (#1712)	2024-10-19 06:50:56 -07:00
Lianmin Zheng	3db43d1b08	Fix `is_all_ready` for overlap copy (#1710)	2024-10-18 21:01:52 -07:00
Lianmin Zheng	f0f8a7699b	Simplify the nan detection and greedy check in sampler (#1709)	2024-10-18 20:21:24 -07:00
Lianmin Zheng	2bcfba1b08	Skip unnecessary penalizer (#1707)	2024-10-18 17:54:03 -07:00
Lianmin Zheng	bc12d4033f	Add grouped free operations (#1706)	2024-10-18 13:21:05 -07:00
Lianmin Zheng	392f2863c8	Add dtype for more operations (#1705)	2024-10-18 12:18:15 -07:00
Lianmin Zheng	6d0fa73ece	Simplify flashinfer utilities (#1704)	2024-10-17 22:54:14 -07:00
Liangsheng Yin	9e0dac1ad7	Fix regex and logprob conflicts when chunked prefilling (#1703)	2024-10-17 18:33:21 -07:00
Gleb Drozdov	a95d5589c3	Add matched_stop token or str to distinguish between eos or stop str finish_reason generation (#1684)	2024-10-17 18:06:52 +00:00
Lianmin Zheng	d17d19e5b8	Fix mixed batch for multi modal models (#1702)	2024-10-17 10:27:26 -07:00
Lianmin Zheng	dd3809fad8	Fix engine unit test (#1701)	2024-10-17 09:53:32 -07:00
Lianmin Zheng	7feba41584	Fix failed ci tests on long prompts; Better error messages for embedding models (#1700)	2024-10-17 09:23:29 -07:00
Michael Feil	e5db40dcbc	ORJson. Faster Json serialization (#1694)	2024-10-17 08:03:08 -07:00
wxsm	b170930534	feat: radix tree code optimize (#1697)	2024-10-17 08:01:27 -07:00
Jani Monoses	5ab20cceba	Use SGLang imports for linear layer (#1696)	2024-10-17 07:50:01 -07:00
Lianmin Zheng	02f7f3e488	Update the transformers version in CI (#1690)	2024-10-16 19:03:55 -07:00
Zeng Zhongchao	2782132be8	Add date to logging messages (#1623) (#1679)	2024-10-16 18:54:55 -07:00
Michael Feil	b0facb3316	add orjson for jsonresponse (#1688)	2024-10-16 18:14:30 -07:00
havetc	ecb8bad276	Returning a per request metric for number of cached_tokens read (#1599)	2024-10-16 11:49:22 -07:00
Lianmin Zheng	dbec2f1847	Launch a thread to overlap CPU and GPU (#1687)	2024-10-16 11:20:17 -07:00
Ke Bao	d10b933a36	Fix srt dependency (#1685)	2024-10-16 08:21:20 -07:00
Lianmin Zheng	9116b2896f	Add a new event loop (#1677)	2024-10-16 01:33:20 -07:00
Jani Monoses	a5114b6f91	Add OLMo model (#1676)	2024-10-16 00:11:18 -07:00
Liangsheng Yin	b6b4094621	Fix filter_batch function call (#1681)	2024-10-15 22:59:26 -07:00
Lianmin Zheng	f1088e0fc8	Fix memory leak during abort (#1674)	2024-10-15 08:15:08 -07:00
Lianmin Zheng	175afed370	Improve benchmark scripts (#1672)	2024-10-14 21:53:01 -07:00
Lianmin Zheng	4a292f670d	[Minor] Add some utility functions (#1671)	2024-10-14 20:08:03 -07:00
Byron Hsu	56503d9bc9	[1/N] Remove `CacheConfig` import in all model files (#1658)	2024-10-14 09:06:34 -07:00
Lianmin Zheng	02bc95796d	Simplify chunked prefill (#1667)	2024-10-14 06:47:50 -07:00
Lianmin Zheng	24f3e1511c	[Minor] Improve style (#1666)	2024-10-14 05:25:00 -07:00
Lianmin Zheng	6790240cc3	Fix unit test order to balance the tasks in CI (#1665)	2024-10-14 02:01:44 -07:00
Shuo Yang	061e546313	Support double sparsity (#1459)	2024-10-14 02:00:41 -07:00
Lianmin Zheng	0c1e87964b	Move filter_batch out of stream_output (#1663)	2024-10-14 01:15:34 -07:00

823 Commits