sglang

Author	SHA1	Message	Date
Chayenne	c77c1e05ba	fix black in pre-commit (#1940)	2024-11-08 07:42:47 +08:00
Xuehai Pan	a5e0defb5a	minor: Add basic editorconfig and pre-commit hooks to enforce style for whitespaces (#1926)	2024-11-06 13:46:04 +00:00
Chayenne	704f8e8ed1	Add Reward API Docs etc (#1910) Co-authored-by: Chayenne <zhaochenyang@g.ucla.edu>	2024-11-03 22:33:03 -08:00
Lianmin Zheng	2ce32db6fb	Let reward model take text inputs instead of message lists (#1907) Co-authored-by: Kyle Corbitt <kyle@corbt.com>	2024-11-03 13:27:12 -08:00
Lianmin Zheng	0abbf289a8	Unify the model type checking (#1905)	2024-11-03 12:25:39 -08:00
Lianmin Zheng	c17c578108	Simplify tokenizer manager (#1904)	2024-11-03 08:38:26 -08:00
Lianmin Zheng	d1b31b0684	Improve docs and fix the broken links (#1875)	2024-11-01 17:47:44 -07:00
Yineng Zhang	104bf2609b	minor: update nightly eval (#1867)	2024-11-01 21:38:29 +08:00
Yineng Zhang	d86a2d6562	minor: add human eval (#1754)	2024-11-01 14:29:20 +08:00
Liangsheng Yin	b9fd178f1b	Fix retraction + overlap (#1860) Co-authored-by: Lianmin Zheng <lianminzheng@gmail.com>	2024-10-31 18:27:42 -07:00
Lianmin Zheng	a2e0424abf	Fix memory leak for chunked prefill 2 (#1858) Co-authored-by: Liangsheng Yin <hnyls2002@gmail.com>	2024-10-31 14:51:51 -07:00
Lianmin Zheng	f7102fbd2b	Fix mixed chunked prefill (#1850)	2024-10-30 21:20:41 -07:00
DanielC12321	5e00ddebc0	Add new model: Gpt2 (#1833)	2024-10-29 17:52:33 -07:00
Byron Hsu	680cad2023	fix get_memory_pool_size deadlock for DP (#1830)	2024-10-28 23:07:14 -07:00
Byron Hsu	0a24eb850a	Fix update_weights deadlock for DP (#1825)	2024-10-28 12:02:23 -07:00
Byron Hsu	6fcd6d7d6d	Support token ids in `engine.generate` (#1820)	2024-10-27 14:02:34 -07:00
Ke Bao	c77762d57f	Fix Triton decode kernel & ut (#1819)	2024-10-27 10:54:38 -07:00
Lianmin Zheng	86fc0d79d0	Add a watch dog thread (#1816)	2024-10-27 02:00:50 -07:00
Lianmin Zheng	2b80978859	Provide an argument to set the maximum batch size for cuda graph (#1809)	2024-10-26 15:09:33 -07:00
Lianmin Zheng	6aa94b967c	Update ci workflows (#1804)	2024-10-26 04:32:36 -07:00
Lianmin Zheng	fb99aaa527	[Fix] Fix --skip-tokenizer-init (#1798)	2024-10-25 18:51:59 -07:00
Lianmin Zheng	e646c5901e	Fix logprob in the overlapped mode (#1795)	2024-10-25 11:06:57 -07:00
Lianmin Zheng	c555ce2ca2	Revert "Fix memory leak when doing chunked prefill" (#1797)	2024-10-25 10:24:44 -07:00
Lianmin Zheng	40900baea7	[Fix] Fix the log parsing in chunked prefill uni tests (#1794)	2024-10-25 08:31:08 -07:00
Liangsheng Yin	a2f5e7555f	Fix memory leak when doing chunked prefill (#1787)	2024-10-25 08:01:17 -07:00
Lianmin Zheng	1701b0db31	Enhance the test case for chunked prefill (#1785)	2024-10-24 21:23:09 -07:00
Lianmin Zheng	05b3bf5e8e	Crash the server on warnings in CI (#1772)	2024-10-23 16:27:13 -07:00
Ying Sheng	2fce449b1c	[API] add get memory pool size (#1760) Co-authored-by: Byron Hsu <byronhsu1230@gmail.com>	2024-10-23 07:02:29 +00:00
Lianmin Zheng	ad4125d1a9	Fuse more ops & Simplify token mapping (#1758)	2024-10-22 23:20:43 -07:00
Liangsheng Yin	94cde10920	Llama3.2 vision model support (#1551)	2024-10-21 15:01:21 -07:00
Lianmin Zheng	00611286a1	Fix sliding window attention and gemma-2 unit tests in CI (#1746)	2024-10-21 13:47:12 -07:00
Lianmin Zheng	cf470fea32	Make token mapping non-blocking in the overlapped mode (#1740)	2024-10-20 23:25:14 -07:00
sixgod	45d5af2416	Add GLM-4 TextGeneration Model support for SGLang (#1736)	2024-10-21 04:08:30 +00:00
yizhang2077	554fbf93cd	[Bugfix] qwen2vl forward_extend (#1727)	2024-10-20 02:38:35 -07:00
Lianmin Zheng	b48edff67f	Split the overlapped version of TpModelWorkerClient into a separate file (#1726)	2024-10-20 00:29:29 -07:00
Lianmin Zheng	593b19f29d	Temporarily skip this test_mixed_batch for QWen2VL (#1725)	2024-10-20 00:05:45 -07:00
Yineng Zhang	cbbc82b7b8	Support qwen2 vl model (#1721) Co-authored-by: yizhang2077 <1109276519@qq.com> Co-authored-by: ispobock <ISPObaoke@163.com>	2024-10-19 21:44:38 -07:00
Yineng Zhang	8bee20f80b	Update vllm to 0.6.3 (#1711) (#1720) Co-authored-by: Ke Bao <ISPObaoke@163.com>	2024-10-19 20:45:41 -07:00
Gleb Drozdov	a95d5589c3	Add matched_stop token or str to distinguish between eos or stop str finish_reason generation (#1684)	2024-10-17 18:06:52 +00:00
Lianmin Zheng	d17d19e5b8	Fix mixed batch for multi modal models (#1702)	2024-10-17 10:27:26 -07:00
Lianmin Zheng	dd3809fad8	Fix engine unit test (#1701)	2024-10-17 09:53:32 -07:00
Lianmin Zheng	7feba41584	Fix failed ci tests on long prompts; Better error messages for embedding models (#1700)	2024-10-17 09:23:29 -07:00
Lianmin Zheng	30ee36305e	Fix the failed unit tests (#1699)	2024-10-17 08:13:29 -07:00
havetc	ecb8bad276	Returning a per request metric for number of cached_tokens read (#1599)	2024-10-16 11:49:22 -07:00
Lianmin Zheng	9116b2896f	Add a new event loop (#1677)	2024-10-16 01:33:20 -07:00
Jani Monoses	a5114b6f91	Add OLMo model (#1676)	2024-10-16 00:11:18 -07:00
Shuo Yang	061e546313	Support double sparsity (#1459)	2024-10-14 02:00:41 -07:00
Lianmin Zheng	0c1e87964b	Move filter_batch out of stream_output (#1663)	2024-10-14 01:15:34 -07:00
Lianmin Zheng	869f1c02c4	Add a test case to test retract (#1662)	2024-10-13 20:32:37 -07:00
Lianmin Zheng	dafb6a5266	[Fix] Fix the style of test_large_max_new_tokens.py (#1638)	2024-10-11 16:05:58 -07:00

225 Commits