sglang

Author	SHA1	Message	Date
Liangsheng Yin	94cde10920	Llama3.2 vision model support (#1551)	2024-10-21 15:01:21 -07:00
Lianmin Zheng	00611286a1	Fix sliding window attention and gemma-2 unit tests in CI (#1746)	2024-10-21 13:47:12 -07:00
Lianmin Zheng	cf470fea32	Make token mapping non-blocking in the overlapped mode (#1740)	2024-10-20 23:25:14 -07:00
sixgod	45d5af2416	Add GLM-4 TextGeneration Model support for SGLang (#1736)	2024-10-21 04:08:30 +00:00
yizhang2077	554fbf93cd	[Bugfix] qwen2vl forward_extend (#1727)	2024-10-20 02:38:35 -07:00
Lianmin Zheng	b48edff67f	Split the overlapped version of TpModelWorkerClient into a separate file (#1726)	2024-10-20 00:29:29 -07:00
Lianmin Zheng	593b19f29d	Temporarily skip this test_mixed_batch for QWen2VL (#1725)	2024-10-20 00:05:45 -07:00
Yineng Zhang	cbbc82b7b8	Support qwen2 vl model (#1721) Co-authored-by: yizhang2077 <1109276519@qq.com> Co-authored-by: ispobock <ISPObaoke@163.com>	2024-10-19 21:44:38 -07:00
Yineng Zhang	8bee20f80b	Update vllm to 0.6.3 (#1711) (#1720) Co-authored-by: Ke Bao <ISPObaoke@163.com>	2024-10-19 20:45:41 -07:00
Gleb Drozdov	a95d5589c3	Add matched_stop token or str to distinguish between eos or stop str finish_reason generation (#1684)	2024-10-17 18:06:52 +00:00
Lianmin Zheng	d17d19e5b8	Fix mixed batch for multi modal models (#1702)	2024-10-17 10:27:26 -07:00
Lianmin Zheng	dd3809fad8	Fix engine unit test (#1701)	2024-10-17 09:53:32 -07:00
Lianmin Zheng	7feba41584	Fix failed ci tests on long prompts; Better error messages for embedding models (#1700)	2024-10-17 09:23:29 -07:00
Lianmin Zheng	30ee36305e	Fix the failed unit tests (#1699)	2024-10-17 08:13:29 -07:00
havetc	ecb8bad276	Returning a per request metric for number of cached_tokens read (#1599)	2024-10-16 11:49:22 -07:00
Lianmin Zheng	9116b2896f	Add a new event loop (#1677)	2024-10-16 01:33:20 -07:00
Jani Monoses	a5114b6f91	Add OLMo model (#1676)	2024-10-16 00:11:18 -07:00
Shuo Yang	061e546313	Support double sparsity (#1459)	2024-10-14 02:00:41 -07:00
Lianmin Zheng	0c1e87964b	Move filter_batch out of stream_output (#1663)	2024-10-14 01:15:34 -07:00
Lianmin Zheng	869f1c02c4	Add a test case to test retract (#1662)	2024-10-13 20:32:37 -07:00
Lianmin Zheng	dafb6a5266	[Fix] Fix the style of test_large_max_new_tokens.py (#1638)	2024-10-11 16:05:58 -07:00
Byron Hsu	862cd265e5	[engine] support async and streaming (#1614)	2024-10-11 15:26:25 -07:00
Lianmin Zheng	5d09ca5735	Fix constrained decoding (#1634)	2024-10-11 06:26:20 -07:00
Lianmin Zheng	aba9eae4c6	Fix the correctness test in bench_latency.py when tp > 1 and test_generation_models.py (#1631)	2024-10-11 05:03:20 -07:00
Byron Hsu	e8613df071	[Engine] Fix generate hanging issue after the first call (#1606)	2024-10-08 04:26:56 +00:00
Ke Bao	68f8b60d22	Fix chunked prefill condition (#1594)	2024-10-07 06:34:14 +00:00
Byron Hsu	551a3a9d38	Provide an offline engine API (#1567)	2024-10-06 20:27:03 -07:00
Byron Hsu	17e998f1a8	Test consistency for single and batch seperately (#1590)	2024-10-06 22:02:27 +00:00
Ying Sheng	c98e84c21e	[Minor, Performance] Use torch.argmax for greedy sampling (#1589)	2024-10-06 13:15:05 -07:00
Lianmin Zheng	9244f27f0a	[Minor] Improve the style and fix flaky tests (#1584)	2024-10-06 00:10:48 -07:00
Byron Hsu	2422de5193	Support min_tokens in sgl.gen (#1573)	2024-10-05 21:51:12 -07:00
Ying Sheng	04b262cd91	[Fix] Fix major performance bug in certain cases (#1563) Co-authored-by: hnyls2002 <hnyls2002@gmail.com>	2024-10-04 08:51:11 +00:00
Lianmin Zheng	32eb6e96f2	Organize sampling batch info better (#1562)	2024-10-03 18:29:49 -07:00
Minsang Song	e6852b0dd2	[Fix] Fix AttributeError in Qwen2.5 LoRA: 'Qwen2ForCausalLM' object has no attribute 'get_hidden_dim' (#1536) Co-authored-by: Ying Sheng <sqy1415@gmail.com>	2024-10-02 20:41:15 -07:00
Theresa Barton	2c7d0a5b8b	[Fix] Fix all the Huggingface paths (#1553)	2024-10-02 10:12:07 -07:00
Liangsheng Yin	99ec439da4	Organize Attention Backends (#1547)	2024-09-30 15:54:18 -07:00
Ying Sheng	0f4fb19bc8	[Fix, LoRA] fix LoRA with updates in main (#1545)	2024-09-30 10:06:08 -07:00
Lianmin Zheng	3f0fe08d37	Let ModelRunner take InputMetadata as input, instead of ScheduleBatch (#1541)	2024-09-29 20:28:45 -07:00
Lianmin Zheng	048685430d	Improve process creation (#1534)	2024-09-29 02:36:12 -07:00
Ying Sheng	9aa6553d2a	[Feature] Support reward model LxzGordon/URM-LLaMa-3.1-8B (#1525)	2024-09-27 23:32:11 -07:00
TianyiQ	3c93187caf	Add support for tie_word_embeddings when loading weights + support for SmolLM (#1508)	2024-09-24 21:50:20 -07:00
Lianmin Zheng	fb2d0680e0	[Fix] Fix clean_up_tokenization_spaces in tokenizer (#1510)	2024-09-24 21:37:33 -07:00
Lianmin Zheng	28b4d8e144	Update test_srt_backend.py (#1502)	2024-09-24 03:17:10 -07:00
Yineng Zhang	42a2d82ba7	minor: add mla fp8 test (#1494)	2024-09-23 20:40:17 +08:00
Ying Sheng	e4780cf839	[API, Feature] Support response prefill for openai API (#1490)	2024-09-22 06:46:17 -07:00
Lianmin Zheng	13f1357ef0	Add a unit test for data parallelism (#1489)	2024-09-22 02:21:05 -07:00
Lianmin Zheng	167591e864	Better unit tests for adding a new model (#1488)	2024-09-22 01:50:37 -07:00
Ke Bao	b8ccaf4d73	Add MLA gsm8k eval (#1484)	2024-09-21 11:16:13 +08:00
Ke Bao	a68cb201dd	Fix triton head num (#1482)	2024-09-21 10:25:20 +08:00
Yineng Zhang	a6db88626e	minor: add quant eval compared with base (#1475)	2024-09-20 01:57:19 +08:00

196 Commits