sglang

Author	SHA1	Message	Date
Ying Sheng	2fce449b1c	[API] add get memory pool size (#1760) Co-authored-by: Byron Hsu <byronhsu1230@gmail.com>	2024-10-23 07:02:29 +00:00
Byron Hsu	17536e7e3d	Fix edge case for truncated (#1747)	2024-10-23 00:00:25 -04:00
Liangsheng Yin	5e1558f1f2	Update `max_req_len` and `max_req_input_len` (#1748)	2024-10-21 16:12:04 -07:00
Liangsheng Yin	94cde10920	Llama3.2 vision model support (#1551)	2024-10-21 15:01:21 -07:00
Liangsheng Yin	efb099cdee	Fix prefill oom (#1743)	2024-10-21 03:54:35 -07:00
Lianmin Zheng	b121bc03a3	Simplify batch result resolution (#1735)	2024-10-20 19:47:14 -07:00
Lianmin Zheng	e12358dc91	Simplify the usage of device (#1734)	2024-10-20 18:17:41 -07:00
Lianmin Zheng	b48edff67f	Split the overlapped version of TpModelWorkerClient into a separate file (#1726)	2024-10-20 00:29:29 -07:00
Lianmin Zheng	59cbf47626	Unify the memory pool api and tp worker API (#1724)	2024-10-19 23:19:26 -07:00
Lianmin Zheng	12cad0feae	Simplify the interface of tp_worker (#1718)	2024-10-19 17:39:38 -07:00
Lianmin Zheng	769bf11c05	Fix the race condition in overlap mode (#1712)	2024-10-19 06:50:56 -07:00
Lianmin Zheng	2bcfba1b08	Skip unnecessary penalizer (#1707)	2024-10-18 17:54:03 -07:00
Lianmin Zheng	bc12d4033f	Add grouped free operations (#1706)	2024-10-18 13:21:05 -07:00
Liangsheng Yin	9e0dac1ad7	Fix regex and logprob conflicts when chunked prefilling (#1703)	2024-10-17 18:33:21 -07:00
havetc	ecb8bad276	Returning a per request metric for number of cached_tokens read (#1599)	2024-10-16 11:49:22 -07:00
Lianmin Zheng	dbec2f1847	Launch a thread to overlap CPU and GPU (#1687)	2024-10-16 11:20:17 -07:00
Lianmin Zheng	9116b2896f	Add a new event loop (#1677)	2024-10-16 01:33:20 -07:00
Lianmin Zheng	f1088e0fc8	Fix memory leak during abort (#1674)	2024-10-15 08:15:08 -07:00
Lianmin Zheng	4a292f670d	[Minor] Add some utility functions (#1671)	2024-10-14 20:08:03 -07:00
Lianmin Zheng	02bc95796d	Simplify chunked prefill (#1667)	2024-10-14 06:47:50 -07:00
Lianmin Zheng	24f3e1511c	[Minor] Improve style (#1666)	2024-10-14 05:25:00 -07:00
Lianmin Zheng	0c1e87964b	Move filter_batch out of stream_output (#1663)	2024-10-14 01:15:34 -07:00
Lianmin Zheng	869f1c02c4	Add a test case to test retract (#1662)	2024-10-13 20:32:37 -07:00
Ying Sheng	2725f8da61	[Minor] Rename no_eos_trim to no_stop_trim (#1661)	2024-10-13 20:30:03 -07:00
Lianmin Zheng	da1ffed689	Add output_ids into ScheduleBatch (#1659)	2024-10-13 19:54:02 -07:00
Ying Sheng	4876117171	[Fix] fix eos trim inconsistency (#1650)	2024-10-13 01:07:09 -07:00
Lianmin Zheng	7ee6c259ff	Simplify the event loop and expose `--num-continuous-decode-steps` as an argument (#1652)	2024-10-12 21:35:30 -07:00
Lianmin Zheng	9610fcd469	Fix the batch_is_full check for jump-forward decoding (#1654)	2024-10-12 19:47:24 -07:00
Lianmin Zheng	9da5a60b18	Add an option to disable penalizer (#1651)	2024-10-12 17:53:23 -07:00
Lianmin Zheng	69aa937aa5	Fix unit tests and type annotations (#1648)	2024-10-12 14:49:24 -07:00
Lianmin Zheng	23cc66f7b6	Add back data parallelism (#1635)	2024-10-11 07:22:48 -07:00
科英	bbd72bfc86	Add the ability to enable and disable the Profiler via HTTP API. (#1626)	2024-10-11 02:34:25 -07:00
Byron Hsu	01fdb2f377	Fix test_vision_openai_server on CI (#1620)	2024-10-10 16:34:13 -07:00
Ying Sheng	c5325aba75	[Profile] Add pytorch profiler (#1604)	2024-10-07 14:37:16 -07:00
Lianmin Zheng	ebbc42d989	Optimize broadcast & Reorg code (#1598)	2024-10-07 13:19:23 -07:00
Lianmin Zheng	b6aad70ab1	[Fix] Fix the case where prompt_len = 0 (#1593)	2024-10-06 20:30:02 -07:00
Lianmin Zheng	58d1082e39	Clean up event loop (#1586)	2024-10-06 03:24:04 -07:00
Lianmin Zheng	9244f27f0a	[Minor] Improve the style and fix flaky tests (#1584)	2024-10-06 00:10:48 -07:00
Liangsheng Yin	5d0ba4038f	Refine the add request reasons to avoid corner cases. (#1574)	2024-10-04 18:00:18 -07:00
Ying Sheng	04b262cd91	[Fix] Fix major performance bug in certain cases (#1563) Co-authored-by: hnyls2002 <hnyls2002@gmail.com>	2024-10-04 08:51:11 +00:00
Lianmin Zheng	114bbc8651	Use ipc instead of tcp in zmq (#1566)	2024-10-04 00:45:52 -07:00
Lianmin Zheng	32eb6e96f2	Organize sampling batch info better (#1562)	2024-10-03 18:29:49 -07:00
Lianmin Zheng	63ba2f8d7b	Clean up batch data structures: Introducing ModelWorkerBatch (#1544)	2024-09-30 06:41:49 -07:00
Lianmin Zheng	36d5acfca5	Rename InputMetadata -> ForwardBatch (#1543)	2024-09-30 02:41:11 -07:00
Lianmin Zheng	3f0fe08d37	Let ModelRunner take InputMetadata as input, instead of ScheduleBatch (#1541)	2024-09-29 20:28:45 -07:00
Lianmin Zheng	f86c1e611f	Move scheduler code from tp_worker.py to scheduler.py (#1538)	2024-09-29 17:42:45 -07:00
Lianmin Zheng	048685430d	Improve process creation (#1534)	2024-09-29 02:36:12 -07:00

47 Commits