sglang

Author	SHA1	Message	Date
Lianmin Zheng	fbcbb26327	Fix perf regression for set_kv_buffer (#1765)	2024-10-23 09:57:08 -07:00
Lianmin Zheng	ad4125d1a9	Fuse more ops & Simplify token mapping (#1758)	2024-10-22 23:20:43 -07:00
Liangsheng Yin	94cde10920	Llama3.2 vision model support (#1551)	2024-10-21 15:01:21 -07:00
Lianmin Zheng	b48edff67f	Split the overlapped version of TpModelWorkerClient into a separate file (#1726)	2024-10-20 00:29:29 -07:00
Lianmin Zheng	59cbf47626	Unify the memory pool api and tp worker API (#1724)	2024-10-19 23:19:26 -07:00
Lianmin Zheng	2bcfba1b08	Skip unnecessary penalizer (#1707)	2024-10-18 17:54:03 -07:00
Lianmin Zheng	bc12d4033f	Add grouped free operations (#1706)	2024-10-18 13:21:05 -07:00
Shuo Yang	061e546313	Support double sparsity (#1459)	2024-10-14 02:00:41 -07:00
Lianmin Zheng	9244f27f0a	[Minor] Improve the style and fix flaky tests (#1584)	2024-10-06 00:10:48 -07:00
Lianmin Zheng	45473d4b2b	Make input_ids a torch.Tensor (#1568)	2024-10-04 01:09:59 -07:00
Lianmin Zheng	114bbc8651	Use ipc instead of tcp in zmq (#1566)	2024-10-04 00:45:52 -07:00
Lianmin Zheng	32eb6e96f2	Organize sampling batch info better (#1562)	2024-10-03 18:29:49 -07:00
Lianmin Zheng	4ae0969c0a	Move status check in the memory pool to CPU (#1557)	2024-10-02 18:23:35 -07:00
Lianmin Zheng	f86c1e611f	Move scheduler code from tp_worker.py to scheduler.py (#1538)	2024-09-29 17:42:45 -07:00
Ke Bao	2c615d120f	[Feature] Support fp8 e5m2 kv cache with flashinfer (#1204) Co-authored-by: Yineng Zhang <me@zhyncs.com>	2024-08-25 17:38:11 -07:00
Lianmin Zheng	9dae407812	Improve type annotation (#1029)	2024-08-11 02:44:59 -07:00
Liangsheng Yin	a01ddd9605	misc: fix the req_to_token member change (#967)	2024-08-07 01:52:10 -07:00
Liangsheng Yin	7fa54a1ab3	Make `req_pool_indices` on CPU (#960)	2024-08-07 01:41:25 -07:00
Ke Bao	e1eae1fd15	Support MLA for DeepSeek-V2 with Triton - step 1 (#905)	2024-08-05 03:40:33 +10:00
Liangsheng Yin	cdcbde5fc3	Code structure refactor (#807)	2024-07-29 23:04:48 -07:00

20 Commits