sglang

Author	SHA1	Message	Date
Lianmin Zheng	8496701934	[Misc] Fix metrics, weight update lock, request logging (#2543)	2024-12-22 06:27:22 -08:00
SangBin Cho	9208618b3e	[Core] in batch prefix caching by delay scheduling (#2442)	2024-12-11 12:51:50 -08:00
Qun Yang	37ee906f61	Add more support for intel Gaudi accelerators (#2357)	2024-12-06 01:16:33 -08:00
Lianmin Zheng	b548801ddb	Update docs (#1839)	2024-10-30 02:49:08 -07:00
Lianmin Zheng	fc82f5a743	[Fix] Fix cuda graph padding for triton attention backend (#1782)	2024-10-24 12:33:15 -07:00
Lianmin Zheng	fbcbb26327	Fix perf regression for set_kv_buffer (#1765)	2024-10-23 09:57:08 -07:00
Lianmin Zheng	ad4125d1a9	Fuse more ops & Simplify token mapping (#1758)	2024-10-22 23:20:43 -07:00
Liangsheng Yin	94cde10920	Llama3.2 vision model support (#1551)	2024-10-21 15:01:21 -07:00
Lianmin Zheng	b48edff67f	Split the overlapped version of TpModelWorkerClient into a separate file (#1726)	2024-10-20 00:29:29 -07:00
Lianmin Zheng	59cbf47626	Unify the memory pool api and tp worker API (#1724)	2024-10-19 23:19:26 -07:00
Lianmin Zheng	769bf11c05	Fix the race condition in overlap mode (#1712)	2024-10-19 06:50:56 -07:00
Lianmin Zheng	2bcfba1b08	Skip unnecessary penalizer (#1707)	2024-10-18 17:54:03 -07:00
Lianmin Zheng	bc12d4033f	Add grouped free operations (#1706)	2024-10-18 13:21:05 -07:00
wxsm	b170930534	feat: radix tree code optimize (#1697)	2024-10-17 08:01:27 -07:00
Lianmin Zheng	9116b2896f	Add a new event loop (#1677)	2024-10-16 01:33:20 -07:00
Shuo Yang	061e546313	Support double sparsity (#1459)	2024-10-14 02:00:41 -07:00
Lianmin Zheng	9244f27f0a	[Minor] Improve the style and fix flaky tests (#1584)	2024-10-06 00:10:48 -07:00
Lianmin Zheng	45473d4b2b	Make input_ids a torch.Tensor (#1568)	2024-10-04 01:09:59 -07:00
Lianmin Zheng	114bbc8651	Use ipc instead of tcp in zmq (#1566)	2024-10-04 00:45:52 -07:00
Lianmin Zheng	32eb6e96f2	Organize sampling batch info better (#1562)	2024-10-03 18:29:49 -07:00
Lianmin Zheng	4ae0969c0a	Move status check in the memory pool to CPU (#1557)	2024-10-02 18:23:35 -07:00
Lianmin Zheng	f86c1e611f	Move scheduler code from tp_worker.py to scheduler.py (#1538)	2024-09-29 17:42:45 -07:00
luzengxiangcn	e6692bf4a5	debug radixcache stack_overflow (#1499)	2024-09-24 04:58:01 -07:00
Ke Bao	2c615d120f	[Feature] Support fp8 e5m2 kv cache with flashinfer (#1204) (Co-authored-by: Yineng Zhang <me@zhyncs.com>)	2024-08-25 17:38:11 -07:00
Lianmin Zheng	c877292cc1	Re-organize CI tests (#1052)	2024-08-12 03:39:01 -07:00
Liangsheng Yin	fb7421db0d	minor: some potential bugs (#1044)	2024-08-12 05:35:44 +00:00
Liangsheng Yin	7de6034534	Fix the prefix indices (#1037)	2024-08-11 17:57:02 -07:00
Lianmin Zheng	9dae407812	Improve type annotation (#1029)	2024-08-11 02:44:59 -07:00
Liangsheng Yin	fcc0f5ed99	Fix wrong assert (#1028)	2024-08-11 09:22:16 +00:00
Liangsheng Yin	43fbb6d919	Fix `input_ids` && rename to `fill_ids` (#1021)	2024-08-10 16:24:12 -07:00
Liangsheng Yin	62757db6f0	Reduce the overhead when cache is disabled (#1010)	2024-08-09 16:36:57 -07:00
Liangsheng Yin	6ed4e3b8fb	Fix chunked prefill (#984)	2024-08-07 22:28:42 -07:00
Liangsheng Yin	7623091d97	RadixCache method adjust (#977)	2024-08-07 15:52:24 -07:00
Zhiqiang Xie	6db27f7b3b	misc: correct the int data type for token ids and indices (#969)	2024-08-08 04:40:07 +08:00
Liangsheng Yin	a01ddd9605	misc: fix the req_to_token member change (#967)	2024-08-07 01:52:10 -07:00
Liangsheng Yin	7fa54a1ab3	Make `req_pool_indices` on CPU (#960)	2024-08-07 01:41:25 -07:00
Ke Bao	e1eae1fd15	Support MLA for DeepSeek-V2 with Triton - step 1 (#905)	2024-08-05 03:40:33 +10:00
Liangsheng Yin	c020f9ceda	Support chunked prefill when radix cache is disabled (#811)	2024-08-01 00:29:01 -07:00
Liangsheng Yin	cdcbde5fc3	Code structure refactor (#807)	2024-07-29 23:04:48 -07:00

39 Commits