sglang

Author	SHA1	Message	Date
Liangsheng Yin	62757db6f0	Reduce the overhead when cache is disabled (#1010)	2024-08-09 16:36:57 -07:00
gryffindor-rr	9cf0a5bada	Add skip_tokenizer_init args. (#959) Co-authored-by: lzhang <zhanglei@modelbest.cn>	2024-08-09 12:14:13 -07:00
Juwan Yoo	10bca45bc6	bugfix: penalizers to be merged before reqs (#1001)	2024-08-09 21:46:24 +10:00
Ying Sheng	e040a2450b	Add e5-mistral embedding model - step 3/3 (#988)	2024-08-08 16:31:19 -07:00
Juwan Yoo	ab7875941b	feat: frequency, min_new_tokens, presence, and repetition penalties (#973)	2024-08-08 04:21:08 -07:00
yichuan~	3a79613c28	support more option about usage in stream mode (#985) Co-authored-by: Ying Sheng <sqy1415@gmail.com>	2024-08-08 09:41:57 +00:00
Liangsheng Yin	1ac304eeb4	Adjust `InputeMetadata` and `ScheduleBatch` (#981)	2024-08-08 01:11:22 -07:00
Liangsheng Yin	2b8257f325	Adjust max prefix len (#980)	2024-08-08 00:41:26 +00:00
Liangsheng Yin	7623091d97	RadixCache method adjust (#977)	2024-08-07 15:52:24 -07:00
Zhiqiang Xie	6db27f7b3b	misc: correct the int data type for token ids and indices (#969)	2024-08-08 04:40:07 +08:00
Liangsheng Yin	7fa54a1ab3	Make `req_pool_indices` on CPU (#960)	2024-08-07 01:41:25 -07:00
Liangsheng Yin	87e8c090e9	Organize code (rename, movement) (#953)	2024-08-06 20:50:32 -07:00
Ke Bao	e1eae1fd15	Support MLA for DeepSeek-V2 with Triton - step 1 (#905)	2024-08-05 03:40:33 +10:00
min-xu-et	7dd8a7e6d9	fixed an error handling in bench_latency.py (#904)	2024-08-03 17:42:17 -07:00
Liangsheng Yin	c020f9ceda	Support chunked prefill when radix cache is disabled (#811)	2024-08-01 00:29:01 -07:00
Ying Sheng	e7487b08bc	Adjust default mem fraction to avoid OOM (#823)	2024-07-30 01:58:31 -07:00
Liangsheng Yin	cdcbde5fc3	Code structure refactor (#807)	2024-07-29 23:04:48 -07:00

17 Commits