sglang

Author	SHA1	Message	Date
Liangsheng Yin	73cf6834f2	Support `stop_token_ids` in sglang API (#1092 )	2024-08-15 00:31:39 +00:00
Ying Sheng	96a2093ef0	[Fix] Compatibility of window attention and cuda graph (#1090 )	2024-08-14 10:37:01 -07:00
Liangsheng Yin	a34dd86a7d	Use `dtype` to control generate (#1082 ) Co-authored-by: zhyncs <me@zhyncs.com>	2024-08-14 15:58:07 +00:00
Lianmin Zheng	a59636bb5e	Update grok 1 model (#1095 )	2024-08-14 04:40:44 -07:00
Lianmin Zheng	8f790ac100	Fix a bug in cuda graph runner (#1094 )	2024-08-14 03:25:38 -07:00
rainred	616b59f384	[Feature] modify Runtime to support skip_tokenizer_init (#1088 ) Co-authored-by: lzhang <zhanglei@modelbest.cn>	2024-08-14 00:28:04 -07:00
Liangsheng Yin	e205527cb1	Fix jump forward final state circular path bug. (#1084 )	2024-08-13 21:14:05 -07:00
Ying Sheng	0909bb0d2f	[Feat] Add window attention for gemma-2 (#1056 )	2024-08-13 17:01:26 -07:00
Lianmin Zheng	ad3e4f1619	Update the mixtral to use the better FusedMoE layer (#1081 )	2024-08-13 15:44:25 -07:00
rainred	95f5fbf1a7	Fix create_abort_task, GenerateReqInput does not have rids. (#1079 ) Co-authored-by: lzhang <zhanglei@modelbest.cn>	2024-08-13 12:47:22 +00:00
Yineng Zhang	65915f9f3e	fix: temporary solution for DeepSeek V2 H100 layout conversion issue (#1060 ) Co-authored-by: ispobock <ISPObaoke@163.com>	2024-08-13 15:48:54 +10:00
Ke Bao	162f3ccb01	Fix layernorm input shape (#1066 ) Co-authored-by: Yineng Zhang <me@zhyncs.com>	2024-08-13 15:48:07 +10:00
Yineng Zhang	6a38efa834	feat: replace all rmsnorm and silu (#1057 )	2024-08-13 02:15:59 +10:00
Lianmin Zheng	c877292cc1	Re-organize CI tests (#1052 )	2024-08-12 03:39:01 -07:00
Lianmin Zheng	41598e0d8e	Add longer accuracy test on CI (#1049 )	2024-08-12 09:21:38 +00:00
Ying Sheng	32f6144323	fix: Fix returned prefill logits and add output str test (#1046 )	2024-08-12 06:13:45 +00:00
Lianmin Zheng	fb1f28cbbb	Clean up the comments and names under python/sglang/srt/layers (#1047 )	2024-08-12 05:54:37 +00:00
Liangsheng Yin	fb7421db0d	minor: some potential bugs (#1044 )	2024-08-12 05:35:44 +00:00
Liangsheng Yin	7de6034534	Fix the prefix indices (#1037 )	2024-08-11 17:57:02 -07:00
Lianmin Zheng	d84c5e70f7	Test the case when max_new_tokens is very large (#1038 )	2024-08-11 16:41:03 -07:00
Lianmin Zheng	d785412077	Fix the case when max_new_tokens is too large (#1025 )	2024-08-11 15:20:18 -07:00
Liangsheng Yin	7b6a5332ca	Fix triton args init (#1034 )	2024-08-11 12:11:26 -07:00
Lianmin Zheng	4080e82244	Fix the case where r.prefix_indices is None (#1031 )	2024-08-11 04:53:51 -07:00
Yineng Zhang	c245b78973	hotfix: add CustomOp abstraction (#1027 )	2024-08-11 02:45:59 -07:00
Lianmin Zheng	9dae407812	Improve type annotation (#1029 )	2024-08-11 02:44:59 -07:00
Liangsheng Yin	fcc0f5ed99	Fix wrong assert (#1028 )	2024-08-11 09:22:16 +00:00
Lianmin Zheng	a97df79124	Clean up readme and arguments of chunked prefill (#1022 )	2024-08-11 01:18:52 -07:00
Yineng Zhang	94752ac811	feat: use FlashInfer rmsnorm and silu (#907 )	2024-08-11 14:57:13 +10:00
Liangsheng Yin	43fbb6d919	Fix `input_ids` && rename to `fill_ids` (#1021 )	2024-08-10 16:24:12 -07:00
Lianmin Zheng	54fb1c80c0	Clean up unit tests (#1020 )	2024-08-10 15:09:03 -07:00
Ying Sheng	b68c4c073b	fix: force max new tokens to be 1 for embedding request (#1019 )	2024-08-10 13:46:42 -07:00
Ying Sheng	7599badeaf	Support embedding input as a list (#1014 )	2024-08-10 08:39:05 -07:00
Liangsheng Yin	62757db6f0	Reduce the overhead when cache is disabled (#1010 )	2024-08-09 16:36:57 -07:00
Liangsheng Yin	73fa2d49d5	Some warnings to crash when CI (#1009 )	2024-08-09 15:16:23 -07:00
gryffindor-rr	9cf0a5bada	Add skip_tokenizer_init args. (#959 ) Co-authored-by: lzhang <zhanglei@modelbest.cn>	2024-08-09 12:14:13 -07:00
Ying Sheng	b16e856f11	Add openai embedding API (#997 )	2024-08-09 11:19:18 -07:00
Juwan Yoo	10bca45bc6	bugfix: penalizers to be merged before reqs (#1001 )	2024-08-09 21:46:24 +10:00
liuyhwangyh	b91a4cb1b1	support models from www.modelscope.cn (#994 ) Co-authored-by: mulin.lyh <mulin.lyh@taobao.com>	2024-08-09 02:52:14 -07:00
Ying Sheng	e040a2450b	Add e5-mistral embedding model - step 3/3 (#988 )	2024-08-08 16:31:19 -07:00
Ying Sheng	9f662501a3	Move torch.compile configs into cuda_graph_runner.py (#993 )	2024-08-08 13:20:30 -07:00
Juwan Yoo	ab7875941b	feat: frequency, min_new_tokens, presence, and repetition penalties (#973 )	2024-08-08 04:21:08 -07:00
yichuan~	3a79613c28	support more option about usage in stream mode (#985 ) Co-authored-by: Ying Sheng <sqy1415@gmail.com>	2024-08-08 09:41:57 +00:00
Liangsheng Yin	1ac304eeb4	Adjust `InputeMetadata` and `ScheduleBatch` (#981 )	2024-08-08 01:11:22 -07:00
Ying Sheng	20a4f927dc	Add io struct for embedding models [unreachable code] - step 2/3 (#987 )	2024-08-08 07:52:31 +00:00
Ying Sheng	0de7c2d09e	Add e5-mistral modules [unreachable code] - step 1/3 (#983 )	2024-08-08 00:04:15 -07:00
Liangsheng Yin	6ed4e3b8fb	Fix chunked prefill (#984 )	2024-08-07 22:28:42 -07:00
Ying Sheng	00023d622a	[minor] Update type annotation in tokenizer_manager.py (#982 )	2024-08-08 01:48:45 +00:00
foszto	c62d560c03	#590 Increase default , track changes in examples and documentation (#971 ) Co-authored-by: Ying Sheng <sqy1415@gmail.com>	2024-08-08 00:54:46 +00:00
Liangsheng Yin	2b8257f325	Adjust max prefix len (#980 )	2024-08-08 00:41:26 +00:00
Liangsheng Yin	7623091d97	RadixCache method adjust (#977 )	2024-08-07 15:52:24 -07:00

384 Commits