sglang

Author	SHA1	Message	Date
intervitens	068e9eae55	Support min-p sampling (#1167)	2024-08-21 22:49:32 +00:00
rainred	d6aeb9fa15	[Feature] Add a function to convert sampling_params to kwargs (#1170) Co-authored-by: lzhang <zhanglei@modelbest.cn>	2024-08-21 14:28:35 -07:00
Yineng Zhang	1fb9459908	fix: custom op fallback forward native when lower sm80 (#1177)	2024-08-21 14:26:35 -07:00
Lianmin Zheng	bea2bb9eea	Improve multi-node stability (#1171)	2024-08-20 22:35:05 -07:00
Shan Yu	cd10654e7e	[Feat] Support update weights without restart server (#1157)	2024-08-20 13:48:24 -07:00
Lucien	6242c399ab	Generate 1 token to verify the health of the inference service in /health (#1154) Co-authored-by: Yineng Zhang <me@zhyncs.com>	2024-08-21 03:14:34 +10:00
Yineng Zhang	04707b09b7	misc: add hypervisor vendor (#1165)	2024-08-21 02:14:51 +10:00
Xu-Chen	ff2cfdb1a2	[Feature] add disable-custom-all-reduce (#1148) Co-authored-by: chenxu02 <chenxu02@zhihu.com> Co-authored-by: Yineng Zhang <me@zhyncs.com>	2024-08-20 08:44:12 -07:00
Lianmin Zheng	a8ae640328	Improve docs and warnings (#1164)	2024-08-20 08:31:29 -07:00
Juwan Yoo	d8476818ef	feat: allow streaming for multi-prompt and/or parallel sampling (#1134)	2024-08-20 08:06:55 -07:00
Ke Bao	df191254ab	Optimize MLA/GQA/MQA Triton decoding (#1138) Co-authored-by: Yineng Zhang <me@zhyncs.com>	2024-08-19 20:23:07 +10:00
yichuan~	b997a18d74	[Feat]Add support for optional start len of logprobs (#1035) Co-authored-by: Ying Sheng <sqy1415@gmail.com> Co-authored-by: Yineng Zhang <me@zhyncs.com> Co-authored-by: Lianmin Zheng <lianminzheng@gmail.com> Co-authored-by: Liangsheng Yin <hnyls2002@gmail.com>	2024-08-18 23:45:41 -07:00
min-xu-et	fa13b95d6b	fixed a typo (#1143)	2024-08-18 14:29:09 -07:00
Lianmin Zheng	3c1f5a9220	Fix duplicated imports in hf_transformers_utils.py (#1141)	2024-08-17 18:03:00 -07:00
Lianmin Zheng	57d0bd91ec	Improve benchmark (#1140)	2024-08-17 17:43:23 -07:00
Lianmin Zheng	cdc8d60752	Improve the code style: more comments and remove useless packages (#1139)	2024-08-17 14:37:52 -07:00
Yineng Zhang	9208591f05	fix: use fp16 dtype for sm75 (#1136)	2024-08-18 00:45:42 +10:00
Liangsheng Yin	f624f6a6cc	Fix port conflicts between local CI and runner CI. (#1131)	2024-08-16 15:12:38 -07:00
Liangsheng Yin	3694f8f996	Mixed style of chunked prefill (#1013)	2024-08-16 09:13:00 +00:00
Lianmin Zheng	5a261bd055	Fix the deadlock in multi-node tp (#1122)	2024-08-16 01:39:24 -07:00
Yineng Zhang	5bd953749b	chore: bump v0.2.13 (#1111)	2024-08-16 03:50:43 +10:00
Lianmin Zheng	0cb099e20a	set CUDA_DEVICE_MAX_CONNECTIONS=1 (#1113)	2024-08-16 03:47:39 +10:00
Ying Sheng	93d4e354d8	[Fix] Window attention compatible with RadixAttention and chunked prefill (#1112)	2024-08-15 10:33:20 -07:00
Yineng Zhang	9195d1362a	misc: rm unused model_loader (#1110)	2024-08-15 08:29:35 -07:00
Ying Sheng	14cb544d56	[Fix] fix flashinfer usage for window attention (#1107)	2024-08-15 00:53:24 -07:00
Lianmin Zheng	e86b1ccbf0	Enable chunked prefill by default (#1040)	2024-08-14 21:56:20 -07:00
Ying Sheng	8d2d876fc8	[Fix] fix the typo bug for window attention (#1106)	2024-08-14 21:56:01 -07:00
Lianmin Zheng	326df4bab2	Use a single workspace for flashinfer (#1077)	2024-08-14 19:25:37 -07:00
Ying Sheng	6767e2229f	Support jinja as chat template file (#1104)	2024-08-14 17:43:14 -07:00
Liangsheng Yin	73cf6834f2	Support `stop_token_ids` in sglang API (#1092)	2024-08-15 00:31:39 +00:00
Ying Sheng	96a2093ef0	[Fix] Compatibility of window attention and cuda graph (#1090)	2024-08-14 10:37:01 -07:00
Liangsheng Yin	a34dd86a7d	Use `dtype` to control generate (#1082) Co-authored-by: zhyncs <me@zhyncs.com>	2024-08-14 15:58:07 +00:00
Lianmin Zheng	a59636bb5e	Update grok 1 model (#1095)	2024-08-14 04:40:44 -07:00
Lianmin Zheng	8f790ac100	Fix a bug in cuda graph runner (#1094)	2024-08-14 03:25:38 -07:00
rainred	616b59f384	[Feature] modify Runtime to support skip_tokenizer_init (#1088) Co-authored-by: lzhang <zhanglei@modelbest.cn>	2024-08-14 00:28:04 -07:00
Liangsheng Yin	e205527cb1	Fix jump forward final state circular path bug. (#1084)	2024-08-13 21:14:05 -07:00
Ying Sheng	0909bb0d2f	[Feat] Add window attention for gemma-2 (#1056)	2024-08-13 17:01:26 -07:00
Lianmin Zheng	ad3e4f1619	Update the mixtral to use the better FusedMoE layer (#1081)	2024-08-13 15:44:25 -07:00
rainred	95f5fbf1a7	Fix create_abort_task, GenerateReqInput does not have rids. (#1079) Co-authored-by: lzhang <zhanglei@modelbest.cn>	2024-08-13 12:47:22 +00:00
Yineng Zhang	f7fb68d292	ci: add moe test (#1053)	2024-08-13 18:43:23 +10:00
Yineng Zhang	65915f9f3e	fix: temporary solution for DeepSeek V2 H100 layout conversion issue (#1060) Co-authored-by: ispobock <ISPObaoke@163.com>	2024-08-13 15:48:54 +10:00
Ke Bao	162f3ccb01	Fix layernorm input shape (#1066) Co-authored-by: Yineng Zhang <me@zhyncs.com>	2024-08-13 15:48:07 +10:00
Yineng Zhang	65e89baea9	fix: not use the default port (#1068)	2024-08-13 15:12:56 +10:00
Yineng Zhang	6a38efa834	feat: replace all rmsnorm and silu (#1057)	2024-08-13 02:15:59 +10:00
Yineng Zhang	b0ad0c1bc8	chore: bump v0.2.12 (#1048)	2024-08-12 20:59:38 +10:00
Lianmin Zheng	c877292cc1	Re-organize CI tests (#1052)	2024-08-12 03:39:01 -07:00
Lianmin Zheng	0c1c72a0b4	Fix accuracy test (#1051)	2024-08-12 19:48:40 +10:00
Lianmin Zheng	41598e0d8e	Add longer accuracy test on CI (#1049)	2024-08-12 09:21:38 +00:00
Ying Sheng	32f6144323	fix: Fix returned prefill logits and add output str test (#1046)	2024-08-12 06:13:45 +00:00
Lianmin Zheng	fb1f28cbbb	Clean up the comments and names under python/sglang/srt/layers (#1047)	2024-08-12 05:54:37 +00:00

547 Commits