sglang

Author	SHA1	Message	Date
zifeitong	93dffd699b	Add constrained_json_whitespace_pattern to ServerArgs (#1438)	2024-09-16 13:29:18 -07:00
Ying Sheng	2abe4f1cb6	Revert "[Minor] Raise exception for wrong import (#1409)" (#1432)	2024-09-15 15:22:32 -07:00
Ying Sheng	37963394aa	[Feature] Support LoRA path renaming and add LoRA serving benchmarks (#1433)	2024-09-15 12:46:04 -07:00
Lianmin Zheng	899cf5c438	Remove deprecated configs (#1431)	2024-09-15 08:52:18 -07:00
Lianmin Zheng	e79f6cd73d	Release v0.3.1 (#1430)	2024-09-15 23:03:16 +09:00
Lianmin Zheng	9ba1f09760	[Fix] Fix logprob and normalized_logprob (#1428)	2024-09-15 06:36:06 -07:00
Lianmin Zheng	282681b8a1	Update backend.md (#1429)	2024-09-15 02:55:34 -07:00
William Arnold	58cafe23a7	Add libibverbs-dev to Dockerfile (#1427)	2024-09-15 15:40:31 +09:00
Lianmin Zheng	9463bc1385	Enable torch.compile for triton backend (#1422)	2024-09-14 15:38:37 -07:00
Yineng Zhang	e3fc4658f4	fix: resolve nightly eval (#1426)	2024-09-15 02:07:52 +10:00
Ke Bao	33b54e7c40	Add pytorch sampling backend ut (#1425)	2024-09-15 01:15:30 +10:00
Jerry Zhang	30b404ce72	Add torchao quant for mixtral and qwen_moe (#1418)	2024-09-14 06:46:55 +00:00
Liangsheng Yin	70b6802982	Optimize conflicts between CUDA graph and vocab mask tensors (#1392)	2024-09-13 20:27:53 -07:00
Yineng Zhang	f3d32f888a	ci: fix finish (#1414)	2024-09-14 01:01:30 +10:00
Lianmin Zheng	8779da95d6	Update pr-test.yml (#1412)	2024-09-13 00:37:13 -07:00
Lianmin Zheng	ad0ff62a4c	Balance test in CI (#1411)	2024-09-12 23:29:44 -07:00
Ying Sheng	9a903a8784	[Minor] Raise exception for wrong import (#1409)	2024-09-12 23:02:36 -07:00
Lianmin Zheng	68be2f6d3b	[CI] Include triton backend and online serving benchmark into CI (#1408)	2024-09-12 21:36:41 -07:00
Lianmin Zheng	b912de11b0	Make stop reason a dict instead of str (#1407)	2024-09-12 20:47:31 -07:00
Ying Sheng	eb02c1618a	[Minor, CI] remove lora test from minimal suite (#1406)	2024-09-12 16:49:50 -07:00
Ying Sheng	712216928f	[Feature] Initial support for multi-LoRA serving (#1307)	2024-09-12 16:46:14 -07:00
hxer7963	c33d82a211	Add Support for XVERSE Models (Dense and MoE) to sglang (#1397) Co-authored-by: will he <hexin@xverse.cn> Co-authored-by: root <root@localhost.localdomain> Co-authored-by: Lianmin Zheng <lianminzheng@gmail.com>	2024-09-12 01:47:52 -07:00
Kaichen Zhang - NTU	8234e663e9	[Minor Fix] Fix llava modalities issue for single-image (#1402)	2024-09-12 01:10:26 -07:00
Zihao Ye	debbdb5178	kernel: use tensor cores for flashinfer gqa kernels (#1403)	2024-09-12 00:38:18 -07:00
Lianmin Zheng	3efa798116	Support cuda graph in the triton attention backend (#1401)	2024-09-12 00:36:55 -07:00
William	2a71be5e25	Fix README format (#1399)	2024-09-11 23:46:51 -07:00
Liangsheng Yin	4462137777	Add no commit to main rule (#1393)	2024-09-12 05:40:45 +08:00
Lianmin Zheng	fec185ce0c	Refactor attention backend (#1381)	2024-09-11 11:44:26 -07:00
Lianmin Zheng	c03cece42f	Improve error reporting during server launch (#1390)	2024-09-11 04:50:04 -07:00
Lianmin Zheng	15c75e4146	[Fix] Fix --disable-flashinfer (#1389)	2024-09-11 04:36:21 -07:00
Vectory	224200e3c2	BaiChuan2 Model (#1367) Co-authored-by: wanpenghan <wanpenghan@sohu-inc.com>	2024-09-11 03:55:24 -07:00
Byron Hsu	8c0efa514d	remove assertion in triton attention and add an unit test (#1385)	2024-09-11 03:22:07 -07:00
Liangsheng Yin	144bc70fcc	Organize flashinfer indices update (#1378)	2024-09-10 17:38:59 -07:00
Lianmin Zheng	46094e0c1b	Deprecate --disable-flashinfer and introduce --attention-backend (#1380)	2024-09-10 17:11:16 -07:00
Lianmin Zheng	3a6e8b6d78	[Minor] move triton attention kernels into a separate folder (#1379)	2024-09-10 15:15:08 -07:00
Liangsheng Yin	fbb4754cb8	Fix vocab mask update bug (#1376)	2024-09-10 13:10:36 -07:00
Lianmin Zheng	6c7cb90365	[Minor] improve kill scripts and torchao import (#1375)	2024-09-11 04:27:03 +10:00
josephrocca	dff2860a69	Fix CORS compatibility with OpenAI, vLLM, TGI, LMDeploy (#1373) Co-authored-by: Yineng Zhang <me@zhyncs.com>	2024-09-11 02:35:03 +10:00
William	e72275cf7f	Support MiniCPM3 (#1371)	2024-09-10 19:57:52 +10:00
wangchao	fec2d1223c	[Fix] fix bug of `undefined is_single` in meth `create_abort_task` (#1370)	2024-09-10 01:17:37 -07:00
Lianmin Zheng	8d1095dbf0	[Docs] Improve documentations (#1368)	2024-09-09 20:48:28 -07:00
Chayenne	743007e1ce	Adding Documentation for installation (#1300) Co-authored-by: zhaochen20 <zhaochenyang20@gmail.com>	2024-09-09 19:09:13 -07:00
zifeitong	9144ed1067	Support OpenAI API json_schema response format (#1363)	2024-09-09 19:08:25 -07:00
Liangsheng Yin	69b3bb9ae1	Unify forward mode (#1360)	2024-09-09 13:49:29 -07:00
Ying Sheng	689ff588ec	[CI] Return output logprobs in unit test (#1361)	2024-09-09 13:05:13 -07:00
Jerry Zhang	a7c47e0f02	Add torchao quant (int4/int8/fp8) to llama models (#1341) Co-authored-by: Lianmin Zheng <lianminzheng@gmail.com>	2024-09-09 05:32:41 -07:00
Lianmin Zheng	e4d68afcf0	[Minor] Many cleanup (#1357)	2024-09-09 04:14:11 -07:00
Kai-Hsun Chen	c9b75917d5	[server] Passing `model_override_args` to `launch_server` via the CLI. (#1298) Signed-off-by: Kai-Hsun Chen <kaihsun@anyscale.com>	2024-09-09 02:14:25 -07:00
Kaichen Zhang - NTU	662ecd9368	[Feat] Add modalities for vision server when handling pixel values for llava (#1346)	2024-09-09 02:07:34 -07:00
Byron Hsu	8e6bdf851c	[triton] Support head_dim not 2^n in triton extend and decode attention (#1281)	2024-09-09 01:30:24 -07:00

839 Commits