sglang

Author	SHA1	Message	Date
Lianmin Zheng	69aa937aa5	Fix unit tests and type annotations (#1648)	2024-10-12 14:49:24 -07:00
Lianmin Zheng	00c7e6368b	Release v0.3.3.post1 (#1636)	2024-10-11 07:56:16 -07:00
Lianmin Zheng	23cc66f7b6	Add back data parallelism (#1635)	2024-10-11 07:22:48 -07:00
Ying Sheng	04b262cd91	[Fix] Fix major performance bug in certain cases (#1563) Co-authored-by: hnyls2002 <hnyls2002@gmail.com>	2024-10-04 08:51:11 +00:00
Lianmin Zheng	048685430d	Improve process creation (#1534)	2024-09-29 02:36:12 -07:00
Ying Sheng	9aa6553d2a	[Feature] Support reward model LxzGordon/URM-LLaMa-3.1-8B (#1525)	2024-09-27 23:32:11 -07:00
Lianmin Zheng	bc068e9618	[CI] Move AMD test to a separate file (#1500)	2024-09-24 02:06:28 -07:00
Yineng Zhang	42a2d82ba7	minor: add mla fp8 test (#1494)	2024-09-23 20:40:17 +08:00
Ying Sheng	6f3cf1297e	[CI, AMD] Add AMD tests to CI (#1491)	2024-09-22 04:45:10 -07:00
Lianmin Zheng	13f1357ef0	Add a unit test for data parallelism (#1489)	2024-09-22 02:21:05 -07:00
Ke Bao	b8ccaf4d73	Add MLA gsm8k eval (#1484)	2024-09-21 11:16:13 +08:00
Ke Bao	a68cb201dd	Fix triton head num (#1482)	2024-09-21 10:25:20 +08:00
Lianmin Zheng	1acccb364a	Fix oom issues with fp8 for llama (#1454)	2024-09-18 03:45:19 -07:00
Lianmin Zheng	9ba1f09760	[Fix] Fix logprob and normalized_logprob (#1428)	2024-09-15 06:36:06 -07:00
Yineng Zhang	f3d32f888a	ci: fix finish (#1414)	2024-09-14 01:01:30 +10:00
Lianmin Zheng	8779da95d6	Update pr-test.yml (#1412)	2024-09-13 00:37:13 -07:00
Lianmin Zheng	ad0ff62a4c	Balance test in CI (#1411)	2024-09-12 23:29:44 -07:00
Lianmin Zheng	68be2f6d3b	[CI] Include triton backend and online serving benchmark into CI (#1408)	2024-09-12 21:36:41 -07:00
Lianmin Zheng	f64eae3a29	[Fix] Reduce memory usage for loading llava model & Remove EntryClassRemapping (#1308)	2024-09-02 21:44:45 -07:00
Yineng Zhang	2561ed012c	feat: update nightly gsm8k eval (#1304)	2024-09-03 01:18:41 +10:00
Yineng Zhang	6487ef64c6	ci: add nightly eval (#1291)	2024-09-02 03:19:49 +10:00
Lianmin Zheng	761b2cebd6	[CI] merge all ci tests into one file (#1289)	2024-09-01 02:36:56 -07:00
Lianmin Zheng	1b5d56f7f8	[CI] Add more multi-gpu tests (#1280)	2024-09-01 00:27:25 -07:00
Lianmin Zheng	6c49831394	Add sglang.bench_latency to CI (#1243)	2024-08-28 21:20:54 +10:00
Yineng Zhang	f25f4dfde5	hotfix: revert sampler CUDA Graph (#1242)	2024-08-28 21:16:47 +10:00
Liangsheng Yin	1ece2cda3d	Fix bench latency benchmark (#1225)	2024-08-28 00:37:32 -07:00
Mingyi	97589a60a2	[CI] Parallelize unit tests in CI (#1219)	2024-08-26 04:54:02 +00:00
Liangsheng Yin	632d506d0b	minor: improve CI and dependencies (#1212)	2024-08-26 04:26:31 +00:00
Lianmin Zheng	d3efcb3930	Update workflow files (#1214)	2024-08-25 17:45:35 -07:00
Lianmin Zheng	61bb223e0f	Update CI runner docs (#1213)	2024-08-25 17:31:52 -07:00
Lianmin Zheng	15f1a49d2d	Update CI workflows (#1210)	2024-08-25 16:43:07 -07:00
Chayenne	30b4f771b0	Support Alibaba-NLP/gte-Qwen2-7B-instruct embedding Model (#1186) Co-authored-by: Ying Sheng <sqy1415@gmail.com>	2024-08-25 10:29:12 -07:00
Liangsheng Yin	5d0d40d0eb	Fix CI accuracy && time out limit (#1133)	2024-08-16 21:41:11 -07:00
Yineng Zhang	26e9c12c15	ci: compatible with fork repo (#1115)	2024-08-16 04:26:44 +10:00
Lianmin Zheng	e86b1ccbf0	Enable chunked prefill by default (#1040)	2024-08-14 21:56:20 -07:00
Yineng Zhang	f14569f64a	ci: remove workflow path trigger (#1096)	2024-08-14 20:36:24 +10:00
Yineng Zhang	c8423ca311	ci: update timeout and retry (#1086) Co-authored-by: Liangsheng Yin <hnyls2002@gmail.com>	2024-08-14 00:27:35 -07:00
Yineng Zhang	cebd78d83e	ci: add accuracy timeout (#1078)	2024-08-13 22:12:58 +10:00
Yineng Zhang	f7fb68d292	ci: add moe test (#1053)	2024-08-13 18:43:23 +10:00
Yineng Zhang	396a13e6ad	ci: add cancel pr workflow (#1070)	2024-08-13 18:16:50 +10:00
Lianmin Zheng	c877292cc1	Re-organize CI tests (#1052)	2024-08-12 03:39:01 -07:00
Lianmin Zheng	41598e0d8e	Add longer accuracy test on CI (#1049)	2024-08-12 09:21:38 +00:00
Yineng Zhang	cb99ba4fc6	feat: update Dockerfile (#1033) Co-authored-by: vhain <vhain6512@gmail.com>	2024-08-12 16:24:06 +10:00
Lianmin Zheng	8207637029	Improve end-to-end throughput test and its coverage (#1039)	2024-08-11 18:27:33 -07:00
Lianmin Zheng	54fb1c80c0	Clean up unit tests (#1020)	2024-08-10 15:09:03 -07:00
Yineng Zhang	e712837d38	misc: update test config (#990)	2024-08-11 04:20:30 +10:00
Ying Sheng	e040a2450b	Add e5-mistral embedding model - step 3/3 (#988)	2024-08-08 16:31:19 -07:00
Liangsheng Yin	4d929107ae	Run purge-cache only in sgl-project (#976)	2024-08-07 13:16:36 -07:00
Liangsheng Yin	fbe0c818c2	Purge self-runner's pip cache weekly (#975)	2024-08-07 12:43:12 -07:00
Yineng Zhang	c31f084c71	chore: update vllm to 0.5.4 (#966)	2024-08-07 21:15:41 +10:00

87 Commits