sglang

Author	SHA1	Message	Date
Ying Sheng	6f3cf1297e	[CI, AMD] Add AMD tests to CI (#1491)	2024-09-22 04:45:10 -07:00
Lianmin Zheng	13f1357ef0	Add a unit test for data parallelism (#1489)	2024-09-22 02:21:05 -07:00
wellhowtosay	2a99993cd9	Pr fix max workers (#1456) Co-authored-by: baolujia <baolujia@shizhuang-inc.com> Co-authored-by: Lianmin Zheng <lianminzheng@gmail.com>	2024-09-22 02:20:26 -07:00
Lianmin Zheng	167591e864	Better unit tests for adding a new model (#1488)	2024-09-22 01:50:37 -07:00
Yineng Zhang	441c22db8c	doc: update backend (#1486)	2024-09-21 22:05:12 +08:00
Ran Chen	ce636ac441	fix incorrect links in documentation (#1481) Co-authored-by: Yineng Zhang <me@zhyncs.com>	2024-09-21 20:36:23 +08:00
Yineng Zhang	82136eb0b5	chore: bump v0.3.1.post3 (#1483)	2024-09-21 11:17:45 +08:00
Ke Bao	b8ccaf4d73	Add MLA gsm8k eval (#1484)	2024-09-21 11:16:13 +08:00
Ke Bao	a68cb201dd	Fix triton head num (#1482)	2024-09-21 10:25:20 +08:00
Niklas Muennighoff	014982b5e0	Add OLMoE (#1476)	2024-09-20 10:32:49 +08:00
Yineng Zhang	a6db88626e	minor: add quant eval compared with base (#1475)	2024-09-20 01:57:19 +08:00
Yineng Zhang	b4408b0d16	feat: update linear deps 1/N (#1305)	2024-09-19 20:53:11 +08:00
Lianmin Zheng	2cd7e181dd	Fix env vars in bench_latency (#1472)	2024-09-19 03:19:26 -07:00
Lianmin Zheng	5ce55aee15	Release v0.3.1.post2 (#1470)	2024-09-19 02:03:38 -07:00
Lianmin Zheng	2d346a57c2	Fix padding in the cuda graph (#1469)	2024-09-19 01:52:15 -07:00
Li Bo	446ea33277	fix: creat new dict everytime for putting new frame (#1464)	2024-09-19 01:31:48 -07:00
Ying Sheng	8f527e2940	[Event] Add public meeting invite to README (#1458)	2024-09-18 23:53:22 +08:00
Lianmin Zheng	7f24ea95c3	Fuse top_k and top_k in the sampler (#1457)	2024-09-18 04:35:35 -07:00
Lianmin Zheng	1acccb364a	Fix oom issues with fp8 for llama (#1454)	2024-09-18 03:45:19 -07:00
HAI	aa2750beb3	[Bugfix] Enable SGLang on AMD GPUs via PyTorch for ROCm (#1419) (#1453)	2024-09-18 02:01:35 -07:00
Lianmin Zheng	5e62a6b706	Add bench_server_latency.py (#1452)	2024-09-18 00:56:06 -07:00
Xiao Yu	5752f25eef	Fixed n>1 causing list index out of range with VLM (#1449)	2024-09-18 00:46:32 -07:00
Liangsheng Yin	7c162fa9c5	Fix schedule bug (#1451)	2024-09-17 22:59:32 -07:00
Liangsheng Yin	36078fb247	fix schedule bug (#1450)	2024-09-17 16:33:53 -07:00
Ke Bao	b3710d2c93	Fix attention backend (#1448)	2024-09-17 14:07:53 +00:00
Ke Bao	c6b6d2e71b	Enable MLA by default (#1447)	2024-09-17 11:42:48 +00:00
Lianmin Zheng	90a26be31c	Release 0.3.1.post1 (#1445)	2024-09-17 01:47:31 -07:00
Jani Monoses	1f4b5f770d	Add OLMoE model (#1444)	2024-09-17 01:14:53 -07:00
Ke Bao	76524b70d1	Fix torch compile for deepseek-v2 (#1442)	2024-09-17 00:52:08 -07:00
HAI	3a6e04185b	[Feature, Hardware] Enable SGLang on AMD GPUs via PyTorch for ROCm (#1420)	2024-09-17 07:43:52 +00:00
Lianmin Zheng	2fa5cec775	Simplify sampler and its error handling (#1441)	2024-09-16 21:23:31 -07:00
Lianmin Zheng	27b557aea7	Clean up model loader (#1440)	2024-09-16 18:16:27 -07:00
zifeitong	93dffd699b	Add constrained_json_whitespace_pattern to ServerArgs (#1438)	2024-09-16 13:29:18 -07:00
Ying Sheng	2abe4f1cb6	Revert "[Minor] Raise exception for wrong import (#1409)" (#1432)	2024-09-15 15:22:32 -07:00
Ying Sheng	37963394aa	[Feature] Support LoRA path renaming and add LoRA serving benchmarks (#1433)	2024-09-15 12:46:04 -07:00
Lianmin Zheng	899cf5c438	Remove deprecated configs (#1431)	2024-09-15 08:52:18 -07:00
Lianmin Zheng	e79f6cd73d	Release v0.3.1 (#1430)	2024-09-15 23:03:16 +09:00
Lianmin Zheng	9ba1f09760	[Fix] Fix logprob and normalized_logprob (#1428)	2024-09-15 06:36:06 -07:00
Lianmin Zheng	282681b8a1	Update backend.md (#1429)	2024-09-15 02:55:34 -07:00
William Arnold	58cafe23a7	Add libibverbs-dev to Dockerfile (#1427)	2024-09-15 15:40:31 +09:00
Lianmin Zheng	9463bc1385	Enable torch.compile for triton backend (#1422)	2024-09-14 15:38:37 -07:00
Yineng Zhang	e3fc4658f4	fix: resolve nightly eval (#1426)	2024-09-15 02:07:52 +10:00
Ke Bao	33b54e7c40	Add pytorch sampling backend ut (#1425)	2024-09-15 01:15:30 +10:00
Jerry Zhang	30b404ce72	Add torchao quant for mixtral and qwen_moe (#1418)	2024-09-14 06:46:55 +00:00
Liangsheng Yin	70b6802982	Optimize conflicts between CUDA graph and vocab mask tensors (#1392)	2024-09-13 20:27:53 -07:00
Yineng Zhang	f3d32f888a	ci: fix finish (#1414)	2024-09-14 01:01:30 +10:00
Lianmin Zheng	8779da95d6	Update pr-test.yml (#1412)	2024-09-13 00:37:13 -07:00
Lianmin Zheng	ad0ff62a4c	Balance test in CI (#1411)	2024-09-12 23:29:44 -07:00
Ying Sheng	9a903a8784	[Minor] Raise exception for wrong import (#1409)	2024-09-12 23:02:36 -07:00
Lianmin Zheng	68be2f6d3b	[CI] Include triton backend and online serving benchmark into CI (#1408)	2024-09-12 21:36:41 -07:00

871 Commits