sglang

Author	SHA1	Message	Date
Theresa Barton	2c7d0a5b8b	[Fix] Fix all the Huggingface paths (#1553)	2024-10-02 10:12:07 -07:00
Ying Sheng	37963394aa	[Feature] Support LoRA path renaming and add LoRA serving benchmarks (#1433)	2024-09-15 12:46:04 -07:00
Lianmin Zheng	e4d68afcf0	[Minor] Many cleanup (#1357)	2024-09-09 04:14:11 -07:00
Kai-Hsun Chen	c9b75917d5	[server] Passing `model_override_args` to `launch_server` via the CLI. (#1298) Signed-off-by: Kai-Hsun Chen <kaihsun@anyscale.com>	2024-09-09 02:14:25 -07:00
Yineng Zhang	62f15eea5a	docs: add conclusion (#1340)	2024-09-06 04:25:14 +10:00
Yineng Zhang	79794af52d	docs: highlight ttft itl and throughput (#1337)	2024-09-06 00:00:06 +10:00
Yineng Zhang	3494b32c3a	docs: update README (#1336)	2024-09-05 23:39:44 +10:00
Lianmin Zheng	57d0bd91ec	Improve benchmark (#1140)	2024-08-17 17:43:23 -07:00
Lianmin Zheng	5a261bd055	Fix the deadlock in multi-node tp (#1122)	2024-08-16 01:39:24 -07:00
Lianmin Zheng	326df4bab2	Use a single workspace for flashinfer (#1077)	2024-08-14 19:25:37 -07:00
Yineng Zhang	1c2b5f5240	docs: update nsys usage (#1103)	2024-08-15 01:39:15 +08:00
Liangsheng Yin	a34dd86a7d	Use `dtype` to control generate (#1082) Co-authored-by: zhyncs <me@zhyncs.com>	2024-08-14 15:58:07 +00:00
Lianmin Zheng	a59636bb5e	Update grok 1 model (#1095)	2024-08-14 04:40:44 -07:00
Meng, Peng	41bb1ab10d	fix nsys cannot profile cuda kernel (#957)	2024-08-07 11:51:21 +08:00
Ke Bao	e1eae1fd15	Support MLA for DeepSeek-V2 with Triton - step 1 (#905)	2024-08-05 03:40:33 +10:00
Yineng Zhang	1edd4e07d6	chore: bump v0.2.7 (#830)	2024-07-30 20:41:10 +10:00
Yineng Zhang	a50c8a14b3	fix: use v0.2.5 for benchmark (#814)	2024-07-30 12:40:35 +10:00
Ying Sheng	db6089e6f3	Revert "Organize public APIs" (#815)	2024-07-29 19:40:28 -07:00
Liangsheng Yin	c8e9fed87a	Organize public APIs (#809)	2024-07-29 15:34:16 -07:00
Yineng Zhang	768e05d08f	fix benchmark (#743) Co-authored-by: hnyls2002 <hnyls2002@gmail.com> Co-authored-by: Ying Sheng <sqy1415@gmail.com>	2024-07-26 21:26:13 +10:00
Yineng Zhang	fded67441d	misc: update bulid instruction (#724)	2024-07-25 17:08:11 +10:00
Yineng Zhang	97e0f7d250	docs: update comment (#721)	2024-07-25 10:51:18 +10:00
Ying Sheng	30d8e130e7	Improve benchmark scripts (#717)	2024-07-24 14:44:14 -07:00
Ying Sheng	08a3bd19cc	docs: update doc (#716)	2024-07-24 20:44:03 +00:00
Yineng Zhang	321a963b01	misc: update doc (#715)	2024-07-24 13:05:46 -07:00
Yineng Zhang	2d3ae4e125	docs: update doc (#713)	2024-07-25 00:03:17 +10:00
Yineng Zhang	75f4ccb7dd	docs: update README (#712)	2024-07-24 23:33:28 +10:00
Lianmin Zheng	490a1f39dd	Fix cuda graph with flashinfer (#675)	2024-07-20 02:43:55 -07:00
zhyncs	2e341cd493	misc: add pre-commit config (#637)	2024-07-17 11:55:39 -07:00
Lianmin Zheng	41d1f67704	Fix flush cache (#627)	2024-07-15 20:44:04 -07:00
Ying Sheng	6a2941f4d0	Improve tensor parallel performance (#625) Co-authored-by: Mingyi <wisclmy0611@gmail.com>	2024-07-15 07:10:51 -07:00
Mingyi	5ac8b80677	Simplify mem state (#623)	2024-07-15 02:01:09 -07:00
Ying Sheng	bae9541e4c	Update benchmark script (#621)	2024-07-14 21:38:53 +00:00
Liangsheng Yin	564a898ad9	Optimize mem indices mangement (#619)	2024-07-13 23:39:37 -07:00
Lianmin Zheng	0feca02dd9	Improve benchmark scripts (#615)	2024-07-13 15:59:04 -07:00
Lianmin Zheng	65c6577696	Improve benchmark scripts & fix llava (#613)	2024-07-13 15:00:26 -07:00
Lianmin Zheng	665815969a	Enable cuda graph by default (#612)	2024-07-13 05:29:46 -07:00
Liangsheng Yin	f25b76c02a	add `LogitsMetadata` (#604)	2024-07-08 17:46:55 -07:00
Ying Sheng	dc1b8bcfaa	Format (#593)	2024-07-05 10:06:17 -07:00
sglang	11616fc6bd	Minor fix in compiler & format (#545)	2024-06-29 23:42:14 -07:00
Lianmin Zheng	945aa9beb2	Update readme (#568)	2024-06-27 11:37:49 -07:00
Lianmin Zheng	2e6e62e156	Increase the number of thread limitation for tp worker managers. (#567)	2024-06-26 09:33:45 -07:00
Lianmin Zheng	a385ee27bd	Warmup cublas (#566)	2024-06-25 12:46:00 -07:00
Liangsheng Yin	92cb93f390	Fix latency benchmark (#557)	2024-06-22 15:11:04 +08:00
Ying Sheng	09593e9bc9	Multi-node Tensor Parallelism (#550) Co-authored-by: Lianmin Zheng <lianminzheng@gmail.com>	2024-06-17 20:41:24 -07:00
Liangsheng Yin	40e53d65cb	Add disk cache for loading ShareGPT dataset. (#542)	2024-06-13 16:37:12 +08:00
Ying Sheng	fb9296f0ed	Higher priority for user input of max_prefill_tokens & format (#540)	2024-06-12 21:48:40 -07:00
Ying Sheng	1374334d38	Fix dependency & crash issues (#539)	2024-06-12 21:23:19 -07:00
Lianmin Zheng	3bc01ac137	[Minor] improve code style	2024-06-03 18:11:34 -07:00
Lianmin Zheng	09de730dee	Improve benchmark scripts & add more models (#484)	2024-05-27 14:13:26 -07:00

73 Commits