sglang

Author	SHA1	Message	Date
bjmsong	e21026690d	benchmark decoding attention kernel with cudnn (#2467) Co-authored-by: root <bjmsong@126.com>	2024-12-17 03:31:57 -08:00
Lianmin Zheng	56198b45d9	Add a benchmark script for in-batch prefix caching (#2494)	2024-12-16 18:49:02 -08:00
Xiaoyu Zhang	a0592c059f	[Benchmark] add a benchmark for hf/vllm/sglang rmsnorm (#2486)	2024-12-15 13:52:08 +08:00
bjmsong	f67723940d	decoding attention kernel benchmark (#2425) Co-authored-by: root <bjmsong@126.com>	2024-12-11 04:46:59 -08:00
Xiaoyu Zhang	3844feb9bb	Add a unittest for fused_moe (#2416)	2024-12-08 22:46:10 -08:00
Lianmin Zheng	07ec07ad1f	Improve torch compile for fused moe (#2327)	2024-12-03 01:58:25 -08:00
Lianmin Zheng	33deca81b5	Add more fused moe benchmark utilities (#2314)	2024-12-02 04:26:55 -08:00
Xiaoyu Zhang	262e370f78	[benchmark] Add fused_moe_triton benchmark and tuning tools (#2225) Co-authored-by: Lianmin Zheng <lianminzheng@gmail.com> Co-authored-by: HAI <hixiao@gmail.com>	2024-11-29 13:36:45 -08:00
Henry Hyeonmok Ko	dbe1729395	Merged three native APIs into one: get_server_info (#2152)	2024-11-24 01:37:58 -08:00
Byron Hsu	cbedd1db1d	[router] cache-aware load-balancing router v1 (#2114)	2024-11-23 08:34:48 -08:00
Xuehai Pan	62a4a339eb	docs: fix module docstrings and copyright headers (#2077)	2024-11-22 22:16:53 +08:00
Lianmin Zheng	c29b98e043	Fix json benchmark (#2043)	2024-11-15 05:33:43 -08:00
DarkSharpness	954f4e6bd6	benchmark json schema (#2030)	2024-11-15 05:06:19 -08:00
Byron Hsu	f9633fa9b9	[rust] cache-aware DP - approx tree (#1934)	2024-11-10 21:57:32 -08:00
Xuehai Pan	a5e0defb5a	minor: Add basic editorconfig and pre-commit hooks to enforce style for whitespaces (#1926)	2024-11-06 13:46:04 +00:00
Lianmin Zheng	dd3809fad8	Fix engine unit test (#1701)	2024-10-17 09:53:32 -07:00
Ying Sheng	9c064bf78a	[LoRA, Performance] Speedup multi-LoRA serving - Step 1 (#1587)	2024-10-06 10:33:44 -07:00
Theresa Barton	2c7d0a5b8b	[Fix] Fix all the Huggingface paths (#1553)	2024-10-02 10:12:07 -07:00
Ying Sheng	37963394aa	[Feature] Support LoRA path renaming and add LoRA serving benchmarks (#1433)	2024-09-15 12:46:04 -07:00
Lianmin Zheng	e4d68afcf0	[Minor] Many cleanup (#1357)	2024-09-09 04:14:11 -07:00
Kai-Hsun Chen	c9b75917d5	[server] Passing `model_override_args` to `launch_server` via the CLI. (#1298) Signed-off-by: Kai-Hsun Chen <kaihsun@anyscale.com>	2024-09-09 02:14:25 -07:00
Yineng Zhang	62f15eea5a	docs: add conclusion (#1340)	2024-09-06 04:25:14 +10:00
Yineng Zhang	79794af52d	docs: highlight ttft itl and throughput (#1337)	2024-09-06 00:00:06 +10:00
Yineng Zhang	3494b32c3a	docs: update README (#1336)	2024-09-05 23:39:44 +10:00
Lianmin Zheng	57d0bd91ec	Improve benchmark (#1140)	2024-08-17 17:43:23 -07:00
Lianmin Zheng	5a261bd055	Fix the deadlock in multi-node tp (#1122)	2024-08-16 01:39:24 -07:00
Lianmin Zheng	326df4bab2	Use a single workspace for flashinfer (#1077)	2024-08-14 19:25:37 -07:00
Yineng Zhang	1c2b5f5240	docs: update nsys usage (#1103)	2024-08-15 01:39:15 +08:00
Liangsheng Yin	a34dd86a7d	Use `dtype` to control generate (#1082) Co-authored-by: zhyncs <me@zhyncs.com>	2024-08-14 15:58:07 +00:00
Lianmin Zheng	a59636bb5e	Update grok 1 model (#1095)	2024-08-14 04:40:44 -07:00
Meng, Peng	41bb1ab10d	fix nsys cannot profile cuda kernel (#957)	2024-08-07 11:51:21 +08:00
Ke Bao	e1eae1fd15	Support MLA for DeepSeek-V2 with Triton - step 1 (#905)	2024-08-05 03:40:33 +10:00
Yineng Zhang	1edd4e07d6	chore: bump v0.2.7 (#830)	2024-07-30 20:41:10 +10:00
Yineng Zhang	a50c8a14b3	fix: use v0.2.5 for benchmark (#814)	2024-07-30 12:40:35 +10:00
Ying Sheng	db6089e6f3	Revert "Organize public APIs" (#815)	2024-07-29 19:40:28 -07:00
Liangsheng Yin	c8e9fed87a	Organize public APIs (#809)	2024-07-29 15:34:16 -07:00
Yineng Zhang	768e05d08f	fix benchmark (#743) Co-authored-by: hnyls2002 <hnyls2002@gmail.com> Co-authored-by: Ying Sheng <sqy1415@gmail.com>	2024-07-26 21:26:13 +10:00
Yineng Zhang	fded67441d	misc: update bulid instruction (#724)	2024-07-25 17:08:11 +10:00
Yineng Zhang	97e0f7d250	docs: update comment (#721)	2024-07-25 10:51:18 +10:00
Ying Sheng	30d8e130e7	Improve benchmark scripts (#717)	2024-07-24 14:44:14 -07:00
Ying Sheng	08a3bd19cc	docs: update doc (#716)	2024-07-24 20:44:03 +00:00
Yineng Zhang	321a963b01	misc: update doc (#715)	2024-07-24 13:05:46 -07:00
Yineng Zhang	2d3ae4e125	docs: update doc (#713)	2024-07-25 00:03:17 +10:00
Yineng Zhang	75f4ccb7dd	docs: update README (#712)	2024-07-24 23:33:28 +10:00
Lianmin Zheng	490a1f39dd	Fix cuda graph with flashinfer (#675)	2024-07-20 02:43:55 -07:00
zhyncs	2e341cd493	misc: add pre-commit config (#637)	2024-07-17 11:55:39 -07:00
Lianmin Zheng	41d1f67704	Fix flush cache (#627)	2024-07-15 20:44:04 -07:00
Ying Sheng	6a2941f4d0	Improve tensor parallel performance (#625) Co-authored-by: Mingyi <wisclmy0611@gmail.com>	2024-07-15 07:10:51 -07:00
Mingyi	5ac8b80677	Simplify mem state (#623)	2024-07-15 02:01:09 -07:00
Ying Sheng	bae9541e4c	Update benchmark script (#621)	2024-07-14 21:38:53 +00:00

90 Commits