sglang

Author	SHA1	Message	Date
Pratyush Patel	8f15789314	Add more metrics to serving benchmark. (#2819)	2025-01-10 23:30:44 +08:00
Lianmin Zheng	bdc1acf6cd	Misc fix for min_p_sampling, --cuda-graph-bs (#2761)	2025-01-07 02:52:53 -08:00
Lianmin Zheng	dc3bee4815	Fix test and benchmark scripts (#2598)	2024-12-26 07:56:26 -08:00
Ying Sheng	8a56b43175	[Bench] Flush cache before benchmarking (#2566)	2024-12-24 11:21:21 +08:00
Lianmin Zheng	a2486eb58f	Fix a bug with logprob streaming + chunked prefill (#2403)	2024-12-08 03:55:27 -08:00
Lianmin Zheng	3ddb1c4679	[Minor] Fix logger and style (#2325)	2024-12-02 20:45:53 -08:00
bjmsong	01017d4c20	Support LoRA in Completion API (#2243) Co-authored-by: root <bjmsong@126.com>	2024-11-29 16:13:38 -08:00
Byron Hsu	4b0a1c9365	Replace prob based with threshold based load balancing (#2170)	2024-11-24 23:17:11 -08:00
Lianmin Zheng	8e1adb8441	Allow overwrite flashinfer use_tensorcore (#2169)	2024-11-24 20:58:17 -08:00
Byron Hsu	cbedd1db1d	[router] cache-aware load-balancing router v1 (#2114)	2024-11-23 08:34:48 -08:00
Yineng Zhang	ad47749b82	fix: resolve bench_serving args (#2139)	2024-11-23 17:45:42 +08:00
Yunmeng	60769be14d	Add concurrency option for benchmark (#2136)	2024-11-23 17:07:07 +08:00
bjmsong	ad30d5cf9a	Benchmark with Pytorch Profiler easily (#2110) Co-authored-by: root <bjmsong@126.com>	2024-11-21 23:29:50 -08:00
Yineng Zhang	55bd97f3e5	minor: add dataset dump and questions shuffle (#2093)	2024-11-19 14:07:27 -08:00
Lianmin Zheng	2558d6a675	Fix the default arguments of bench_offline_throughput.py & simplify detokenizer manager (#2042)	2024-11-15 05:02:44 -08:00
zolinthecow	f6dd648620	Offline LLM Engine Benchmark Throughput (#1968) Co-authored-by: ByronHsu <byronhsu1230@gmail.com>	2024-11-14 21:59:33 -08:00
Ke Bao	b808a38365	Filter empty prompt in random bench serving (#2011)	2024-11-12 14:53:41 +08:00
Byron Hsu	8169c6f4ef	Add gen-shared-prefix dataset in bench_serving (#1990)	2024-11-11 08:39:56 +08:00
Yineng Zhang	793b79dbe9	feat: support truss endpoint for benchmark serving (#1906)	2024-11-03 12:56:10 -08:00
Lianmin Zheng	175afed370	Improve benchmark scripts (#1672)	2024-10-14 21:53:01 -07:00
Lianmin Zheng	4a292f670d	[Minor] Add some utility functions (#1671)	2024-10-14 20:08:03 -07:00
Ying Sheng	04b262cd91	[Fix] Fix major performance bug in certain cases (#1563) Co-authored-by: hnyls2002 <hnyls2002@gmail.com>	2024-10-04 08:51:11 +00:00
Lianmin Zheng	5e62a6b706	Add bench_server_latency.py (#1452)	2024-09-18 00:56:06 -07:00
Lianmin Zheng	e4d68afcf0	[Minor] Many cleanup (#1357)	2024-09-09 04:14:11 -07:00
min-xu-et	fa13b95d6b	fixed a typo (#1143)	2024-08-18 14:29:09 -07:00
Lianmin Zheng	3c1f5a9220	Fix duplicated imports in hf_transformers_utils.py (#1141)	2024-08-17 18:03:00 -07:00
Lianmin Zheng	57d0bd91ec	Improve benchmark (#1140)	2024-08-17 17:43:23 -07:00
Lianmin Zheng	41598e0d8e	Add longer accuracy test on CI (#1049)	2024-08-12 09:21:38 +00:00
Lianmin Zheng	8207637029	Improve end-to-end throughput test and its coverage (#1039)	2024-08-11 18:27:33 -07:00
Roger Wang	05c50a82b8	Minor bugfix on benchmark serving (#1005)	2024-08-10 02:53:50 +10:00
Juwan Yoo	ab7875941b	feat: frequency, min_new_tokens, presence, and repetition penalties (#973)	2024-08-08 04:21:08 -07:00
Ying Sheng	ae7ee01a8e	Add accuracy test to CI: MMLU (#882)	2024-08-01 21:20:17 -07:00
Yineng Zhang	f52eda35ea	misc: update e2e test benchmark config (#825)	2024-07-30 19:19:23 +10:00
Ying Sheng	ae5c0fc442	Support disable_ignore_eos in bench_serving.py (#824)	2024-07-30 01:42:07 -07:00
Ying Sheng	db6089e6f3	Revert "Organize public APIs" (#815)	2024-07-29 19:40:28 -07:00
Liangsheng Yin	c8e9fed87a	Organize public APIs (#809)	2024-07-29 15:34:16 -07:00
Yineng Zhang	768e05d08f	fix benchmark (#743) Co-authored-by: hnyls2002 <hnyls2002@gmail.com> Co-authored-by: Ying Sheng <sqy1415@gmail.com>	2024-07-26 21:26:13 +10:00
Ying Sheng	30d8e130e7	Improve benchmark scripts (#717)	2024-07-24 14:44:14 -07:00
zhyncs	fa7ccb3316	feat: add e2e latency (#704)	2024-07-24 05:51:10 +10:00
zhyncs	9fdea29d05	misc: fix typo (#698)	2024-07-23 02:00:27 +10:00
Ying Sheng	df7c4c19b4	Fix trt benchmark (#697)	2024-07-22 23:32:41 +10:00
zhyncs	d198791fe8	misc: update output token logic (#695)	2024-07-22 19:34:05 +10:00
zhyncs	c07526e46c	fix: update bench serving (#694)	2024-07-22 18:23:33 +10:00
zhyncs	65bd13386b	misc: recommend to use chat model for benchmark (#690)	2024-07-22 00:13:33 +10:00
zhyncs	6a846bb1fd	misc: update output file logic (#686)	2024-07-21 18:07:30 +10:00
zhyncs	0fdb3127a1	feat: update bench serving (#685)	2024-07-21 16:46:58 +10:00
Lianmin Zheng	77e592e8e0	support non-streaming benchmark (#682)	2024-07-20 18:36:42 -07:00
zhyncs	4b4a67f814	feat: support TRT LLM benchmark and multiple benchmarks (#670)	2024-07-20 11:05:35 -07:00
Lianmin Zheng	9592a1f3bd	Fix random dataset (#671)	2024-07-20 01:57:43 -07:00
Lianmin Zheng	35759efa91	Support random dataset in bench_serving.py (#669)	2024-07-20 01:06:43 -07:00

52 Commits