sglang

Author	SHA1	Message	Date
Zhihao Zhang	e7bc600304	[Feature] Speculative decoding support lookahead (#9873 ) Co-authored-by: a4zhangfei <a4zhangfei@qq.com> Co-authored-by: Qiaolin-Yu <liin1211@outlook.com>	2025-09-18 16:42:41 -07:00
Muqi Li	d5e2a37414	Benchmark: Support API_KEY without 'bearer' (#10380 )	2025-09-12 16:29:04 -07:00
blzheng	97fff98c68	[CPU] Fix phi4-mm prompt issue in bench_serving (#9900 )	2025-09-08 20:12:32 -07:00
Yineng Zhang	19d64f2b72	fix: resolve lint issue (#10181 )	2025-09-08 15:09:55 -07:00
Teng Ma	a02071a12c	[Bench] feat: mooncake trace integration (#9839 ) Signed-off-by: Xuchun Shang <xuchun.shang@linux.alibaba.com> Signed-off-by: Teng Ma <sima.mt@alibaba-inc.com> Co-authored-by: Xuchun Shang <xuchun.shang@linux.alibaba.com>	2025-09-09 02:50:54 +08:00
Mick	16a6d21b95	chore: enhance bench_serving for vlms with a new dataset of configurable image count and resolution (#9583 ) Co-authored-by: yhyang201 <yhyang201@gmail.com>	2025-08-26 17:42:54 -07:00
Mick	584e1ab2d0	fix: fix unsupported palette mode of images in bench_serving for mmmu (#9206 )	2025-08-14 18:44:46 -07:00
Brayden Zhong	a37e1247c1	[Multimodal][Perf] Use `pybase64` instead of `base64` (#7724 )	2025-07-08 14:00:58 -07:00
Lianmin Zheng	22352d47a9	Improve streaming, log_level, memory report, weight loading, and benchmark script (#7632 ) Co-authored-by: Kan Wu <wukanustc@gmail.com>	2025-06-29 23:16:19 -07:00
Xinyuan Tong	c45e49d817	oai: Adds support for OpenAI chat completions API in bench_serving (#7036 ) Signed-off-by: Xinyuan Tong <justinning0323@outlook.com> Co-authored-by: yhyang201 <47235274+yhyang201@users.noreply.github.com> Co-authored-by: Mick <mickjagger19@icloud.com>	2025-06-28 22:59:20 +00:00
Zijian	31d6dee5c4	Support VILA models (#6106 )	2025-06-11 11:47:25 -07:00
Xinyuan Tong	697b0f71f0	[Refactor] image data process in bench_serving (#6879 ) Signed-off-by: Xinyuan Tong <justinning0323@outlook.com>	2025-06-06 21:11:17 -07:00
Yikai Zhang	fb507b7b10	[FIX] mmmu bench serving result display error (#6525 ) (#6791 )	2025-05-31 13:48:06 -07:00
Yuhong Guo	d279d4990c	Fix aiohttp 'Chunk too big' in bench_serving (#6737 )	2025-05-30 00:50:36 -07:00
fzyzcjy	501efc3d36	Tiny fix CI (#6611 )	2025-05-25 23:36:34 -07:00
fzyzcjy	6bebef60a7	Support accurate length control for bench serving (#6594 )	2025-05-25 22:46:23 -07:00
fzyzcjy	25be63d0b2	Auto handle PD disaggregation in bench_serving (#6587 ) Co-authored-by: yizhang2077 <1109276519@qq.com>	2025-05-25 22:41:27 -07:00
fzyzcjy	2c3a6fe1de	Fix bench_serving does not support changing warmup requests (#6439 )	2025-05-25 22:35:36 -07:00
Yineng Zhang	a6970a17f3	misc: fix accept_length (#6536 )	2025-05-22 14:27:10 -07:00
fzyzcjy	969660c762	Recover from corrupted cache file in bench serving (#6510 )	2025-05-21 17:13:54 -07:00
fzyzcjy	7222e1dacc	Let bench_one_batch_server use sharegpt data to make expert distribution more natural (#5573 )	2025-05-21 02:08:43 -07:00
fzyzcjy	26ebb849eb	Tiny refactor bench_serving to extract RequestFuncOutput.init_new (#6108 )	2025-05-17 17:08:52 -07:00
fzyzcjy	02973cd9a4	Tiny refactor bench_serving to improve extensibility (#6134 )	2025-05-17 17:07:58 -07:00
fzyzcjy	6d95a35abf	Support outputing details for bench_serving (#6107 )	2025-05-17 17:06:52 -07:00
Yineng Zhang	f24fc5b86d	fix typo (#6248 )	2025-05-12 15:45:12 -07:00
Lianmin Zheng	fba8eccd7e	Log if cuda graph is used & extend cuda graph capture to cuda-graph-max-bs (#6201 ) Co-authored-by: SangBin Cho <rkooo567@gmail.com>	2025-05-12 00:17:33 -07:00
applesaucethebun	2ce8793519	Add typo checker in pre-commit (#6179 ) Co-authored-by: Brayden Zhong <b8zhong@uwaterloo.ca>	2025-05-11 12:55:00 +08:00
XinyuanTong	e88dd482ed	[CI]Add performance CI for VLM (#6038 ) Signed-off-by: Xinyuan Tong <justinning0323@outlook.com>	2025-05-07 19:20:03 -07:00
Lianmin Zheng	35ca04d2fa	[CI] fix port conflicts (#5789 )	2025-04-27 05:17:44 -07:00
vzed	094891c01a	fix: Use `is not None` instead of `!= None` for None checks. (#5687 )	2025-04-26 19:26:57 -07:00
Yineng Zhang	b1f6d89b5f	fix: update truss bench_serving (#5683 )	2025-04-23 13:28:35 -07:00
Yineng Zhang	7282ab741a	fix: update bench_speculative (#5649 )	2025-04-22 16:08:15 -07:00
fzyzcjy	9924bbe153	Fix bench_serving fail when zero warmup requests (#5574 )	2025-04-20 14:16:03 -07:00
Lianmin Zheng	177320a582	Clean up imports (#5467 )	2025-04-16 15:26:49 -07:00
Yuhong Guo	3dfc6023ce	Fix bench_serving with random-ids (#5214 )	2025-04-15 01:34:35 -07:00
Xiaoyu Zhang	924ca7c92c	Add DeepSeek V3/R1 shared experts fusion (#4918 )	2025-04-04 01:59:29 -07:00
Yineng Zhang	fda6bb78da	update bench_serving (#4958 )	2025-04-01 15:10:56 -07:00
chaobo jia	ef9a378a20	[Feature] add multi-rank support for Lora (#4492 ) Co-authored-by: rudy152 <czh1137892874@gmail.com>	2025-03-28 09:38:44 -07:00
Stefan He	5d7edc8e55	Support FA3 as Attention backend by using `--attention-backend fa3` (#4680 ) Co-authored-by: qsong <qsong@linkedin.com> Co-authored-by: qingquansong <ustcsqq@gmail.com>	2025-03-23 23:28:11 -07:00
Xu Song	470b474075	Update bench_serving.py (#4454 )	2025-03-15 16:33:58 -07:00
Mingshan	0fe7c13be1	Fix bench_serving flush cache not recognizing OPENAI_API_KEY (#4181 ) Signed-off-by: Mingshan <git@brighill.com>	2025-03-08 01:03:38 -08:00
Lzhang-hub	3a3918121f	fix bench serving bug (#4135 )	2025-03-06 05:34:02 -08:00
Lianmin Zheng	e074d84e5b	[Minor] more code cleanup (#4077 )	2025-03-04 21:23:47 -08:00
Lianmin Zheng	66301e124f	Improve code styles (#4021 )	2025-03-03 03:20:23 -08:00
Lianmin Zheng	ac2387279e	Support penalty in overlap mode; return logprob with chunked prefill; improve benchmark scripts (#3988 ) Co-authored-by: SangBin Cho <rkooo567@gmail.com> Co-authored-by: dhou-xai <dhou@x.ai> Co-authored-by: Hanming Lu <hanming_lu@berkeley.edu>	2025-03-03 00:12:04 -08:00
Kebe	ec0a72c2d9	Fix bench_serving not recognizing OPENAI_API_KEY (#3870 ) Signed-off-by: Kebe <mail@kebe7jun.com>	2025-02-27 20:18:53 -08:00
Lianmin Zheng	287d07a669	Misc fixes for eagle (flush_cache, CPU overhead) (#3014 )	2025-01-20 20:27:38 -08:00
Lianmin Zheng	8f2c522aba	Improve benchmark scripts and error message printing (#2922 )	2025-01-16 06:24:31 -08:00
Pratyush Patel	8f15789314	Add more metrics to serving benchmark. (#2819 )	2025-01-10 23:30:44 +08:00
Lianmin Zheng	bdc1acf6cd	Misc fix for min_p_sampling, --cuda-graph-bs (#2761 )	2025-01-07 02:52:53 -08:00

100 Commits