sglang

Author	SHA1	Message	Date
Lianmin Zheng	ac2387279e	Support penalty in overlap mode; return logprob with chunked prefill; improve benchmark scripts (#3988) Co-authored-by: SangBin Cho <rkooo567@gmail.com> Co-authored-by: dhou-xai <dhou@x.ai> Co-authored-by: Hanming Lu <hanming_lu@berkeley.edu>	2025-03-03 00:12:04 -08:00
Lianmin Zheng	287d07a669	Misc fixes for eagle (flush_cache, CPU overhead) (#3014)	2025-01-20 20:27:38 -08:00
Lianmin Zheng	03464890e0	Separate two entry points: Engine and HTTP server (#2996) Co-authored-by: fzyzcjy <5236035+fzyzcjy@users.noreply.github.com>	2025-01-19 22:09:24 -08:00
Lianmin Zheng	61f42b5732	Move sgl.Runtime under sglang/lang (#2990)	2025-01-19 17:10:29 -08:00
Lianmin Zheng	8f2c522aba	Improve benchmark scripts and error message printing (#2922)	2025-01-16 06:24:31 -08:00
Lianmin Zheng	3815b23ccb	Clean up wrapper in flashinfer backend (#2638)	2024-12-29 00:45:57 -08:00
Lianmin Zheng	23e5e50fd5	Fix gemlite import (#2553)	2024-12-22 20:21:17 -08:00
Jerry Zhang	feb2b768ba	Add integration with gemlite weight only quant (#2528)	2024-12-21 00:25:25 +08:00
Lianmin Zheng	f8548295d6	Fix warmup in bench_offline_throughput.py (#2449)	2024-12-11 06:16:01 -08:00
Lianmin Zheng	641b7d0ae0	[Minor] Improve code style (#2422)	2024-12-09 06:30:35 -08:00
Lianmin Zheng	fe97a2d40f	Simplify tokenizer manager (#2254)	2024-11-29 02:18:51 -08:00
bjmsong	91e5dbf554	add profile in offline benchmark & update doc (#2123) Co-authored-by: root <bjmsong@126.com>	2024-11-27 14:57:13 -08:00
Lianmin Zheng	dfec7fca06	Rename sglang.bench_latency to sglang.bench_one_batch (#2118)	2024-11-21 20:07:48 -08:00
Lianmin Zheng	3295cd8af2	Allow skipping warmup in bench_offline_throughput.py (#2103)	2024-11-20 01:25:21 -08:00
Lianmin Zheng	3b44bbeecf	Allow passing extra request body to bench_offline_throughput.py (#2085)	2024-11-18 14:59:15 -08:00
Lianmin Zheng	edad373135	Fix illegal memory access in overlap mode & Use more fused triton kernels for building meta data (#2051)	2024-11-16 16:14:23 -08:00
Lianmin Zheng	2f2e07439c	Fix weight update for data parallelism (#2050)	2024-11-16 00:30:39 -08:00
Lianmin Zheng	2558d6a675	Fix the default arguments of bench_offline_throughput.py & simplify detokenizer manager (#2042)	2024-11-15 05:02:44 -08:00
zolinthecow	f6dd648620	Offline LLM Engine Benchmark Throughput (#1968) Co-authored-by: ByronHsu <byronhsu1230@gmail.com>	2024-11-14 21:59:33 -08:00

19 Commits