19 Commits

| Author | SHA1 | Message | Date |
|---|---|---|---|
| Lianmin Zheng | ac2387279e | Support penalty in overlap mode; return logprob with chunked prefill; improve benchmark scripts (#3988). Co-authored-by: SangBin Cho <rkooo567@gmail.com>, dhou-xai <dhou@x.ai>, Hanming Lu <hanming_lu@berkeley.edu> | 2025-03-03 00:12:04 -08:00 |
| Lianmin Zheng | 287d07a669 | Misc fixes for eagle (flush_cache, CPU overhead) (#3014) | 2025-01-20 20:27:38 -08:00 |
| Lianmin Zheng | 03464890e0 | Separate two entry points: Engine and HTTP server (#2996). Co-authored-by: fzyzcjy <5236035+fzyzcjy@users.noreply.github.com> | 2025-01-19 22:09:24 -08:00 |
| Lianmin Zheng | 61f42b5732 | Move sgl.Runtime under sglang/lang (#2990) | 2025-01-19 17:10:29 -08:00 |
| Lianmin Zheng | 8f2c522aba | Improve benchmark scripts and error message printing (#2922) | 2025-01-16 06:24:31 -08:00 |
| Lianmin Zheng | 3815b23ccb | Clean up wrapper in flashinfer backend (#2638) | 2024-12-29 00:45:57 -08:00 |
| Lianmin Zheng | 23e5e50fd5 | Fix gemlite import (#2553) | 2024-12-22 20:21:17 -08:00 |
| Jerry Zhang | feb2b768ba | Add integration with gemlite weight only quant (#2528) | 2024-12-21 00:25:25 +08:00 |
| Lianmin Zheng | f8548295d6 | Fix warmup in bench_offline_throughput.py (#2449) | 2024-12-11 06:16:01 -08:00 |
| Lianmin Zheng | 641b7d0ae0 | [Minor] Improve code style (#2422) | 2024-12-09 06:30:35 -08:00 |
| Lianmin Zheng | fe97a2d40f | Simplify tokenizer manager (#2254) | 2024-11-29 02:18:51 -08:00 |
| bjmsong | 91e5dbf554 | add profile in offline benchmark & update doc (#2123). Co-authored-by: root <bjmsong@126.com> | 2024-11-27 14:57:13 -08:00 |
| Lianmin Zheng | dfec7fca06 | Rename sglang.bench_latency to sglang.bench_one_batch (#2118) | 2024-11-21 20:07:48 -08:00 |
| Lianmin Zheng | 3295cd8af2 | Allow skipping warmup in bench_offline_throughput.py (#2103) | 2024-11-20 01:25:21 -08:00 |
| Lianmin Zheng | 3b44bbeecf | Allow passing extra request body to bench_offline_throughput.py (#2085) | 2024-11-18 14:59:15 -08:00 |
| Lianmin Zheng | edad373135 | Fix illegal memory access in overlap mode & Use more fused triton kernels for building meta data (#2051) | 2024-11-16 16:14:23 -08:00 |
| Lianmin Zheng | 2f2e07439c | Fix weight update for data parallelism (#2050) | 2024-11-16 00:30:39 -08:00 |
| Lianmin Zheng | 2558d6a675 | Fix the default arguments of bench_offline_throughput.py & simplify detokenizer manager (#2042) | 2024-11-15 05:02:44 -08:00 |
| zolinthecow | f6dd648620 | Offline LLM Engine Benchmark Throughput (#1968). Co-authored-by: ByronHsu <byronhsu1230@gmail.com> | 2024-11-14 21:59:33 -08:00 |