Lianmin Zheng
|
ac2387279e
|
Support penalty in overlap mode; return logprob with chunked prefill; improve benchmark scripts (#3988)
Co-authored-by: SangBin Cho <rkooo567@gmail.com>
Co-authored-by: dhou-xai <dhou@x.ai>
Co-authored-by: Hanming Lu <hanming_lu@berkeley.edu>
|
2025-03-03 00:12:04 -08:00 |
|
Lianmin Zheng
|
287d07a669
|
Misc fixes for eagle (flush_cache, CPU overhead) (#3014)
|
2025-01-20 20:27:38 -08:00 |
|
Lianmin Zheng
|
03464890e0
|
Separate two entry points: Engine and HTTP server (#2996)
Co-authored-by: fzyzcjy <5236035+fzyzcjy@users.noreply.github.com>
|
2025-01-19 22:09:24 -08:00 |
|
Lianmin Zheng
|
61f42b5732
|
Move sgl.Runtime under sglang/lang (#2990)
|
2025-01-19 17:10:29 -08:00 |
|
Lianmin Zheng
|
8f2c522aba
|
Improve benchmark scripts and error message printing (#2922)
|
2025-01-16 06:24:31 -08:00 |
|
Lianmin Zheng
|
3815b23ccb
|
Clean up wrapper in flashinfer backend (#2638)
|
2024-12-29 00:45:57 -08:00 |
|
Lianmin Zheng
|
23e5e50fd5
|
Fix gemlite import (#2553)
|
2024-12-22 20:21:17 -08:00 |
|
Jerry Zhang
|
feb2b768ba
|
Add integration with gemlite weight only quant (#2528)
|
2024-12-21 00:25:25 +08:00 |
|
Lianmin Zheng
|
f8548295d6
|
Fix warmup in bench_offline_throughput.py (#2449)
|
2024-12-11 06:16:01 -08:00 |
|
Lianmin Zheng
|
641b7d0ae0
|
[Minor] Improve code style (#2422)
|
2024-12-09 06:30:35 -08:00 |
|
Lianmin Zheng
|
fe97a2d40f
|
Simplify tokenizer manager (#2254)
|
2024-11-29 02:18:51 -08:00 |
|
bjmsong
|
91e5dbf554
|
add profile in offline benchmark & update doc (#2123)
Co-authored-by: root <bjmsong@126.com>
|
2024-11-27 14:57:13 -08:00 |
|
Lianmin Zheng
|
dfec7fca06
|
Rename sglang.bench_latency to sglang.bench_one_batch (#2118)
|
2024-11-21 20:07:48 -08:00 |
|
Lianmin Zheng
|
3295cd8af2
|
Allow skipping warmup in bench_offline_throughput.py (#2103)
|
2024-11-20 01:25:21 -08:00 |
|
Lianmin Zheng
|
3b44bbeecf
|
Allow passing extra request body to bench_offline_throughput.py (#2085)
|
2024-11-18 14:59:15 -08:00 |
|
Lianmin Zheng
|
edad373135
|
Fix illegal memory access in overlap mode & Use more fused triton kernels for building meta data (#2051)
|
2024-11-16 16:14:23 -08:00 |
|
Lianmin Zheng
|
2f2e07439c
|
Fix weight update for data parallelism (#2050)
|
2024-11-16 00:30:39 -08:00 |
|
Lianmin Zheng
|
2558d6a675
|
Fix the default arguments of bench_offline_throughput.py & simplify detokenizer manager (#2042)
|
2024-11-15 05:02:44 -08:00 |
|
zolinthecow
|
f6dd648620
|
Offline LLM Engine Benchmark Throughput (#1968)
Co-authored-by: ByronHsu <byronhsu1230@gmail.com>
|
2024-11-14 21:59:33 -08:00 |
|