16 Commits

| Author | SHA1 | Message | Date |
| --- | --- | --- | --- |
| Shangming Cai | 2ff572e28c | [CI][Router] Fix bench_one_batch_server for pd router test (#7731); Signed-off-by: Shangming Cai <caishangming@linux.alibaba.com> | 2025-07-02 23:18:24 -07:00 |
| Lianmin Zheng | 22352d47a9 | Improve streaming, log_level, memory report, weight loading, and benchmark script (#7632); Co-authored-by: Kan Wu <wukanustc@gmail.com> | 2025-06-29 23:16:19 -07:00 |
| Lianmin Zheng | 2d72fc47cf | Improve profiler and integrate profiler in bench_one_batch_server (#6787) | 2025-05-31 15:53:55 -07:00 |
| fzyzcjy | 55f6005f53 | Fix bench_one_batch_server (#6503) | 2025-05-21 11:08:17 -07:00 |
| fzyzcjy | 7222e1dacc | Let bench_one_batch_server use sharegpt data to make expert distribution more natural (#5573) | 2025-05-21 02:08:43 -07:00 |
| Lifu Huang | 3cf1473a09 | Use monotonic clock for interval measurement (#6211); Signed-off-by: Lifu Huang <lifu.hlf@gmail.com> | 2025-05-17 16:49:18 -07:00 |
| Lianmin Zheng | fba8eccd7e | Log if cuda graph is used & extend cuda graph capture to cuda-graph-max-bs (#6201); Co-authored-by: SangBin Cho <rkooo567@gmail.com> | 2025-05-12 00:17:33 -07:00 |
| Lianmin Zheng | 005aad32ad | Revert "[fix] fix bench_one_batch_server" (#5785) | 2025-04-27 03:48:33 -07:00 |
| JieXin Liang | 3c4dc38a9a | [fix] fix bench_one_batch_server (#5607) | 2025-04-26 18:49:45 -07:00 |
| fzyzcjy | f01b092519 | Super tiny fix typo (#4738) | 2025-03-24 21:05:45 -07:00 |
| Lianmin Zheng | 03464890e0 | Separate two entry points: Engine and HTTP server (#2996); Co-authored-by: fzyzcjy <5236035+fzyzcjy@users.noreply.github.com> | 2025-01-19 22:09:24 -08:00 |
| Lianmin Zheng | d4fc1a70e3 | Crash the server correctly during error (#2231) | 2024-11-28 00:22:39 -08:00 |
| Lianmin Zheng | fb6e04a0c2 | Use an env var SGLANG_SET_CPU_AFFINITY to set cpu affinity; turn it off by default (#2222) | 2024-11-27 02:52:46 -08:00 |
| Lianmin Zheng | 6997e28f6e | Revert "Use an env var SGLANG_SET_CPU_AFFINITY to set cpu affinity; turn it off by default" (#2221) | 2024-11-27 02:02:01 -08:00 |
| Lianmin Zheng | a0e58740a8 | Use an env var SGLANG_SET_CPU_AFFINITY to set cpu affinity; turn it off by default (#2217) | 2024-11-27 01:13:41 -08:00 |
| Lianmin Zheng | dfec7fca06 | Rename sglang.bench_latency to sglang.bench_one_batch (#2118) | 2024-11-21 20:07:48 -08:00 |