Commit Graph

12 Commits

Author SHA1 Message Date
fzyzcjy
d502dae0f0 Tiny change killall_sglang.sh (#6596) 2025-05-25 22:36:51 -07:00
Lianmin Zheng
ac2387279e Support penalty in overlap mode; return logprob with chunked prefill; improve benchmark scripts (#3988)
Co-authored-by: SangBin Cho <rkooo567@gmail.com>
Co-authored-by: dhou-xai <dhou@x.ai>
Co-authored-by: Hanming Lu <hanming_lu@berkeley.edu>
2025-03-03 00:12:04 -08:00
Byron Hsu
9a0cc2e90e [router] Forward all request headers from router to workers (#3070) 2025-01-23 20:30:31 -08:00
Lianmin Zheng
8a6906127a Improve linear.py to load sharded weights & remove the dependency of Parameters from vllm (#2784)
Co-authored-by: SangBin Cho rkooo567@gmail.com
2025-01-07 23:29:10 -08:00
Ata Fatahi
ce094a5d79 Clean up GPU memory after killing sglang processes (#2457)
Signed-off-by: Ata Fatahi <immrata@gmail.com>
2024-12-17 03:42:40 -08:00
Lianmin Zheng
96db0f666d Update killall_sglang.sh (#2397) 2024-12-08 01:56:26 -08:00
Yineng Zhang
75ae968959 minor: update killall script (#2391) 2024-12-08 04:21:00 +08:00
Lianmin Zheng
0d6a49bd7d [CI] Kill zombie processes (#2280) 2024-11-30 00:24:30 -08:00
Lianmin Zheng
722530fa01 Enable overlap scheduler by default for the triton attention backend (#2105) 2024-11-20 02:58:35 -08:00
Lianmin Zheng
a2e0424abf Fix memory leak for chunked prefill 2 (#1858)
Co-authored-by: Liangsheng Yin <hnyls2002@gmail.com>
2024-10-31 14:51:51 -07:00
Lianmin Zheng
b548801ddb Update docs (#1839) 2024-10-30 02:49:08 -07:00
Lianmin Zheng
6aa94b967c Update ci workflows (#1804) 2024-10-26 04:32:36 -07:00