sglang

Author	SHA1	Message	Date
Lianmin Zheng	33deca81b5	Add more fused moe benchmark utilities (#2314)	2024-12-02 04:26:55 -08:00
Lianmin Zheng	18108abe5d	[Minor] Fix code style (#2311)	2024-12-02 02:27:36 -08:00
HAI	c54bda300a	Use rocminfo instead of rocm-smi for more OS/WSL support (#2310)	2024-12-02 00:15:45 -08:00
Lianmin Zheng	3c79ad35ca	[Fix] Fix the padded hash value for image tokens (#2309)	2024-12-01 23:36:28 -08:00
Chayenne	983bfcf386	Online weight updates from torch.distributed (#2279)	2024-12-01 23:23:18 -08:00
Yineng Zhang	28bc60dcab	misc: update build setup (#2306)	2024-12-02 02:03:49 +08:00
Yineng Zhang	7301a39b13	fix: resolve CodeQL cpp issue (#2305)	2024-12-01 23:55:19 +08:00
Yineng Zhang	47eb139f81	feat: use warp reduce as a simple example (#2304)	2024-12-01 22:43:50 +08:00
Lianmin Zheng	5c18a03733	Fix logprob for completions (#2301)	2024-12-01 05:17:05 -08:00
Yineng Zhang	5c91a315d7	feat: support sgl-kernel pypi (#2302)	2024-12-01 20:11:21 +08:00
Yineng Zhang	3dbd73d319	minor: rm unused _grouped_size_compiled_for_decode_kernels (#2299)	2024-12-01 19:24:12 +08:00
Yineng Zhang	e9a6203dee	feat: skip good first issue (#2298)	2024-12-01 19:18:57 +08:00
Qun Yang	62c516ac45	Add a simple torch native attention backend (#2241)	2024-12-01 03:01:25 -08:00
Yineng Zhang	fc78640e00	minor: support flashinfer nightly (#2295)	2024-12-01 18:55:26 +08:00
gobraves	906d795f15	Feat: upgrade outlines & support compatibility with the old version (#2292)	2024-12-01 02:07:27 -08:00
Yineng Zhang	118b6af35e	feat: add should_use_tensor_core (#2179)	2024-12-01 18:01:16 +08:00
Lianmin Zheng	9449a95431	[CI] Balance CI tests (#2293)	2024-12-01 01:47:30 -08:00
Liangsheng Yin	5f12f0e7af	Fix chunked prefill when ignore eos (#2290)	2024-12-01 00:37:53 -08:00
yizhang2077	d5b95cbb53	adapt vllm distributed module to sglang (#2244) Co-authored-by: Yineng Zhang <me@zhyncs.com>	2024-12-01 15:54:52 +08:00
Lianmin Zheng	0303ca918f	[CI] Fix missing files in run_suite.py (#2288)	2024-11-30 23:53:34 -08:00
Yineng Zhang	00181098dd	feat: add Dockerfile for development (#2289)	2024-12-01 15:27:52 +08:00
Lianmin Zheng	4936be8acc	Revert "Revert "[FEAT] Support GGUF format"" (#2287)	2024-11-30 22:14:48 -08:00
Lianmin Zheng	1bfa511b95	[CI] Fix ci tests (#2284)	2024-11-30 21:12:03 -08:00
Lianmin Zheng	f5b5f2bff9	Revert "[Fix] fix assertion error for chunked prefill when disabling cache" (#2286)	2024-11-30 19:03:42 -08:00
Lianmin Zheng	7e4c6dd8da	Revert "[FEAT] Support GGUF format" (#2285)	2024-11-30 19:03:26 -08:00
Rui Wang	d622851dc9	[Fix] fix assertion error for chunked prefill when disabling cache (#2282)	2024-11-30 17:53:43 -08:00
Yang Zheng	883c955489	[FEAT] Support GGUF format (#2215) Co-authored-by: Yang Zheng(SW)(Alex) <you@example.com>	2024-11-30 00:44:48 -08:00
Lianmin Zheng	0d6a49bd7d	[CI] Kill zombie processes (#2280)	2024-11-30 00:24:30 -08:00
Lianmin Zheng	ccaf1f997c	[CI] Print summary on github actions (#2274)	2024-11-29 23:48:54 -08:00
Chayenne	7d1485d376	Add get weights by parameter name for llama (#2266)	2024-11-29 23:36:38 -08:00
Chayenne	7d5d1d3d29	udate weights from disk (#2265)	2024-11-30 01:17:00 +00:00
Lianmin Zheng	b53d6cbda3	Add new contributors so they can trigger CI automatically (#2269) Co-authored-by: Qun Yang <qun.yang@intel.com> Co-authored-by: zhengy001 <zhengy.gator@gmail.com> Co-authored-by: HandH1998 <1335248067@qq.com> Co-authored-by: xiaobo <xiaob.chen@outlook.com>	2024-11-29 16:37:52 -08:00
bjmsong	01017d4c20	Support LoRA in Completion API (#2243) Co-authored-by: root <bjmsong@126.com>	2024-11-29 16:13:38 -08:00
Lianmin Zheng	94e167ea5a	Fix the default chunked prefill size (#2268)	2024-11-29 16:03:32 -08:00
Xiaoyu Zhang	262e370f78	[benchmark] Add fused_moe_triton benchmark and tuning tools (#2225) Co-authored-by: Lianmin Zheng <lianminzheng@gmail.com> Co-authored-by: HAI <hixiao@gmail.com>	2024-11-29 13:36:45 -08:00
Yineng Zhang	419a57e771	minor: add sgl-kernel dir (#2261)	2024-11-30 02:27:35 +08:00
Yineng Zhang	fae4e5e99a	chore: bump v0.3.6.post3 (#2259)	2024-11-30 01:41:16 +08:00
Lianmin Zheng	afe1e46586	[Minor] fix the style for multimodal models (#2257)	2024-11-29 04:24:20 -08:00
Lianmin Zheng	f50a6cf443	Fix hash collision for multi modal models (#2256)	2024-11-29 03:15:58 -08:00
Lianmin Zheng	fe97a2d40f	Simplify tokenizer manager (#2254)	2024-11-29 02:18:51 -08:00
Ying Sheng	8b48496aaf	Revert "Revert "Add simple CPU offloading support"" (#2253) Co-authored-by: Jani Monoses <jani.monoses@gmail.com> Co-authored-by: youkaichao <youkaichao@gmail.com>	2024-11-28 23:58:54 -08:00
Ying Sheng	4057ea82c9	Revert "Add simple CPU offloading support" (#2252) We'll re-add the commit to correctly ack Kaichao's authorship	2024-11-28 23:36:55 -08:00
Lianmin Zheng	4f2ee48ed1	Update backend.md (#2251)	2024-11-28 23:18:07 -08:00
Lianmin Zheng	71ff2728a1	Update backend.md (#2250)	2024-11-28 23:14:36 -08:00
Ying Sheng	b7038fec9b	[fix] Fix prefix caching for multi-image/video (#2239)	2024-11-28 12:08:13 -08:00
Enrique Shockwave	65fdb28929	fix missing launch server import (#2242)	2024-11-28 21:24:47 +08:00
Lianmin Zheng	b2ccf36d4d	Fix memory leak during abort (#2238)	2024-11-28 02:22:15 -08:00
Lianmin Zheng	d4fc1a70e3	Crash the server correctly during error (#2231)	2024-11-28 00:22:39 -08:00
Jani Monoses	db674e3d24	Add OLMo2 model. (#2233)	2024-11-28 00:15:20 -08:00
Lianmin Zheng	fb915bd1a2	Disable overlap scheduler for multimodal models (#2235)	2024-11-27 23:44:33 -08:00

3455 Commits