Commit Graph

1084 Commits

Author / SHA1 / Message / Date
Yineng Zhang
f8b0326934 chore: bump v0.4.0 (#2338) 2024-12-03 11:55:41 -08:00
Lianmin Zheng
1228f7ca69 Fix gptq for moe layers (#2300) 2024-12-03 23:12:33 +08:00
    Co-authored-by: root <me@zhyncs.com>
Lianmin Zheng
07ec07ad1f Improve torch compile for fused moe (#2327) 2024-12-03 01:58:25 -08:00
Ying Sheng
aa47f64223 Revert "[feat] Enable chunked prefill for llava-onevision" (#2329) 2024-12-02 23:11:13 -08:00
Lianmin Zheng
3ddb1c4679 [Minor] Fix logger and style (#2325) 2024-12-02 20:45:53 -08:00
Ying Sheng
480e38a733 [feat] Enable chunked prefill for llava-onevision (#2281) 2024-12-02 20:19:02 -08:00
HAI
69e2d4fb66 Relax to include more AMD GPUs (#2319) 2024-12-02 19:05:58 -08:00
Yineng Zhang
85e1a6f3aa Update model_loader deps and qqq quantization deps (#2220) (#2318) 2024-12-02 23:22:13 +08:00
    Co-authored-by: HandH1998 <1335248067@qq.com>
Lianmin Zheng
18108abe5d [Minor] Fix code style (#2311) 2024-12-02 02:27:36 -08:00
HAI
c54bda300a Use rocminfo instead of rocm-smi for more OS/WSL support (#2310) 2024-12-02 00:15:45 -08:00
Lianmin Zheng
3c79ad35ca [Fix] Fix the padded hash value for image tokens (#2309) 2024-12-01 23:36:28 -08:00
Chayenne
983bfcf386 Online weight updates from torch.distributed (#2279) 2024-12-01 23:23:18 -08:00
Lianmin Zheng
5c18a03733 Fix logprob for completions (#2301) 2024-12-01 05:17:05 -08:00
Qun Yang
62c516ac45 Add a simple torch native attention backend (#2241) 2024-12-01 03:01:25 -08:00
Yineng Zhang
fc78640e00 minor: support flashinfer nightly (#2295) 2024-12-01 18:55:26 +08:00
gobraves
906d795f15 Feat: upgrade outlines & support compatibility with the old version (#2292) 2024-12-01 02:07:27 -08:00
Yineng Zhang
118b6af35e feat: add should_use_tensor_core (#2179) 2024-12-01 18:01:16 +08:00
Liangsheng Yin
5f12f0e7af Fix chunked prefill when ignore eos (#2290) 2024-12-01 00:37:53 -08:00
yizhang2077
d5b95cbb53 adapt vllm distributed module to sglang (#2244) 2024-12-01 15:54:52 +08:00
    Co-authored-by: Yineng Zhang <me@zhyncs.com>
Lianmin Zheng
4936be8acc Revert "Revert "[FEAT] Support GGUF format"" (#2287) 2024-11-30 22:14:48 -08:00
Lianmin Zheng
f5b5f2bff9 Revert "[Fix] fix assertion error for chunked prefill when disabling cache" (#2286) 2024-11-30 19:03:42 -08:00
Lianmin Zheng
7e4c6dd8da Revert "[FEAT] Support GGUF format" (#2285) 2024-11-30 19:03:26 -08:00
Rui Wang
d622851dc9 [Fix] fix assertion error for chunked prefill when disabling cache (#2282) 2024-11-30 17:53:43 -08:00
Yang Zheng
883c955489 [FEAT] Support GGUF format (#2215) 2024-11-30 00:44:48 -08:00
    Co-authored-by: Yang Zheng(SW)(Alex) <you@example.com>
Lianmin Zheng
ccaf1f997c [CI] Print summary on github actions (#2274) 2024-11-29 23:48:54 -08:00
Chayenne
7d1485d376 Add get weights by parameter name for llama (#2266) 2024-11-29 23:36:38 -08:00
Chayenne
7d5d1d3d29 update weights from disk (#2265) 2024-11-30 01:17:00 +00:00
bjmsong
01017d4c20 Support LoRA in Completion API (#2243) 2024-11-29 16:13:38 -08:00
    Co-authored-by: root <bjmsong@126.com>
Lianmin Zheng
94e167ea5a Fix the default chunked prefill size (#2268) 2024-11-29 16:03:32 -08:00
Xiaoyu Zhang
262e370f78 [benchmark] Add fused_moe_triton benchmark and tuning tools (#2225) 2024-11-29 13:36:45 -08:00
    Co-authored-by: Lianmin Zheng <lianminzheng@gmail.com>
    Co-authored-by: HAI <hixiao@gmail.com>
Yineng Zhang
fae4e5e99a chore: bump v0.3.6.post3 (#2259) 2024-11-30 01:41:16 +08:00
Lianmin Zheng
afe1e46586 [Minor] fix the style for multimodal models (#2257) 2024-11-29 04:24:20 -08:00
Lianmin Zheng
f50a6cf443 Fix hash collision for multi modal models (#2256) 2024-11-29 03:15:58 -08:00
Lianmin Zheng
fe97a2d40f Simplify tokenizer manager (#2254) 2024-11-29 02:18:51 -08:00
Ying Sheng
8b48496aaf Revert "Revert "Add simple CPU offloading support"" (#2253) 2024-11-28 23:58:54 -08:00
    Co-authored-by: Jani Monoses <jani.monoses@gmail.com>
    Co-authored-by: youkaichao <youkaichao@gmail.com>
Ying Sheng
4057ea82c9 Revert "Add simple CPU offloading support" (#2252) 2024-11-28 23:36:55 -08:00
    We'll re-add the commit to correctly ack Kaichao's authorship
Ying Sheng
b7038fec9b [fix] Fix prefix caching for multi-image/video (#2239) 2024-11-28 12:08:13 -08:00
Enrique Shockwave
65fdb28929 fix missing launch server import (#2242) 2024-11-28 21:24:47 +08:00
Lianmin Zheng
b2ccf36d4d Fix memory leak during abort (#2238) 2024-11-28 02:22:15 -08:00
Lianmin Zheng
d4fc1a70e3 Crash the server correctly during error (#2231) 2024-11-28 00:22:39 -08:00
Jani Monoses
db674e3d24 Add OLMo2 model. (#2233) 2024-11-28 00:15:20 -08:00
Lianmin Zheng
fb915bd1a2 Disable overlap scheduler for multimodal models (#2235) 2024-11-27 23:44:33 -08:00
Lianmin Zheng
09798b36cd Fix chunked prefill size for bench_offline_throughput (#2234) 2024-11-27 23:37:20 -08:00
HAI
cd51758fad Rename tuned MI300X config files for fused_moe_triton (#2228) 2024-11-27 21:18:51 -08:00
bjmsong
91e5dbf554 add profile in offline benchmark & update doc (#2123) 2024-11-27 14:57:13 -08:00
    Co-authored-by: root <bjmsong@126.com>
Lianmin Zheng
dd5eba4c88 Remove fused_moe_grok (#2223) 2024-11-27 14:28:55 -08:00
Baoyuan Qi
a4fd2f9b46 fix typo prompts (#2224) 2024-11-27 12:07:00 -08:00
Lianmin Zheng
2a02185c5f Rename DP_RANK to SGLANG_DP_RANK (#2218) 2024-11-27 09:36:36 -08:00
Lianmin Zheng
fed4c6946a Release v0.3.6.post2 (#2214) 2024-11-27 03:35:30 -08:00
    Co-authored-by: Yineng Zhang <me@zhyncs.com>
Lianmin Zheng
fb6e04a0c2 Use an env var SGLANG_SET_CPU_AFFINITY to set cpu affinity; turn it off by default (#2222) 2024-11-27 02:52:46 -08:00
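Two commits in this range change environment-variable handling: #2218 renames DP_RANK to SGLANG_DP_RANK, and #2222 gates CPU-affinity pinning behind SGLANG_SET_CPU_AFFINITY (off by default). A minimal shell sketch of a launcher script setting both — the variable names come from the commit titles, while the values shown are illustrative assumptions, not documented defaults:

```shell
# Opt back in to CPU-affinity pinning, which #2222 turned off by default.
# The value "1" is an assumed opt-in value.
export SGLANG_SET_CPU_AFFINITY=1

# Data-parallel rank, using the SGLANG_ prefix introduced by #2218
# (previously plain DP_RANK). Rank 0 here is just an example.
export SGLANG_DP_RANK=0

echo "affinity=${SGLANG_SET_CPU_AFFINITY} dp_rank=${SGLANG_DP_RANK}"
```

Scripts written against the old DP_RANK name would need to adopt the prefixed form after #2218.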