Commit Graph

1434 Commits

Author | SHA1 | Message | Date
Qun Yang | 37ee906f61 | Add more support for intel Gaudi accelerators (#2357) | 2024-12-06 01:16:33 -08:00
Xiaoyu Zhang | 34b364e073 | optimize cuda graph max_bs_settings on low-end gpus (#2360) | 2024-12-06 01:13:04 -08:00
Yineng Zhang | 84d96b3ae5 | Move FP8 to SGLang (#2370) | 2024-12-06 15:42:10 +08:00
    Co-authored-by: HaiShaw <hixiao@gmail.com>
xiaobochen | 3d32e4a32c | Resubmit MoE-EP (#2371) | 2024-12-06 15:05:21 +08:00
Byron Hsu | 64fceab8af | [router] use 2-gpu-runner (#2368) | 2024-12-06 14:13:57 +08:00
Lianmin Zheng | 71e2a27753 | Fix the cuda graph capture range for small #max-running-requests (#2359) | 2024-12-06 14:13:57 +08:00
Ke Bao | 4a63c181f1 | Fix AWQ with enable MLA (#2364) | 2024-12-06 00:46:48 +08:00
Lianmin Zheng | 2b0fc5941d | [Minor] Code style improvements (#2355) | 2024-12-04 19:02:08 -08:00
Jerry Zhang | 9cc733b38c | move apply_torchao_config_ to model_runner (#2342) | 2024-12-04 17:26:42 -08:00
Ke Wen | d693ec0427 | Make torch TP composable with torch.compile (#2352) | 2024-12-04 17:26:00 -08:00
Chayenne | 18ea841f40 | Add Docs For SGLang Native Router (#2308) | 2024-12-04 15:41:22 -08:00
Chayenne | 786be44da5 | Fix Docs CI When Compile Error (#2323) | 2024-12-04 11:19:46 -08:00
Yineng Zhang | 2db4469808 | minor: limit the range of vllm versions (#2350) | 2024-12-05 02:00:34 +08:00
Ata Fatahi | ed45e509df | Check gpu availability at server args creation (#2340) | 2024-12-05 01:53:02 +08:00
    Signed-off-by: Ata Fatahi <immrata@gmail.com>
Ke Bao | ec52464dde | MLA prefill w/o weight absorption (#2349) | 2024-12-05 01:50:28 +08:00
Yineng Zhang | eb0c1f5373 | docs: add SGLang v0.4 blog (#2341) | 2024-12-05 01:24:51 +08:00
HAI | b2986d7aa5 | Adding SGLang FP8 Utils (#2348) | 2024-12-04 03:01:33 -08:00
Yineng Zhang | f8b0326934 | chore: bump v0.4.0 (#2338) | 2024-12-03 11:55:41 -08:00
Byron Hsu | 0495796517 | [router] Copy license when publishing & bump version (#2339) | 2024-12-03 10:27:43 -08:00
Lianmin Zheng | 1228f7ca69 | Fix gptq for moe layers (#2300) | 2024-12-03 23:12:33 +08:00
    Co-authored-by: root <me@zhyncs.com>
Yineng Zhang | fda628d8f2 | fix: resolve cmake url for Dockerfile.dev (#2335) | 2024-12-03 21:22:19 +08:00
Lianmin Zheng | 07ec07ad1f | Improve torch compile for fused moe (#2327) | 2024-12-03 01:58:25 -08:00
Ata Fatahi | 83b340e371 | Add missing license for router wheel (#2324) | 2024-12-03 00:06:25 -08:00
    Signed-off-by: Ata Fatahi <immrata@gmail.com>
HAI | 0639bf15d1 | ROCm Container: set SGLANG_SET_CPU_AFFINITY=1 (#2328) | 2024-12-02 23:20:33 -08:00
Ying Sheng | aa47f64223 | Revert "[feat] Enable chunked prefill for llava-onevision" (#2329) | 2024-12-02 23:11:13 -08:00
Lianmin Zheng | 3ddb1c4679 | [Minor] Fix logger and style (#2325) | 2024-12-02 20:45:53 -08:00
Ying Sheng | 480e38a733 | [feat] Enable chunked prefill for llava-onevision (#2281) | 2024-12-02 20:19:02 -08:00
HAI | 69e2d4fb66 | Relax to include more AMD GPUs (#2319) | 2024-12-02 19:05:58 -08:00
Yineng Zhang | 85e1a6f3aa | Update model_loader deps and qqq quantization deps (#2220) (#2318) | 2024-12-02 23:22:13 +08:00
    Co-authored-by: HandH1998 <1335248067@qq.com>
Lianmin Zheng | 33deca81b5 | Add more fused moe benchmark utilities (#2314) | 2024-12-02 04:26:55 -08:00
Lianmin Zheng | 18108abe5d | [Minor] Fix code style (#2311) | 2024-12-02 02:27:36 -08:00
HAI | c54bda300a | Use rocminfo instead of rocm-smi for more OS/WSL support (#2310) | 2024-12-02 00:15:45 -08:00
Lianmin Zheng | 3c79ad35ca | [Fix] Fix the padded hash value for image tokens (#2309) | 2024-12-01 23:36:28 -08:00
Chayenne | 983bfcf386 | Online weight updates from torch.distributed (#2279) | 2024-12-01 23:23:18 -08:00
Yineng Zhang | 28bc60dcab | misc: update build setup (#2306) | 2024-12-02 02:03:49 +08:00
Yineng Zhang | 7301a39b13 | fix: resolve CodeQL cpp issue (#2305) | 2024-12-01 23:55:19 +08:00
Yineng Zhang | 47eb139f81 | feat: use warp reduce as a simple example (#2304) | 2024-12-01 22:43:50 +08:00
Lianmin Zheng | 5c18a03733 | Fix logprob for completions (#2301) | 2024-12-01 05:17:05 -08:00
Yineng Zhang | 5c91a315d7 | feat: support sgl-kernel pypi (#2302) | 2024-12-01 20:11:21 +08:00
Yineng Zhang | 3dbd73d319 | minor: rm unused _grouped_size_compiled_for_decode_kernels (#2299) | 2024-12-01 19:24:12 +08:00
Yineng Zhang | e9a6203dee | feat: skip good first issue (#2298) | 2024-12-01 19:18:57 +08:00
Qun Yang | 62c516ac45 | Add a simple torch native attention backend (#2241) | 2024-12-01 03:01:25 -08:00
Yineng Zhang | fc78640e00 | minor: support flashinfer nightly (#2295) | 2024-12-01 18:55:26 +08:00
gobraves | 906d795f15 | Feat: upgrade outlines & support compatibility with the old version (#2292) | 2024-12-01 02:07:27 -08:00
Yineng Zhang | 118b6af35e | feat: add should_use_tensor_core (#2179) | 2024-12-01 18:01:16 +08:00
Lianmin Zheng | 9449a95431 | [CI] Balance CI tests (#2293) | 2024-12-01 01:47:30 -08:00
Liangsheng Yin | 5f12f0e7af | Fix chunked prefill when ignore eos (#2290) | 2024-12-01 00:37:53 -08:00
yizhang2077 | d5b95cbb53 | adapt vllm distributed module to sglang (#2244) | 2024-12-01 15:54:52 +08:00
    Co-authored-by: Yineng Zhang <me@zhyncs.com>
Lianmin Zheng | 0303ca918f | [CI] Fix missing files in run_suite.py (#2288) | 2024-11-30 23:53:34 -08:00
Yineng Zhang | 00181098dd | feat: add Dockerfile for development (#2289) | 2024-12-01 15:27:52 +08:00