Commit Graph

1434 Commits

Author | SHA1 | Message | Date
Qun Yang | 37ee906f61 | Add more support for intel Gaudi accelerators (#2357) | 2024-12-06 01:16:33 -08:00
Xiaoyu Zhang | 34b364e073 | optimize cuda graph max_bs_settings on low-end gpus (#2360) | 2024-12-06 01:13:04 -08:00
Yineng Zhang | 84d96b3ae5 | Move FP8 to SGLang (#2370) | 2024-12-06 15:42:10 +08:00
    Co-authored-by: HaiShaw <hixiao@gmail.com>
xiaobochen | 3d32e4a32c | Resubmit MoE-EP (#2371) | 2024-12-06 15:05:21 +08:00
Byron Hsu | 64fceab8af | [router] use 2-gpu-runner (#2368) | 2024-12-06 14:13:57 +08:00
Lianmin Zheng | 71e2a27753 | Fix the cuda graph capture range for small #max-running-requests (#2359) | 2024-12-06 14:13:57 +08:00
Ke Bao | 4a63c181f1 | Fix AWQ with enable MLA (#2364) | 2024-12-06 00:46:48 +08:00
Lianmin Zheng | 2b0fc5941d | [Minor] Code style improvements (#2355) | 2024-12-04 19:02:08 -08:00
Jerry Zhang | 9cc733b38c | move apply_torchao_config_ to model_runner (#2342) | 2024-12-04 17:26:42 -08:00
Ke Wen | d693ec0427 | Make torch TP composable with torch.compile (#2352) | 2024-12-04 17:26:00 -08:00
Chayenne | 18ea841f40 | Add Docs For SGLang Native Router (#2308) | 2024-12-04 15:41:22 -08:00
Chayenne | 786be44da5 | Fix Docs CI When Compile Error (#2323) | 2024-12-04 11:19:46 -08:00
Yineng Zhang | 2db4469808 | minor: limit the range of vllm versions (#2350) | 2024-12-05 02:00:34 +08:00
Ata Fatahi | ed45e509df | Check gpu availability at server args creation (#2340) | 2024-12-05 01:53:02 +08:00
    Signed-off-by: Ata Fatahi <immrata@gmail.com>
Ke Bao | ec52464dde | MLA prefill w/o weight absorption (#2349) | 2024-12-05 01:50:28 +08:00
Yineng Zhang | eb0c1f5373 | docs: add SGLang v0.4 blog (#2341) | 2024-12-05 01:24:51 +08:00
HAI | b2986d7aa5 | Adding SGLang FP8 Utils (#2348) | 2024-12-04 03:01:33 -08:00
Yineng Zhang | f8b0326934 | chore: bump v0.4.0 (#2338) | 2024-12-03 11:55:41 -08:00
Byron Hsu | 0495796517 | [router] Copy license when publishing & bump version (#2339) | 2024-12-03 10:27:43 -08:00
Lianmin Zheng | 1228f7ca69 | Fix gptq for moe layers (#2300) | 2024-12-03 23:12:33 +08:00
    Co-authored-by: root <me@zhyncs.com>
Yineng Zhang | fda628d8f2 | fix: resolve cmake url for Dockerfile.dev (#2335) | 2024-12-03 21:22:19 +08:00
Lianmin Zheng | 07ec07ad1f | Improve torch compile for fused moe (#2327) | 2024-12-03 01:58:25 -08:00
Ata Fatahi | 83b340e371 | Add missing license for router wheel (#2324) | 2024-12-03 00:06:25 -08:00
    Signed-off-by: Ata Fatahi <immrata@gmail.com>
HAI | 0639bf15d1 | ROCm Container: set SGLANG_SET_CPU_AFFINITY=1 (#2328) | 2024-12-02 23:20:33 -08:00
Ying Sheng | aa47f64223 | Revert "[feat] Enable chunked prefill for llava-onevision" (#2329) | 2024-12-02 23:11:13 -08:00
Lianmin Zheng | 3ddb1c4679 | [Minor] Fix logger and style (#2325) | 2024-12-02 20:45:53 -08:00
Ying Sheng | 480e38a733 | [feat] Enable chunked prefill for llava-onevision (#2281) | 2024-12-02 20:19:02 -08:00
HAI | 69e2d4fb66 | Relax to include more AMD GPUs (#2319) | 2024-12-02 19:05:58 -08:00
Yineng Zhang | 85e1a6f3aa | Update model_loader deps and qqq quantization deps (#2220) (#2318) | 2024-12-02 23:22:13 +08:00
    Co-authored-by: HandH1998 <1335248067@qq.com>
Lianmin Zheng | 33deca81b5 | Add more fused moe benchmark utilities (#2314) | 2024-12-02 04:26:55 -08:00
Lianmin Zheng | 18108abe5d | [Minor] Fix code style (#2311) | 2024-12-02 02:27:36 -08:00
HAI | c54bda300a | Use rocminfo instead of rocm-smi for more OS/WSL support (#2310) | 2024-12-02 00:15:45 -08:00
Lianmin Zheng | 3c79ad35ca | [Fix] Fix the padded hash value for image tokens (#2309) | 2024-12-01 23:36:28 -08:00
Chayenne | 983bfcf386 | Online weight updates from torch.distributed (#2279) | 2024-12-01 23:23:18 -08:00
Yineng Zhang | 28bc60dcab | misc: update build setup (#2306) | 2024-12-02 02:03:49 +08:00
Yineng Zhang | 7301a39b13 | fix: resolve CodeQL cpp issue (#2305) | 2024-12-01 23:55:19 +08:00
Yineng Zhang | 47eb139f81 | feat: use warp reduce as a simple example (#2304) | 2024-12-01 22:43:50 +08:00
Lianmin Zheng | 5c18a03733 | Fix logprob for completions (#2301) | 2024-12-01 05:17:05 -08:00
Yineng Zhang | 5c91a315d7 | feat: support sgl-kernel pypi (#2302) | 2024-12-01 20:11:21 +08:00
Yineng Zhang | 3dbd73d319 | minor: rm unused _grouped_size_compiled_for_decode_kernels (#2299) | 2024-12-01 19:24:12 +08:00
Yineng Zhang | e9a6203dee | feat: skip good first issue (#2298) | 2024-12-01 19:18:57 +08:00
Qun Yang | 62c516ac45 | Add a simple torch native attention backend (#2241) | 2024-12-01 03:01:25 -08:00
Yineng Zhang | fc78640e00 | minor: support flashinfer nightly (#2295) | 2024-12-01 18:55:26 +08:00
gobraves | 906d795f15 | Feat: upgrade outlines & support compatibility with the old version (#2292) | 2024-12-01 02:07:27 -08:00
Yineng Zhang | 118b6af35e | feat: add should_use_tensor_core (#2179) | 2024-12-01 18:01:16 +08:00
Lianmin Zheng | 9449a95431 | [CI] Balance CI tests (#2293) | 2024-12-01 01:47:30 -08:00
Liangsheng Yin | 5f12f0e7af | Fix chunked prefill when ignore eos (#2290) | 2024-12-01 00:37:53 -08:00
yizhang2077 | d5b95cbb53 | adapt vllm distributed module to sglang (#2244) | 2024-12-01 15:54:52 +08:00
    Co-authored-by: Yineng Zhang <me@zhyncs.com>
Lianmin Zheng | 0303ca918f | [CI] Fix missing files in run_suite.py (#2288) | 2024-11-30 23:53:34 -08:00
Yineng Zhang | 00181098dd | feat: add Dockerfile for development (#2289) | 2024-12-01 15:27:52 +08:00