| Author | Commit | Message | Date |
|---|---|---|---|
| Qun Yang | 37ee906f61 | Add more support for intel Gaudi accelerators (#2357) | 2024-12-06 01:16:33 -08:00 |
| Xiaoyu Zhang | 34b364e073 | optimize cuda graph max_bs_settings on low-end gpus (#2360) | 2024-12-06 01:13:04 -08:00 |
| Yineng Zhang | 84d96b3ae5 | Move FP8 to SGLang (#2370); Co-authored-by: HaiShaw <hixiao@gmail.com> | 2024-12-06 15:42:10 +08:00 |
| xiaobochen | 3d32e4a32c | Resubmit MoE-EP (#2371) | 2024-12-06 15:05:21 +08:00 |
| Byron Hsu | 64fceab8af | [router] use 2-gpu-runner (#2368) | 2024-12-06 14:13:57 +08:00 |
| Lianmin Zheng | 71e2a27753 | Fix the cuda graph capture range for small #max-running-requests (#2359) | 2024-12-06 14:13:57 +08:00 |
| Ke Bao | 4a63c181f1 | Fix AWQ with enable MLA (#2364) | 2024-12-06 00:46:48 +08:00 |
| Lianmin Zheng | 2b0fc5941d | [Minor] Code style improvements (#2355) | 2024-12-04 19:02:08 -08:00 |
| Jerry Zhang | 9cc733b38c | move apply_torchao_config_ to model_runner (#2342) | 2024-12-04 17:26:42 -08:00 |
| Ke Wen | d693ec0427 | Make torch TP composable with torch.compile (#2352) | 2024-12-04 17:26:00 -08:00 |
| Chayenne | 18ea841f40 | Add Docs For SGLang Native Router (#2308) | 2024-12-04 15:41:22 -08:00 |
| Chayenne | 786be44da5 | Fix Docs CI When Compile Error (#2323) | 2024-12-04 11:19:46 -08:00 |
| Yineng Zhang | 2db4469808 | minor: limit the range of vllm versions (#2350) | 2024-12-05 02:00:34 +08:00 |
| Ata Fatahi | ed45e509df | Check gpu availability at server args creation (#2340); Signed-off-by: Ata Fatahi <immrata@gmail.com> | 2024-12-05 01:53:02 +08:00 |
| Ke Bao | ec52464dde | MLA prefill w/o weight absorption (#2349) | 2024-12-05 01:50:28 +08:00 |
| Yineng Zhang | eb0c1f5373 | docs: add SGLang v0.4 blog (#2341) | 2024-12-05 01:24:51 +08:00 |
| HAI | b2986d7aa5 | Adding SGLang FP8 Utils (#2348) | 2024-12-04 03:01:33 -08:00 |
| Yineng Zhang | f8b0326934 | chore: bump v0.4.0 (#2338) | 2024-12-03 11:55:41 -08:00 |
| Byron Hsu | 0495796517 | [router] Copy license when publishing & bump version (#2339) | 2024-12-03 10:27:43 -08:00 |
| Lianmin Zheng | 1228f7ca69 | Fix gptq for moe layers (#2300); Co-authored-by: root <me@zhyncs.com> | 2024-12-03 23:12:33 +08:00 |
| Yineng Zhang | fda628d8f2 | fix: resolve cmake url for Dockerfile.dev (#2335) | 2024-12-03 21:22:19 +08:00 |
| Lianmin Zheng | 07ec07ad1f | Improve torch compile for fused moe (#2327) | 2024-12-03 01:58:25 -08:00 |
| Ata Fatahi | 83b340e371 | Add missing license for router wheel (#2324); Signed-off-by: Ata Fatahi <immrata@gmail.com> | 2024-12-03 00:06:25 -08:00 |
| HAI | 0639bf15d1 | ROCm Container: set SGLANG_SET_CPU_AFFINITY=1 (#2328) | 2024-12-02 23:20:33 -08:00 |
| Ying Sheng | aa47f64223 | Revert "[feat] Enable chunked prefill for llava-onevision" (#2329) | 2024-12-02 23:11:13 -08:00 |
| Lianmin Zheng | 3ddb1c4679 | [Minor] Fix logger and style (#2325) | 2024-12-02 20:45:53 -08:00 |
| Ying Sheng | 480e38a733 | [feat] Enable chunked prefill for llava-onevision (#2281) | 2024-12-02 20:19:02 -08:00 |
| HAI | 69e2d4fb66 | Relax to include more AMD GPUs (#2319) | 2024-12-02 19:05:58 -08:00 |
| Yineng Zhang | 85e1a6f3aa | Update model_loader deps and qqq quantization deps (#2220) (#2318); Co-authored-by: HandH1998 <1335248067@qq.com> | 2024-12-02 23:22:13 +08:00 |
| Lianmin Zheng | 33deca81b5 | Add more fused moe benchmark utilities (#2314) | 2024-12-02 04:26:55 -08:00 |
| Lianmin Zheng | 18108abe5d | [Minor] Fix code style (#2311) | 2024-12-02 02:27:36 -08:00 |
| HAI | c54bda300a | Use rocminfo instead of rocm-smi for more OS/WSL support (#2310) | 2024-12-02 00:15:45 -08:00 |
| Lianmin Zheng | 3c79ad35ca | [Fix] Fix the padded hash value for image tokens (#2309) | 2024-12-01 23:36:28 -08:00 |
| Chayenne | 983bfcf386 | Online weight updates from torch.distributed (#2279) | 2024-12-01 23:23:18 -08:00 |
| Yineng Zhang | 28bc60dcab | misc: update build setup (#2306) | 2024-12-02 02:03:49 +08:00 |
| Yineng Zhang | 7301a39b13 | fix: resolve CodeQL cpp issue (#2305) | 2024-12-01 23:55:19 +08:00 |
| Yineng Zhang | 47eb139f81 | feat: use warp reduce as a simple example (#2304) | 2024-12-01 22:43:50 +08:00 |
| Lianmin Zheng | 5c18a03733 | Fix logprob for completions (#2301) | 2024-12-01 05:17:05 -08:00 |
| Yineng Zhang | 5c91a315d7 | feat: support sgl-kernel pypi (#2302) | 2024-12-01 20:11:21 +08:00 |
| Yineng Zhang | 3dbd73d319 | minor: rm unused _grouped_size_compiled_for_decode_kernels (#2299) | 2024-12-01 19:24:12 +08:00 |
| Yineng Zhang | e9a6203dee | feat: skip good first issue (#2298) | 2024-12-01 19:18:57 +08:00 |
| Qun Yang | 62c516ac45 | Add a simple torch native attention backend (#2241) | 2024-12-01 03:01:25 -08:00 |
| Yineng Zhang | fc78640e00 | minor: support flashinfer nightly (#2295) | 2024-12-01 18:55:26 +08:00 |
| gobraves | 906d795f15 | Feat: upgrade outlines & support compatibility with the old version (#2292) | 2024-12-01 02:07:27 -08:00 |
| Yineng Zhang | 118b6af35e | feat: add should_use_tensor_core (#2179) | 2024-12-01 18:01:16 +08:00 |
| Lianmin Zheng | 9449a95431 | [CI] Balance CI tests (#2293) | 2024-12-01 01:47:30 -08:00 |
| Liangsheng Yin | 5f12f0e7af | Fix chunked prefill when ignore eos (#2290) | 2024-12-01 00:37:53 -08:00 |
| yizhang2077 | d5b95cbb53 | adapt vllm distributed module to sglang (#2244); Co-authored-by: Yineng Zhang <me@zhyncs.com> | 2024-12-01 15:54:52 +08:00 |
| Lianmin Zheng | 0303ca918f | [CI] Fix missing files in run_suite.py (#2288) | 2024-11-30 23:53:34 -08:00 |
| Yineng Zhang | 00181098dd | feat: add Dockerfile for development (#2289) | 2024-12-01 15:27:52 +08:00 |