Ke Bao | e835a50021 | Reorg moe code (#2563) | 2024-12-24 01:10:22 +08:00 |
Lianmin Zheng | 8496701934 | [Misc] Fix metrics, weight update lock, request logging (#2543) | 2024-12-22 06:27:22 -08:00 |
Lianmin Zheng | 9cd9dc83b3 | Temporarily disable unit test of torch native attention backend (#2492) | 2024-12-16 14:17:27 -08:00 |
Ke Bao | 2f9bd0fafd | Fix correctness issue for triton decoding kernel (#2479) | 2024-12-14 16:50:54 +08:00 |
Fred Reiss | 993956c6b1 | Add support for IBM Granite 3.x models (#2437) | 2024-12-11 06:30:23 -08:00 |
Ying Sheng | 8586b72da0 | [feat] Enable chunked prefill for llava-onevision (#2412) | 2024-12-09 09:52:38 -08:00 |
Lianmin Zheng | 641b7d0ae0 | [Minor] Improve code style (#2422) | 2024-12-09 06:30:35 -08:00 |
Xiaoyu Zhang | 3844feb9bb | Add a unittest for fused_moe (#2416) | 2024-12-08 22:46:10 -08:00 |
Lianmin Zheng | a6ca736c8e | Simplify stream_output (#2398) | 2024-12-08 12:27:13 -08:00 |
Yineng Zhang | f62055b528 | minor: add random flashinfer vs triton use case (#2409) | 2024-12-09 04:15:21 +08:00 |
Yineng Zhang | 74bc9184c3 | minor: add random use case (#2408) | 2024-12-09 03:21:35 +08:00 |
Yineng Zhang | 0f8eb15323 | feat: support custom task runner (#2407) | 2024-12-09 02:29:55 +08:00 |
Yineng Zhang | 67470bbb28 | minor: update correct measurement unit (#2406) | 2024-12-08 20:55:04 +08:00 |
Ke Bao | 61dec545b0 | Remove unused vars in the triton backend (#2401) | 2024-12-08 03:37:03 -08:00 |
Ke Bao | 7dc66fcb40 | Optimize Triton decoding kernel for long context (#2394) | 2024-12-08 01:17:37 -08:00 |
Lianmin Zheng | 0e7409adb6 | Fix the overlap for xgrammar (#2377) | 2024-12-06 05:49:29 -08:00 |
xiaobochen | 3d32e4a32c | Resubmit MoE-EP (#2371) | 2024-12-06 15:05:21 +08:00 |
Lianmin Zheng | 07ec07ad1f | Improve torch compile for fused moe (#2327) | 2024-12-03 01:58:25 -08:00 |
Ying Sheng | aa47f64223 | Revert "[feat] Enable chunked prefill for llava-onevision" (#2329) | 2024-12-02 23:11:13 -08:00 |
Ying Sheng | 480e38a733 | [feat] Enable chunked prefill for llava-onevision (#2281) | 2024-12-02 20:19:02 -08:00 |
Lianmin Zheng | 18108abe5d | [Minor] Fix code style (#2311) | 2024-12-02 02:27:36 -08:00 |
Chayenne | 983bfcf386 | Online weight updates from torch.distributed (#2279) | 2024-12-01 23:23:18 -08:00 |
Qun Yang | 62c516ac45 | Add a simple torch native attention backend (#2241) | 2024-12-01 03:01:25 -08:00 |
Lianmin Zheng | 9449a95431 | [CI] Balance CI tests (#2293) | 2024-12-01 01:47:30 -08:00 |
Lianmin Zheng | 0303ca918f | [CI] Fix missing files in run_suite.py (#2288) | 2024-11-30 23:53:34 -08:00 |
Lianmin Zheng | 4936be8acc | Revert "Revert "[FEAT] Support GGUF format"" (#2287) | 2024-11-30 22:14:48 -08:00 |
Lianmin Zheng | 1bfa511b95 | [CI] Fix ci tests (#2284) | 2024-11-30 21:12:03 -08:00 |
Lianmin Zheng | 7e4c6dd8da | Revert "[FEAT] Support GGUF format" (#2285) | 2024-11-30 19:03:26 -08:00 |
Yang Zheng | 883c955489 | [FEAT] Support GGUF format (#2215); Co-authored-by: Yang Zheng(SW)(Alex) <you@example.com> | 2024-11-30 00:44:48 -08:00 |
Lianmin Zheng | ccaf1f997c | [CI] Print summary on github actions (#2274) | 2024-11-29 23:48:54 -08:00 |
Chayenne | 7d1485d376 | Add get weights by parameter name for llama (#2266) | 2024-11-29 23:36:38 -08:00 |
Chayenne | 7d5d1d3d29 | udate weights from disk (#2265) | 2024-11-30 01:17:00 +00:00 |
Lianmin Zheng | fe97a2d40f | Simplify tokenizer manager (#2254) | 2024-11-29 02:18:51 -08:00 |
Ying Sheng | 8b48496aaf | Revert "Revert "Add simple CPU offloading support"" (#2253); Co-authored-by: Jani Monoses <jani.monoses@gmail.com>; Co-authored-by: youkaichao <youkaichao@gmail.com> | 2024-11-28 23:58:54 -08:00 |
Ying Sheng | 4057ea82c9 | Revert "Add simple CPU offloading support" (#2252); We'll re-add the commit to correctly ack Kaichao's authorship | 2024-11-28 23:36:55 -08:00 |
Ying Sheng | b7038fec9b | [fix] Fix prefix caching for multi-image/video (#2239) | 2024-11-28 12:08:13 -08:00 |
Lianmin Zheng | b2ccf36d4d | Fix memory leak during abort (#2238) | 2024-11-28 02:22:15 -08:00 |
Lianmin Zheng | d4fc1a70e3 | Crash the server correctly during error (#2231) | 2024-11-28 00:22:39 -08:00 |
Jani Monoses | db674e3d24 | Add OLMo2 model. (#2233) | 2024-11-28 00:15:20 -08:00 |
Lianmin Zheng | fed4c6946a | Release v0.3.6.post2 (#2214); Co-authored-by: Yineng Zhang <me@zhyncs.com> | 2024-11-27 03:35:30 -08:00 |
Ying Sheng | 37c8a5761f | [feat] Support session control for vision language models (#2210) | 2024-11-27 00:03:29 -08:00 |
Lianmin Zheng | c754652fcd | Fix flasky tests (#2212) | 2024-11-26 23:06:20 -08:00 |
Yudi Xue | 19f33b3237 | add sglang version to get_server_info (#2206) | 2024-11-26 12:10:23 -08:00 |
Lianmin Zheng | ea34350d88 | Rename double sparsity config file (#2188) | 2024-11-25 17:12:08 -08:00 |
Lianmin Zheng | 1605ae121e | [CI] Minor fix for CI (#2187) | 2024-11-25 16:38:43 -08:00 |
Rin Intachuen | 1aea19f64b | Input_embeds support (#2052) | 2024-11-25 16:35:04 -08:00 |
Yixin Dong | 7f076c2ce6 | Update XGrammar to the latest API (#2176); Co-authored-by: Ben Gitter <gitterbd@gmail.com> | 2024-11-25 15:58:30 -08:00 |
Lianmin Zheng | 3c5538f781 | Update CI threshold (#2186) | 2024-11-25 15:24:17 -08:00 |
Ying Sheng | e1e595d702 | [feat] Refactor session control interface and add CI (#2173) | 2024-11-25 12:32:51 -08:00 |
Lianmin Zheng | 254fd130e2 | [CI] Split test cases in CI for better load balancing (#2180) | 2024-11-25 04:58:16 -08:00 |