| Author | Commit | Message | Date |
|---|---|---|---|
| Lianmin Zheng | 2125898af5 | Update contributor_guide.md (#2603) | 2024-12-26 08:36:13 -08:00 |
| Lianmin Zheng | dc3bee4815 | Fix test and benchmark scripts (#2598) | 2024-12-26 07:56:26 -08:00 |
| Lianmin Zheng | 773951548d | Fix logprob_start_len for multi modal models (#2597)<br>Co-authored-by: libra <lihu723@gmail.com><br>Co-authored-by: fzyzcjy <ch271828n@outlook.com><br>Co-authored-by: Wang, Haoyu <haoyu.wang@intel.com> | 2024-12-26 06:27:45 -08:00 |
| Yineng Zhang | efc52f85e2 | chore: bump v0.4.1 (#2582) | 2024-12-26 07:14:51 +08:00 |
| Shi Shuai | 25e5d589e3 | Doc: Update Grammar Backend (#2545)<br>Co-authored-by: Chayenne <zhaochen20@outlook.com> | 2024-12-22 17:14:40 -08:00 |
| Lianmin Zheng | 8496701934 | [Misc] Fix metrics, weight update lock, request logging (#2543) | 2024-12-22 06:27:22 -08:00 |
| Yineng Zhang | 8f4d04e540 | chore: bump v0.4.0.post2 (#2525) | 2024-12-21 21:16:34 +08:00 |
| Lianmin Zheng | 21e9e63ad5 | Print progress bar during cuda graph capture (#2502) | 2024-12-17 06:33:46 -08:00 |
| Ata Fatahi | e3b3acfa6f | Rename rust folder to sgl-router (#2464)<br>Signed-off-by: Ata Fatahi <immrata@gmail.com> | 2024-12-12 09:40:41 -08:00 |
| Byron Hsu | c0ee46fe10 | [router] Update doc for dynamic scaling and fault tolerance (#2454) | 2024-12-11 13:11:42 -08:00 |
| Fred Reiss | 993956c6b1 | Add support for IBM Granite 3.x models (#2437) | 2024-12-11 06:30:23 -08:00 |
| Adarsh Shirawalmath | 2b340adfb1 | Typo fix in router.md (#2424) | 2024-12-09 21:49:40 -08:00 |
| SangBin Cho | 1f09e84b9a | nit: Remove busy waiting on scheduler (#2382) | 2024-12-08 01:06:15 -08:00 |
| Lianmin Zheng | e5f227c0ee | Release v0.4.0.post1 (#2375) | 2024-12-06 06:08:19 -08:00 |
| Lianmin Zheng | 0e7409adb6 | Fix the overlap for xgrammar (#2377) | 2024-12-06 05:49:29 -08:00 |
| vchzls | 3cde5eb629 | docs: Improve instructions for supporting new models (#2363)<br>Co-authored-by: zhaohoulong <zhaohoulong@xiaomi.com> | 2024-12-06 04:27:17 -08:00 |
| Chayenne | 18ea841f40 | Add Docs For SGLang Native Router (#2308) | 2024-12-04 15:41:22 -08:00 |
| Chayenne | 786be44da5 | Fix Docs CI When Compile Error (#2323) | 2024-12-04 11:19:46 -08:00 |
| Yineng Zhang | f8b0326934 | chore: bump v0.4.0 (#2338) | 2024-12-03 11:55:41 -08:00 |
| Lianmin Zheng | 4936be8acc | Revert "Revert "[FEAT] Support GGUF format"" (#2287) | 2024-11-30 22:14:48 -08:00 |
| Lianmin Zheng | 7e4c6dd8da | Revert "[FEAT] Support GGUF format" (#2285) | 2024-11-30 19:03:26 -08:00 |
| Yang Zheng | 883c955489 | [FEAT] Support GGUF format (#2215)<br>Co-authored-by: Yang Zheng(SW)(Alex) <you@example.com> | 2024-11-30 00:44:48 -08:00 |
| Chayenne | 7d5d1d3d29 | udate weights from disk (#2265) | 2024-11-30 01:17:00 +00:00 |
| Yineng Zhang | fae4e5e99a | chore: bump v0.3.6.post3 (#2259) | 2024-11-30 01:41:16 +08:00 |
| Lianmin Zheng | 4f2ee48ed1 | Update backend.md (#2251) | 2024-11-28 23:18:07 -08:00 |
| Lianmin Zheng | 71ff2728a1 | Update backend.md (#2250) | 2024-11-28 23:14:36 -08:00 |
| HAI | b79fffdcb5 | Update Install Method 2. From source (#2232) | 2024-11-27 22:46:55 -08:00 |
| bjmsong | 91e5dbf554 | add profile in offline benchmark & update doc (#2123)<br>Co-authored-by: root <bjmsong@126.com> | 2024-11-27 14:57:13 -08:00 |
| Lianmin Zheng | fed4c6946a | Release v0.3.6.post2 (#2214)<br>Co-authored-by: Yineng Zhang <me@zhyncs.com> | 2024-11-27 03:35:30 -08:00 |
| Lianmin Zheng | ac5a0f0488 | Release v0.3.6.post1 (#2189) | 2024-11-25 17:31:37 -08:00 |
| Rin Intachuen | 1aea19f64b | Input_embeds support (#2052) | 2024-11-25 16:35:04 -08:00 |
| Lianmin Zheng | 8e1adb8441 | Allow overwrite flashinfer use_tensorcore (#2169) | 2024-11-24 20:58:17 -08:00 |
| Lianmin Zheng | 8912b7637f | Fix docs (#2164) | 2024-11-24 08:25:56 -08:00 |
| Lianmin Zheng | c211e7b669 | Simplify batch update (#2154) | 2024-11-24 04:47:10 -08:00 |
| Henry Hyeonmok Ko | dbe1729395 | Merged three native APIs into one: get_server_info (#2152) | 2024-11-24 01:37:58 -08:00 |
| Henry Hyeonmok Ko | c35cd1f8c7 | Expose max total num tokens from Runtime & Engine API (#2092) | 2024-11-22 15:10:10 -08:00 |
| Xuehai Pan | 72f87b723b | feat(pre-commit): trim unnecessary notebook metadata from git history (#2127) | 2024-11-22 13:04:51 -08:00 |
| Yineng Zhang | 9a00e6f453 | chore: bump v0.3.6 (#2120) | 2024-11-22 19:27:30 +08:00 |
| Lianmin Zheng | dfec7fca06 | Rename sglang.bench_latency to sglang.bench_one_batch (#2118) | 2024-11-21 20:07:48 -08:00 |
| Tanjiro | 8c280cee55 | add phi-3 small support (#2062)<br>Co-authored-by: Tushar Goel <114812108+AI-Tushar@users.noreply.github.com> | 2024-11-17 18:47:43 -08:00 |
| Xiaoyu Zhang | 023d0a73df | fix small typos in docs (#2047) | 2024-11-15 11:09:10 -08:00 |
| Lianmin Zheng | 32c9a7ec11 | Release v0.3.5.post2 (#2046) | 2024-11-15 06:54:00 -08:00 |
| ws | 29ebe3dff4 | fix: align enable_overlap_scheduler naming between code and docs (#2038) | 2024-11-15 03:39:10 -08:00 |
| HAI | b275ce0043 | Github runner instructions for AMD (#2031) | 2024-11-13 23:57:18 -08:00 |
| Lianmin Zheng | f407fcf9ef | Release v0.3.5.post1 (#2022) | 2024-11-13 10:27:12 -08:00 |
| RangiLyu | f18b9c7252 | support internlm2-reward (#1994) | 2024-11-11 15:09:58 -08:00 |
| Yineng Zhang | 47ffe7af81 | docs: add shm size for docker run (#1986) | 2024-11-10 22:14:48 +08:00 |
| aqweteddy | f16eb15d0d | Gemma2 reward model support (#1954) | 2024-11-07 22:42:27 -08:00 |
| Yudi Xue | 5bc2508b80 | Monitoring documentation (#1933) | 2024-11-07 22:14:16 -08:00 |
| Lianmin Zheng | a71a44f203 | Update setup_github_runner.md (#1952) | 2024-11-07 19:20:47 -08:00 |