Commit Graph

173 Commits

Author SHA1 Message Date
Yineng Zhang
efc52f85e2 chore: bump v0.4.1 (#2582) 2024-12-26 07:14:51 +08:00
Shi Shuai
25e5d589e3 Doc: Update Grammar Backend (#2545) 2024-12-22 17:14:40 -08:00
Co-authored-by: Chayenne <zhaochen20@outlook.com>
Lianmin Zheng
8496701934 [Misc] Fix metrics, weight update lock, request logging (#2543) 2024-12-22 06:27:22 -08:00
Yineng Zhang
8f4d04e540 chore: bump v0.4.0.post2 (#2525) 2024-12-21 21:16:34 +08:00
Lianmin Zheng
21e9e63ad5 Print progress bar during cuda graph capture (#2502) 2024-12-17 06:33:46 -08:00
Ata Fatahi
e3b3acfa6f Rename rust folder to sgl-router (#2464) 2024-12-12 09:40:41 -08:00
Signed-off-by: Ata Fatahi <immrata@gmail.com>
Byron Hsu
c0ee46fe10 [router] Update doc for dynamic scaling and fault tolerance (#2454) 2024-12-11 13:11:42 -08:00
Fred Reiss
993956c6b1 Add support for IBM Granite 3.x models (#2437) 2024-12-11 06:30:23 -08:00
Adarsh Shirawalmath
2b340adfb1 Typo fix in router.md (#2424) 2024-12-09 21:49:40 -08:00
SangBin Cho
1f09e84b9a nit: Remove busy waiting on scheduler (#2382) 2024-12-08 01:06:15 -08:00
Lianmin Zheng
e5f227c0ee Release v0.4.0.post1 (#2375) 2024-12-06 06:08:19 -08:00
Lianmin Zheng
0e7409adb6 Fix the overlap for xgrammar (#2377) 2024-12-06 05:49:29 -08:00
vchzls
3cde5eb629 docs: Improve instructions for supporting new models (#2363) 2024-12-06 04:27:17 -08:00
Co-authored-by: zhaohoulong <zhaohoulong@xiaomi.com>
Chayenne
18ea841f40 Add Docs For SGLang Native Router (#2308) 2024-12-04 15:41:22 -08:00
Chayenne
786be44da5 Fix Docs CI When Compile Error (#2323) 2024-12-04 11:19:46 -08:00
Yineng Zhang
f8b0326934 chore: bump v0.4.0 (#2338) 2024-12-03 11:55:41 -08:00
Lianmin Zheng
4936be8acc Revert "Revert "[FEAT] Support GGUF format"" (#2287) 2024-11-30 22:14:48 -08:00
Lianmin Zheng
7e4c6dd8da Revert "[FEAT] Support GGUF format" (#2285) 2024-11-30 19:03:26 -08:00
Yang Zheng
883c955489 [FEAT] Support GGUF format (#2215) 2024-11-30 00:44:48 -08:00
Co-authored-by: Yang Zheng(SW)(Alex) <you@example.com>
Chayenne
7d5d1d3d29 udate weights from disk (#2265) 2024-11-30 01:17:00 +00:00
Yineng Zhang
fae4e5e99a chore: bump v0.3.6.post3 (#2259) 2024-11-30 01:41:16 +08:00
Lianmin Zheng
4f2ee48ed1 Update backend.md (#2251) 2024-11-28 23:18:07 -08:00
Lianmin Zheng
71ff2728a1 Update backend.md (#2250) 2024-11-28 23:14:36 -08:00
HAI
b79fffdcb5 Update Install Method 2. From source (#2232) 2024-11-27 22:46:55 -08:00
bjmsong
91e5dbf554 add profile in offline benchmark & update doc (#2123) 2024-11-27 14:57:13 -08:00
Co-authored-by: root <bjmsong@126.com>
Lianmin Zheng
fed4c6946a Release v0.3.6.post2 (#2214) 2024-11-27 03:35:30 -08:00
Co-authored-by: Yineng Zhang <me@zhyncs.com>
Lianmin Zheng
ac5a0f0488 Release v0.3.6.post1 (#2189) 2024-11-25 17:31:37 -08:00
Rin Intachuen
1aea19f64b Input_embeds support (#2052) 2024-11-25 16:35:04 -08:00
Lianmin Zheng
8e1adb8441 Allow overwrite flashinfer use_tensorcore (#2169) 2024-11-24 20:58:17 -08:00
Lianmin Zheng
8912b7637f Fix docs (#2164) 2024-11-24 08:25:56 -08:00
Lianmin Zheng
c211e7b669 Simplify batch update (#2154) 2024-11-24 04:47:10 -08:00
Henry Hyeonmok Ko
dbe1729395 Merged three native APIs into one: get_server_info (#2152) 2024-11-24 01:37:58 -08:00
Henry Hyeonmok Ko
c35cd1f8c7 Expose max total num tokens from Runtime & Engine API (#2092) 2024-11-22 15:10:10 -08:00
Xuehai Pan
72f87b723b feat(pre-commit): trim unnecessary notebook metadata from git history (#2127) 2024-11-22 13:04:51 -08:00
Yineng Zhang
9a00e6f453 chore: bump v0.3.6 (#2120) 2024-11-22 19:27:30 +08:00
Lianmin Zheng
dfec7fca06 Rename sglang.bench_latency to sglang.bench_one_batch (#2118) 2024-11-21 20:07:48 -08:00
Tanjiro
8c280cee55 add phi-3 small support (#2062) 2024-11-17 18:47:43 -08:00
Co-authored-by: Tushar Goel <114812108+AI-Tushar@users.noreply.github.com>
Xiaoyu Zhang
023d0a73df fix small typos in docs (#2047) 2024-11-15 11:09:10 -08:00
Lianmin Zheng
32c9a7ec11 Release v0.3.5.post2 (#2046) 2024-11-15 06:54:00 -08:00
ws
29ebe3dff4 fix: align enable_overlap_scheduler naming between code and docs (#2038) 2024-11-15 03:39:10 -08:00
HAI
b275ce0043 Github runner instructions for AMD (#2031) 2024-11-13 23:57:18 -08:00
Lianmin Zheng
f407fcf9ef Release v0.3.5.post1 (#2022) 2024-11-13 10:27:12 -08:00
RangiLyu
f18b9c7252 support internlm2-reward (#1994) 2024-11-11 15:09:58 -08:00
Yineng Zhang
47ffe7af81 docs: add shm size for docker run (#1986) 2024-11-10 22:14:48 +08:00
aqweteddy
f16eb15d0d Gemma2 reward model support (#1954) 2024-11-07 22:42:27 -08:00
Yudi Xue
5bc2508b80 Monitoring documentation (#1933) 2024-11-07 22:14:16 -08:00
Lianmin Zheng
a71a44f203 Update setup_github_runner.md (#1952) 2024-11-07 19:20:47 -08:00
Lianmin Zheng
1ae270c5d0 [Doc] fix docs (#1949) 2024-11-07 18:20:41 -08:00
Chayenne
c77c1e05ba fix black in pre-commit (#1940) 2024-11-08 07:42:47 +08:00
Xuehai Pan
a5e0defb5a minor: Add basic editorconfig and pre-commit hooks to enforce style for whitespaces (#1926) 2024-11-06 13:46:04 +00:00