Commit Graph

194 Commits

Author SHA1 Message Date
Lianmin Zheng
0f9cc6d8d3 Fix package loss for small models (#2717)
Co-authored-by: sdli1995 <mmlmonkey@163.com>
2025-01-02 18:25:26 -08:00
Shi Shuai
dd2e2d275f Docs: Update documentation workflow and contribution guide (#2704)
Co-authored-by: Chayenne <zhaochen20@outlook.com>
2025-01-02 09:18:31 -08:00
Shi Shuai
062c48d2bd [Docs] Add Support for Pydantic Structured Output Format (#2697) 2025-01-01 15:08:43 -08:00
Chayenne
0d8d97b8e6 Doc: Rename contribution_guide.md (#2691) 2024-12-31 14:35:48 -08:00
Shi Shuai
0a765bbccc Docs: Refactor Contribution Guide (#2690) 2024-12-31 14:11:00 -08:00
Yineng Zhang
d49b13c6f8 feat: use CUDA 12.4 by default (for FA3) (#2682) 2024-12-31 15:52:09 +08:00
Lianmin Zheng
bdd2827a80 Update structured_outputs.ipynb (#2666) 2024-12-30 00:46:41 -08:00
Lianmin Zheng
8c3b420eec [Docs] clean up structured outputs docs (#2654) 2024-12-29 23:57:16 -08:00
Yineng Zhang
098d659c0e docs: update README (#2651) 2024-12-30 13:33:29 +08:00
Lianmin Zheng
03d5fbfd44 Release 0.4.1.post3 - upload the config.json to PyPI (#2647) 2024-12-29 14:25:53 -08:00
Yineng Zhang
b085e06b01 docs: add development guide using docker (#2645) 2024-12-30 02:22:54 +08:00
Yineng Zhang
3ccf566b0d chore: bump v0.4.1.post2 (#2643) 2024-12-30 00:11:46 +08:00
Adarsh Shirawalmath
fd34f2da35 [Docs] Add EBNF to sampling params docs (#2609) 2024-12-29 00:05:00 -08:00
Tanjiro
8ee9a8501a [Feature] Function Calling (#2544)
Co-authored-by: Haoyu Wang <120358163+HaoyuWang4188@users.noreply.github.com>
2024-12-28 21:58:52 -08:00
Shi Shuai
333e3bfde5 [docs]Refactor constrained decoding tutorial (#2633) 2024-12-28 07:00:38 -08:00
Shi Shuai
239c9d4d3a Docs: Add constrained decoding tutorial (#2614)
Co-authored-by: Chayenne <zhaochen20@outlook.com>
2024-12-27 23:54:28 -08:00
Lianmin Zheng
751e5ca273 [minor] clean up docs and eos id (#2622) 2024-12-27 11:23:46 -08:00
Yineng Zhang
ef5b0ff90b chore: bump v0.4.1.post1 (#2616) 2024-12-28 00:11:06 +08:00
Lianmin Zheng
2125898af5 Update contributor_guide.md (#2603) 2024-12-26 08:36:13 -08:00
Lianmin Zheng
dc3bee4815 Fix test and benchmark scripts (#2598) 2024-12-26 07:56:26 -08:00
Lianmin Zheng
773951548d Fix logprob_start_len for multi modal models (#2597)
Co-authored-by: libra <lihu723@gmail.com>
Co-authored-by: fzyzcjy <ch271828n@outlook.com>
Co-authored-by: Wang, Haoyu <haoyu.wang@intel.com>
2024-12-26 06:27:45 -08:00
Yineng Zhang
efc52f85e2 chore: bump v0.4.1 (#2582) 2024-12-26 07:14:51 +08:00
Shi Shuai
25e5d589e3 Doc: Update Grammar Backend (#2545)
Co-authored-by: Chayenne <zhaochen20@outlook.com>
2024-12-22 17:14:40 -08:00
Lianmin Zheng
8496701934 [Misc] Fix metrics, weight update lock, request logging (#2543) 2024-12-22 06:27:22 -08:00
Yineng Zhang
8f4d04e540 chore: bump v0.4.0.post2 (#2525) 2024-12-21 21:16:34 +08:00
Lianmin Zheng
21e9e63ad5 Print progress bar during cuda graph capture (#2502) 2024-12-17 06:33:46 -08:00
Ata Fatahi
e3b3acfa6f Rename rust folder to sgl-router (#2464)
Signed-off-by: Ata Fatahi <immrata@gmail.com>
2024-12-12 09:40:41 -08:00
Byron Hsu
c0ee46fe10 [router] Update doc for dynamic scaling and fault tolerance (#2454) 2024-12-11 13:11:42 -08:00
Fred Reiss
993956c6b1 Add support for IBM Granite 3.x models (#2437) 2024-12-11 06:30:23 -08:00
Adarsh Shirawalmath
2b340adfb1 Typo fix in router.md (#2424) 2024-12-09 21:49:40 -08:00
SangBin Cho
1f09e84b9a nit: Remove busy waiting on scheduler (#2382) 2024-12-08 01:06:15 -08:00
Lianmin Zheng
e5f227c0ee Release v0.4.0.post1 (#2375) 2024-12-06 06:08:19 -08:00
Lianmin Zheng
0e7409adb6 Fix the overlap for xgrammar (#2377) 2024-12-06 05:49:29 -08:00
vchzls
3cde5eb629 docs: Improve instructions for supporting new models (#2363)
Co-authored-by: zhaohoulong <zhaohoulong@xiaomi.com>
2024-12-06 04:27:17 -08:00
Chayenne
18ea841f40 Add Docs For SGLang Native Router (#2308) 2024-12-04 15:41:22 -08:00
Chayenne
786be44da5 Fix Docs CI When Compile Error (#2323) 2024-12-04 11:19:46 -08:00
Yineng Zhang
f8b0326934 chore: bump v0.4.0 (#2338) 2024-12-03 11:55:41 -08:00
Lianmin Zheng
4936be8acc Revert "Revert "[FEAT] Support GGUF format"" (#2287) 2024-11-30 22:14:48 -08:00
Lianmin Zheng
7e4c6dd8da Revert "[FEAT] Support GGUF format" (#2285) 2024-11-30 19:03:26 -08:00
Yang Zheng
883c955489 [FEAT] Support GGUF format (#2215)
Co-authored-by: Yang Zheng(SW)(Alex) <you@example.com>
2024-11-30 00:44:48 -08:00
Chayenne
7d5d1d3d29 udate weights from disk (#2265) 2024-11-30 01:17:00 +00:00
Yineng Zhang
fae4e5e99a chore: bump v0.3.6.post3 (#2259) 2024-11-30 01:41:16 +08:00
Lianmin Zheng
4f2ee48ed1 Update backend.md (#2251) 2024-11-28 23:18:07 -08:00
Lianmin Zheng
71ff2728a1 Update backend.md (#2250) 2024-11-28 23:14:36 -08:00
HAI
b79fffdcb5 Update Install Method 2. From source (#2232) 2024-11-27 22:46:55 -08:00
bjmsong
91e5dbf554 add profile in offline benchmark & update doc (#2123)
Co-authored-by: root <bjmsong@126.com>
2024-11-27 14:57:13 -08:00
Lianmin Zheng
fed4c6946a Release v0.3.6.post2 (#2214)
Co-authored-by: Yineng Zhang <me@zhyncs.com>
2024-11-27 03:35:30 -08:00
Lianmin Zheng
ac5a0f0488 Release v0.3.6.post1 (#2189) 2024-11-25 17:31:37 -08:00
Rin Intachuen
1aea19f64b Input_embeds support (#2052) 2024-11-25 16:35:04 -08:00
Lianmin Zheng
8e1adb8441 Allow overwrite flashinfer use_tensorcore (#2169) 2024-11-24 20:58:17 -08:00