Commit Graph

170 Commits

Author SHA1 Message Date
Yineng Zhang
8f4d04e540 chore: bump v0.4.0.post2 (#2525) 2024-12-21 21:16:34 +08:00
Ata Fatahi
ce094a5d79 Clean up GPU memory after killing sglang processes (#2457) 2024-12-17 03:42:40 -08:00
Signed-off-by: Ata Fatahi <immrata@gmail.com>
Yineng Zhang
7154b4b1df minor: update flashinfer nightly (#2490) 2024-12-16 23:02:49 +08:00
Yineng Zhang
f0ed9c353e feat: support dev image (#2469) 2024-12-13 02:23:52 +08:00
Ata Fatahi
e3b3acfa6f Rename rust folder to sgl-router (#2464) 2024-12-12 09:40:41 -08:00
Signed-off-by: Ata Fatahi <immrata@gmail.com>
Yineng Zhang
2673fa29d4 fix: set runtime path (#2466) 2024-12-12 18:05:48 +08:00
Yineng Zhang
32ed016041 chore: bump v0.0.2 for sgl-kernel (#2462) 2024-12-12 14:58:05 +08:00
Ata Fatahi
2ac36b9a7b Make request payload size configurable (#2444) 2024-12-11 16:55:21 -08:00
Signed-off-by: Ata Fatahi <immrata@gmail.com>
Yineng Zhang
56fcd8e8a5 feat: support sgl-kernel PyPI (#2433) 2024-12-11 06:06:19 +08:00
Co-authored-by: Zhangyi <1109276519@qq.com>
Yineng Zhang
74bc9184c3 minor: add random use case (#2408) 2024-12-09 03:21:35 +08:00
Yineng Zhang
0f8eb15323 feat: support custom task runner (#2407) 2024-12-09 02:29:55 +08:00
Byron Hsu
c36736c841 [router] Add remove worker api (#2380) 2024-12-06 17:16:03 -08:00
xiaobochen
3d32e4a32c Resubmit MoE-EP (#2371) 2024-12-06 15:05:21 +08:00
Byron Hsu
64fceab8af [router] use 2-gpu-runner (#2368) 2024-12-06 14:13:57 +08:00
Byron Hsu
0495796517 [router] Copy license when publishing & bump version (#2339) 2024-12-03 10:27:43 -08:00
Chayenne
983bfcf386 Online weight updates from torch.distributed (#2279) 2024-12-01 23:23:18 -08:00
Lianmin Zheng
5c18a03733 Fix logprob for completions (#2301) 2024-12-01 05:17:05 -08:00
Yineng Zhang
e9a6203dee feat: skip good first issue (#2298) 2024-12-01 19:18:57 +08:00
Yineng Zhang
fc78640e00 minor: support flashinfer nightly (#2295) 2024-12-01 18:55:26 +08:00
Lianmin Zheng
9449a95431 [CI] Balance CI tests (#2293) 2024-12-01 01:47:30 -08:00
Chayenne
7d1485d376 Add get weights by parameter name for llama (#2266) 2024-11-29 23:36:38 -08:00
Lianmin Zheng
b2ccf36d4d Fix memory leak during abort (#2238) 2024-11-28 02:22:15 -08:00
Lianmin Zheng
ea34350d88 Rename double sparsity config file (#2188) 2024-11-25 17:12:08 -08:00
Byron Hsu
1f76fc6e3f [router] Rust e2e test (#2184) 2024-11-25 16:02:03 -08:00
Lianmin Zheng
fe5d3e818f Balance CI tests (#2162) 2024-11-24 07:38:52 -08:00
Lianmin Zheng
731146f6cb Fix mixed chunked prefill in overlap mode (#2158) 2024-11-24 07:17:37 -08:00
Lianmin Zheng
5652c56535 Update CI threshold & Improve code style (#2159) 2024-11-24 06:29:38 -08:00
Byron Hsu
84a1698d67 Update release-pypi-router.yml 2024-11-23 17:35:25 -08:00
Byron Hsu
32293a299c Improve sglang router (#2148) 2024-11-23 17:34:24 -08:00
Yineng Zhang
4f8c3aeafc minor: update gsm8k threshold (#2125) 2024-11-22 19:23:58 +08:00
Lianmin Zheng
dfec7fca06 Rename sglang.bench_latency to sglang.bench_one_batch (#2118) 2024-11-21 20:07:48 -08:00
Lianmin Zheng
63a395b985 Update nightly-eval.yml (#2100) 2024-11-19 22:15:02 -08:00
Lianmin Zheng
c1f401fc58 Revert "chore: update torch v2.5.1" (#2063) 2024-11-17 15:29:38 -08:00
Yineng Zhang
3b878863f7 chore: update torch v2.5.1 (#1849) 2024-11-18 00:06:00 +08:00
Ke Bao
976bc302e5 Support DP MLA (#1970) 2024-11-16 09:01:43 +00:00
Lianmin Zheng
befc6beb86 Fix a typo in io_struct.py (#2008) 2024-11-11 16:34:10 -08:00
Byron Hsu
3e33574374 run rust test on ubuntu instead of 1-gpu-runner (#2003) 2024-11-11 14:46:08 -08:00
Byron Hsu
e728258d34 release router from py38 to py312 (#2002) 2024-11-11 14:30:25 -08:00
Byron Hsu
239eafbd2e Fix rust unit test and pypi token (#2001) 2024-11-11 14:18:21 -08:00
Byron Hsu
00ffde206f setup router python binding ci (#1999) 2024-11-11 12:19:32 -08:00
Lianmin Zheng
3d043319aa [CI] Balance unit tests (#1988) 2024-11-10 11:45:01 -08:00
Lianmin Zheng
520f0094e4 [CI] balance unit tests (#1977) 2024-11-09 16:46:14 -08:00
Lianmin Zheng
a1f32867ca Update pr-test-rust.yml to add a "finish" step (#1975) 2024-11-09 13:53:35 -08:00
HAI
f9a377f650 [Release, ROCm] release ROCm docker build for AMD MI GPUs (#1957) 2024-11-08 00:14:15 -08:00
Lianmin Zheng
691808d587 Add a timeout for execute-notebook.yml (#1951) 2024-11-08 10:28:29 +08:00
HAI
dca87ec348 [Docs] fix 404 - Contributor Guide (#1942) 2024-11-07 16:50:45 +08:00
Xuehai Pan
a5e0defb5a minor: Add basic editorconfig and pre-commit hooks to enforce style for whitespaces (#1926) 2024-11-06 13:46:04 +00:00
Byron Hsu
96766101b4 [rust] refactor server and router (#1922) 2024-11-06 00:02:02 -08:00
Byron Hsu
463d56bf44 Update CODEOWNERS (#1916) 2024-11-05 09:13:41 +08:00
Byron Hsu
530ff541cf [router] Impl radix tree and set up CI (#1893) 2024-11-04 10:56:52 -08:00
Co-authored-by: Lianmin Zheng <lianminzheng@gmail.com>