Yineng Zhang | 8f4d04e540 | chore: bump v0.4.0.post2 (#2525) | 2024-12-21 21:16:34 +08:00
Ata Fatahi | ce094a5d79 | Clean up GPU memory after killing sglang processes (#2457); Signed-off-by: Ata Fatahi <immrata@gmail.com> | 2024-12-17 03:42:40 -08:00
Yineng Zhang | 7154b4b1df | minor: update flashinfer nightly (#2490) | 2024-12-16 23:02:49 +08:00
Yineng Zhang | f0ed9c353e | feat: support dev image (#2469) | 2024-12-13 02:23:52 +08:00
Ata Fatahi | e3b3acfa6f | Rename rust folder to sgl-router (#2464); Signed-off-by: Ata Fatahi <immrata@gmail.com> | 2024-12-12 09:40:41 -08:00
Yineng Zhang | 2673fa29d4 | fix: set runtime path (#2466) | 2024-12-12 18:05:48 +08:00
Yineng Zhang | 32ed016041 | chore: bump v0.0.2 for sgl-kernel (#2462) | 2024-12-12 14:58:05 +08:00
Ata Fatahi | 2ac36b9a7b | Make request payload size configurable (#2444); Signed-off-by: Ata Fatahi <immrata@gmail.com> | 2024-12-11 16:55:21 -08:00
Yineng Zhang | 56fcd8e8a5 | feat: support sgl-kernel PyPI (#2433); Co-authored-by: Zhangyi <1109276519@qq.com> | 2024-12-11 06:06:19 +08:00
Yineng Zhang | 74bc9184c3 | minor: add random use case (#2408) | 2024-12-09 03:21:35 +08:00
Yineng Zhang | 0f8eb15323 | feat: support custom task runner (#2407) | 2024-12-09 02:29:55 +08:00
Byron Hsu | c36736c841 | [router] Add remove worker api (#2380) | 2024-12-06 17:16:03 -08:00
xiaobochen | 3d32e4a32c | Resubmit MoE-EP (#2371) | 2024-12-06 15:05:21 +08:00
Byron Hsu | 64fceab8af | [router] use 2-gpu-runner (#2368) | 2024-12-06 14:13:57 +08:00
Byron Hsu | 0495796517 | [router] Copy license when publishing & bump version (#2339) | 2024-12-03 10:27:43 -08:00
Chayenne | 983bfcf386 | Online weight updates from torch.distributed (#2279) | 2024-12-01 23:23:18 -08:00
Lianmin Zheng | 5c18a03733 | Fix logprob for completions (#2301) | 2024-12-01 05:17:05 -08:00
Yineng Zhang | e9a6203dee | feat: skip good first issue (#2298) | 2024-12-01 19:18:57 +08:00
Yineng Zhang | fc78640e00 | minor: support flashinfer nightly (#2295) | 2024-12-01 18:55:26 +08:00
Lianmin Zheng | 9449a95431 | [CI] Balance CI tests (#2293) | 2024-12-01 01:47:30 -08:00
Chayenne | 7d1485d376 | Add get weights by parameter name for llama (#2266) | 2024-11-29 23:36:38 -08:00
Lianmin Zheng | b2ccf36d4d | Fix memory leak during abort (#2238) | 2024-11-28 02:22:15 -08:00
Lianmin Zheng | ea34350d88 | Rename double sparsity config file (#2188) | 2024-11-25 17:12:08 -08:00
Byron Hsu | 1f76fc6e3f | [router] Rust e2e test (#2184) | 2024-11-25 16:02:03 -08:00
Lianmin Zheng | fe5d3e818f | Balance CI tests (#2162) | 2024-11-24 07:38:52 -08:00
Lianmin Zheng | 731146f6cb | Fix mixed chunked prefill in overlap mode (#2158) | 2024-11-24 07:17:37 -08:00
Lianmin Zheng | 5652c56535 | Update CI threshold & Improve code style (#2159) | 2024-11-24 06:29:38 -08:00
Byron Hsu | 84a1698d67 | Update release-pypi-router.yml | 2024-11-23 17:35:25 -08:00
Byron Hsu | 32293a299c | Improve sglang router (#2148) | 2024-11-23 17:34:24 -08:00
Yineng Zhang | 4f8c3aeafc | minor: update gsm8k threshold (#2125) | 2024-11-22 19:23:58 +08:00
Lianmin Zheng | dfec7fca06 | Rename sglang.bench_latency to sglang.bench_one_batch (#2118) | 2024-11-21 20:07:48 -08:00
Lianmin Zheng | 63a395b985 | Update nightly-eval.yml (#2100) | 2024-11-19 22:15:02 -08:00
Lianmin Zheng | c1f401fc58 | Revert "chore: update torch v2.5.1" (#2063) | 2024-11-17 15:29:38 -08:00
Yineng Zhang | 3b878863f7 | chore: update torch v2.5.1 (#1849) | 2024-11-18 00:06:00 +08:00
Ke Bao | 976bc302e5 | Support DP MLA (#1970) | 2024-11-16 09:01:43 +00:00
Lianmin Zheng | befc6beb86 | Fix a typo in io_struct.py (#2008) | 2024-11-11 16:34:10 -08:00
Byron Hsu | 3e33574374 | run rust test on ubuntu instead of 1-gpu-runner (#2003) | 2024-11-11 14:46:08 -08:00
Byron Hsu | e728258d34 | release router from py38 to py312 (#2002) | 2024-11-11 14:30:25 -08:00
Byron Hsu | 239eafbd2e | Fix rust unit test and pypi token (#2001) | 2024-11-11 14:18:21 -08:00
Byron Hsu | 00ffde206f | setup router python binding ci (#1999) | 2024-11-11 12:19:32 -08:00
Lianmin Zheng | 3d043319aa | [CI] Balance unit tests (#1988) | 2024-11-10 11:45:01 -08:00
Lianmin Zheng | 520f0094e4 | [CI] balance unit tests (#1977) | 2024-11-09 16:46:14 -08:00
Lianmin Zheng | a1f32867ca | Update pr-test-rust.yml to add a "finish" step (#1975) | 2024-11-09 13:53:35 -08:00
HAI | f9a377f650 | [Release, ROCm] release ROCm docker build for AMD MI GPUs (#1957) | 2024-11-08 00:14:15 -08:00
Lianmin Zheng | 691808d587 | Add a timeout for execute-notebook.yml (#1951) | 2024-11-08 10:28:29 +08:00
HAI | dca87ec348 | [Docs] fix 404 - Contributor Guide (#1942) | 2024-11-07 16:50:45 +08:00
Xuehai Pan | a5e0defb5a | minor: Add basic editorconfig and pre-commit hooks to enforce style for whitespaces (#1926) | 2024-11-06 13:46:04 +00:00
Byron Hsu | 96766101b4 | [rust] refactor server and router (#1922) | 2024-11-06 00:02:02 -08:00
Byron Hsu | 463d56bf44 | Update CODEOWNERS (#1916) | 2024-11-05 09:13:41 +08:00
Byron Hsu | 530ff541cf | [router] Impl radix tree and set up CI (#1893); Co-authored-by: Lianmin Zheng <lianminzheng@gmail.com> | 2024-11-04 10:56:52 -08:00