
180 Commits

Author SHA1 Message Date
Ke Bao
41a0ccd4f1 Add clang-format check to sgl-kernel ci (#3012) 2025-01-20 23:22:19 +08:00
Lianmin Zheng
09bcbe0123 Update TypeBasedDispatcher and balance CI tests (#3001) 2025-01-19 23:37:27 -08:00
Lianmin Zheng
cd493b5afc Improve metrics, logging, and importing orders (#2992) 2025-01-19 18:36:59 -08:00
Byron Hsu
4719c1d04a [router] Fix sgl router path for release (#2980) 2025-01-19 01:11:06 -08:00
Byron Hsu
ef18b0eda2 [router] Allow empty worker list for sglang.launch_router (#2979) 2025-01-19 01:05:23 -08:00
Ke Bao
f3e9b4894b Fix sgl-kernel ci (#2938) 2025-01-17 17:26:21 +08:00
Lianmin Zheng
6a7973add8 Update release-docs.yml (#2937) 2025-01-17 00:36:40 -08:00
saienduri
a883f0790d Update release-docker-amd.yml to run on amd docker runner. (#2927) 2025-01-16 12:42:29 -08:00
Ke Bao
58f3f2b840 Add CI for sgl-kernel (#2924) 2025-01-17 01:26:51 +08:00
Yineng Zhang
58f42b1dd8 minor: update pr test (#2908) 2025-01-16 05:51:49 +08:00
Yineng Zhang
80002562a8 docs: update README (#2878) 2025-01-14 12:48:17 +08:00
Yineng Zhang
d855653bd4 minor: fix release docs (#2868) 2025-01-13 21:18:39 +08:00
Lianmin Zheng
67008f4b32 Use only one GPU for MLA CI tests (#2858) 2025-01-13 03:55:33 -08:00
Yineng Zhang
4536d72446 minor: use ubuntu-latest instead of self-hosted runner for amd build (#2861) 2025-01-13 18:58:56 +08:00
Yineng Zhang
20a9f5dfe0 fix: not delete CNAME (#2860) 2025-01-13 18:36:40 +08:00
Yineng Zhang
a879c2fb4c fix sgl-kernel build (#2850) 2025-01-13 12:27:17 +08:00
Lianmin Zheng
bdc1acf6cd Misc fix for min_p_sampling, --cuda-graph-bs (#2761) 2025-01-07 02:52:53 -08:00
Lianmin Zheng
b0524c3789 Eagle speculative decoding part 2: Fix cuda graph + DP attention hanging (#2684) 2024-12-31 02:25:05 -08:00
Co-authored-by: yukavio <kavioyu@gmail.com>
Yineng Zhang
d49b13c6f8 feat: use CUDA 12.4 by default (for FA3) (#2682) 2024-12-31 15:52:09 +08:00
Lianmin Zheng
8c3b420eec [Docs] clean up structured outputs docs (#2654) 2024-12-29 23:57:16 -08:00
Lianmin Zheng
b08c308ebc Update the timeout in nightly-test.yml (#2649) 2024-12-29 14:51:07 -08:00
Lianmin Zheng
855d0ba381 [CI] Fix nightly test and raise better error message (#2626) 2024-12-27 22:16:39 -08:00
Co-authored-by: Sangbin <rkooo567@gmail.com>
Lianmin Zheng
dc3bee4815 Fix test and benchmark scripts (#2598) 2024-12-26 07:56:26 -08:00
Yineng Zhang
8f4d04e540 chore: bump v0.4.0.post2 (#2525) 2024-12-21 21:16:34 +08:00
Ata Fatahi
ce094a5d79 Clean up GPU memory after killing sglang processes (#2457) 2024-12-17 03:42:40 -08:00
Signed-off-by: Ata Fatahi <immrata@gmail.com>
Yineng Zhang
7154b4b1df minor: update flashinfer nightly (#2490) 2024-12-16 23:02:49 +08:00
Yineng Zhang
f0ed9c353e feat: support dev image (#2469) 2024-12-13 02:23:52 +08:00
Ata Fatahi
e3b3acfa6f Rename rust folder to sgl-router (#2464) 2024-12-12 09:40:41 -08:00
Signed-off-by: Ata Fatahi <immrata@gmail.com>
Yineng Zhang
2673fa29d4 fix: set runtime path (#2466) 2024-12-12 18:05:48 +08:00
Yineng Zhang
32ed016041 chore: bump v0.0.2 for sgl-kernel (#2462) 2024-12-12 14:58:05 +08:00
Ata Fatahi
2ac36b9a7b Make request payload size configurable (#2444) 2024-12-11 16:55:21 -08:00
Signed-off-by: Ata Fatahi <immrata@gmail.com>
Yineng Zhang
56fcd8e8a5 feat: support sgl-kernel PyPI (#2433) 2024-12-11 06:06:19 +08:00
Co-authored-by: Zhangyi <1109276519@qq.com>
Yineng Zhang
74bc9184c3 minor: add random use case (#2408) 2024-12-09 03:21:35 +08:00
Yineng Zhang
0f8eb15323 feat: support custom task runner (#2407) 2024-12-09 02:29:55 +08:00
Byron Hsu
c36736c841 [router] Add remove worker api (#2380) 2024-12-06 17:16:03 -08:00
xiaobochen
3d32e4a32c Resubmit MoE-EP (#2371) 2024-12-06 15:05:21 +08:00
Byron Hsu
64fceab8af [router] use 2-gpu-runner (#2368) 2024-12-06 14:13:57 +08:00
Byron Hsu
0495796517 [router] Copy license when publishing & bump version (#2339) 2024-12-03 10:27:43 -08:00
Chayenne
983bfcf386 Online weight updates from torch.distributed (#2279) 2024-12-01 23:23:18 -08:00
Lianmin Zheng
5c18a03733 Fix logprob for completions (#2301) 2024-12-01 05:17:05 -08:00
Yineng Zhang
e9a6203dee feat: skip good first issue (#2298) 2024-12-01 19:18:57 +08:00
Yineng Zhang
fc78640e00 minor: support flashinfer nightly (#2295) 2024-12-01 18:55:26 +08:00
Lianmin Zheng
9449a95431 [CI] Balance CI tests (#2293) 2024-12-01 01:47:30 -08:00
Chayenne
7d1485d376 Add get weights by parameter name for llama (#2266) 2024-11-29 23:36:38 -08:00
Lianmin Zheng
b2ccf36d4d Fix memory leak during abort (#2238) 2024-11-28 02:22:15 -08:00
Lianmin Zheng
ea34350d88 Rename double sparsity config file (#2188) 2024-11-25 17:12:08 -08:00
Byron Hsu
1f76fc6e3f [router] Rust e2e test (#2184) 2024-11-25 16:02:03 -08:00
Lianmin Zheng
fe5d3e818f Balance CI tests (#2162) 2024-11-24 07:38:52 -08:00
Lianmin Zheng
731146f6cb Fix mixed chunked prefill in overlap mode (#2158) 2024-11-24 07:17:37 -08:00
Lianmin Zheng
5652c56535 Update CI threshold & Improve code style (#2159) 2024-11-24 06:29:38 -08:00