203 Commits

Author SHA1 Message Date
Jhin
7b9b4f4426 Docs fix about EAGLE and streaming output (#3166)
Co-authored-by: Chayenne <zhaochenyang@ucla.edu>
Co-authored-by: Chayenne <zhaochen20@outlook.com>
Co-authored-by: Jhin <jhinpan@umich.edu>
2025-01-27 18:10:45 -08:00
Yineng Zhang
827aa8730b cleanup sgl-kernel kernels (#3175) 2025-01-27 19:11:01 +08:00
Yineng Zhang
f265d15b96 use self-hosted to build sgl-kernel (#3154) 2025-01-26 23:02:57 +08:00
Lianmin Zheng
4a61253123 Do not load OPENAI_KEY from secrets (#3147) 2025-01-26 01:54:03 -08:00
Lianmin Zheng
4f118a39d7 Fix repetition penalty (#3139) 2025-01-25 21:48:58 -08:00
Yineng Zhang
822bae8c00 feat: cross python wheel for sgl-kernel (#3138) 2025-01-26 13:21:34 +08:00
Lianmin Zheng
da6f8081f6 Fix CI tests (#3132) 2025-01-25 17:43:39 -08:00
Yineng Zhang
896c07441e update installation doc for sgl-kernel (#3129) 2025-01-26 00:00:13 +08:00
Ke Bao
67ad4338e1 Update tag name for whl release (#3127) 2025-01-25 23:14:35 +08:00
Yineng Zhang
3cab5f71ea speedup pr test for sgl-kernel (#3126) 2025-01-25 21:37:48 +08:00
Ke Bao
665e5e85f6 Add step to update sgl-kernel whl index (#3110) 2025-01-25 02:03:01 +08:00
Ke Bao
a22f60a313 Add workflow for sgl-kernel cu118 release (#3109) 2025-01-24 22:30:30 +08:00
Byron Hsu
3ed0a547b2 [router] Fix twine uploading (#3095) 2025-01-23 21:01:01 -08:00
Yineng Zhang
0da0989ad4 sync flashinfer and update sgl-kernel tests (#3081) 2025-01-23 21:13:55 +08:00
Yineng Zhang
3d0bfa3e17 update version setup for sgl-kernel (#3079) 2025-01-23 19:45:25 +08:00
Yineng Zhang
1f6cf0d4b9 fix build error for sgl-kernel (#3078) 2025-01-23 19:16:35 +08:00
Yineng Zhang
3e032c07cc use v0.6.4.post1 for sgl-kernel ci (#3071) 2025-01-23 14:19:38 +08:00
Yineng Zhang
bcda0c9ee6 sync the upstream updates of flashinfer (#3051) 2025-01-22 20:33:13 +08:00
Yineng Zhang
a42213dbd4 fix pr-test-sgl-kernel (#3036) 2025-01-22 00:56:42 +08:00
Yineng Zhang
5a0d680a14 feat: add flashinfer as 3rdparty and use rmsnorm as example (#3033) 2025-01-21 20:44:49 +08:00
Lianmin Zheng
a4331cd260 Add accuracy and latency tests of eagle into CI (#3027) 2025-01-21 02:55:14 -08:00
Yineng Zhang
ec1c21cdc4 upgrade torch version for sgl-kernel (#3026) 2025-01-21 14:32:08 +08:00
Yineng Zhang
6c856b4f3a minor: update Makefile for sgl-kernel (#3025) 2025-01-21 13:08:15 +08:00
Ke Bao
41a0ccd4f1 Add clang-format check to sgl-kernel ci (#3012) 2025-01-20 23:22:19 +08:00
Lianmin Zheng
09bcbe0123 Update TypeBasedDispatcher and balance CI tests (#3001) 2025-01-19 23:37:27 -08:00
Lianmin Zheng
cd493b5afc Improve metrics, logging, and importing orders (#2992) 2025-01-19 18:36:59 -08:00
Byron Hsu
4719c1d04a [router] Fix sgl router path for release (#2980) 2025-01-19 01:11:06 -08:00
Byron Hsu
ef18b0eda2 [router] Allow empty worker list for sglang.launch_router (#2979) 2025-01-19 01:05:23 -08:00
Ke Bao
f3e9b4894b Fix sgl-kernel ci (#2938) 2025-01-17 17:26:21 +08:00
Lianmin Zheng
6a7973add8 Update release-docs.yml (#2937) 2025-01-17 00:36:40 -08:00
saienduri
a883f0790d Update release-docker-amd.yml to run on amd docker runner. (#2927) 2025-01-16 12:42:29 -08:00
Ke Bao
58f3f2b840 Add CI for sgl-kernel (#2924) 2025-01-17 01:26:51 +08:00
Yineng Zhang
58f42b1dd8 minor: update pr test (#2908) 2025-01-16 05:51:49 +08:00
Yineng Zhang
80002562a8 docs: update README (#2878) 2025-01-14 12:48:17 +08:00
Yineng Zhang
d855653bd4 minor: fix release docs (#2868) 2025-01-13 21:18:39 +08:00
Lianmin Zheng
67008f4b32 Use only one GPU for MLA CI tests (#2858) 2025-01-13 03:55:33 -08:00
Yineng Zhang
4536d72446 minor: use ubuntu-latest instead of self-hosted runner for amd build (#2861) 2025-01-13 18:58:56 +08:00
Yineng Zhang
20a9f5dfe0 fix: not delete CNAME (#2860) 2025-01-13 18:36:40 +08:00
Yineng Zhang
a879c2fb4c fix sgl-kernel build (#2850) 2025-01-13 12:27:17 +08:00
Lianmin Zheng
bdc1acf6cd Misc fix for min_p_sampling, --cuda-graph-bs (#2761) 2025-01-07 02:52:53 -08:00
Lianmin Zheng
b0524c3789 Eagle speculative decoding part 2: Fix cuda graph + DP attention hanging (#2684)
Co-authored-by: yukavio <kavioyu@gmail.com>
2024-12-31 02:25:05 -08:00
Yineng Zhang
d49b13c6f8 feat: use CUDA 12.4 by default (for FA3) (#2682) 2024-12-31 15:52:09 +08:00
Lianmin Zheng
8c3b420eec [Docs] clean up structured outputs docs (#2654) 2024-12-29 23:57:16 -08:00
Lianmin Zheng
b08c308ebc Update the timeout in nightly-test.yml (#2649) 2024-12-29 14:51:07 -08:00
Lianmin Zheng
855d0ba381 [CI] Fix nightly test and raise better error message (#2626)
Co-authored-by: Sangbin <rkooo567@gmail.com>
2024-12-27 22:16:39 -08:00
Lianmin Zheng
dc3bee4815 Fix test and benchmark scripts (#2598) 2024-12-26 07:56:26 -08:00
Yineng Zhang
8f4d04e540 chore: bump v0.4.0.post2 (#2525) 2024-12-21 21:16:34 +08:00
Ata Fatahi
ce094a5d79 Clean up GPU memory after killing sglang processes (#2457)
Signed-off-by: Ata Fatahi <immrata@gmail.com>
2024-12-17 03:42:40 -08:00
Yineng Zhang
7154b4b1df minor: update flashinfer nightly (#2490) 2024-12-16 23:02:49 +08:00
Yineng Zhang
f0ed9c353e feat: support dev image (#2469) 2024-12-13 02:23:52 +08:00