| Author | Commit | Message | Date |
|---|---|---|---|
| Jhin | 7b9b4f4426 | Docs fix about EAGLE and streaming output (#3166); Co-authored-by: Chayenne <zhaochenyang@ucla.edu>; Co-authored-by: Chayenne <zhaochen20@outlook.com>; Co-authored-by: Jhin <jhinpan@umich.edu> | 2025-01-27 18:10:45 -08:00 |
| Yineng Zhang | 827aa8730b | cleanup sgl-kernel kernels (#3175) | 2025-01-27 19:11:01 +08:00 |
| Yineng Zhang | f265d15b96 | use self-hosted to build sgl-kernel (#3154) | 2025-01-26 23:02:57 +08:00 |
| Lianmin Zheng | 4a61253123 | Do not load OPENAI_KEY from secrets (#3147) | 2025-01-26 01:54:03 -08:00 |
| Lianmin Zheng | 4f118a39d7 | Fix repetition penalty (#3139) | 2025-01-25 21:48:58 -08:00 |
| Yineng Zhang | 822bae8c00 | feat: cross python wheel for sgl-kernel (#3138) | 2025-01-26 13:21:34 +08:00 |
| Lianmin Zheng | da6f8081f6 | Fix CI tests (#3132) | 2025-01-25 17:43:39 -08:00 |
| Yineng Zhang | 896c07441e | update installation doc for sgl-kernel (#3129) | 2025-01-26 00:00:13 +08:00 |
| Ke Bao | 67ad4338e1 | Update tag name for whl release (#3127) | 2025-01-25 23:14:35 +08:00 |
| Yineng Zhang | 3cab5f71ea | speedup pr test for sgl-kernel (#3126) | 2025-01-25 21:37:48 +08:00 |
| Ke Bao | 665e5e85f6 | Add step to update sgl-kernel whl index (#3110) | 2025-01-25 02:03:01 +08:00 |
| Ke Bao | a22f60a313 | Add workflow for sgl-kernel cu118 release (#3109) | 2025-01-24 22:30:30 +08:00 |
| Byron Hsu | 3ed0a547b2 | [router] Fix twine uploading (#3095) | 2025-01-23 21:01:01 -08:00 |
| Yineng Zhang | 0da0989ad4 | sync flashinfer and update sgl-kernel tests (#3081) | 2025-01-23 21:13:55 +08:00 |
| Yineng Zhang | 3d0bfa3e17 | update version setup for sgl-kernel (#3079) | 2025-01-23 19:45:25 +08:00 |
| Yineng Zhang | 1f6cf0d4b9 | fix build error for sgl-kernel (#3078) | 2025-01-23 19:16:35 +08:00 |
| Yineng Zhang | 3e032c07cc | use v0.6.4.post1 for sgl-kernel ci (#3071) | 2025-01-23 14:19:38 +08:00 |
| Yineng Zhang | bcda0c9ee6 | sync the upstream updates of flashinfer (#3051) | 2025-01-22 20:33:13 +08:00 |
| Yineng Zhang | a42213dbd4 | fix pr-test-sgl-kernel (#3036) | 2025-01-22 00:56:42 +08:00 |
| Yineng Zhang | 5a0d680a14 | feat: add flashinfer as 3rdparty and use rmsnorm as example (#3033) | 2025-01-21 20:44:49 +08:00 |
| Lianmin Zheng | a4331cd260 | Add accuracy and latency tests of eagle into CI (#3027) | 2025-01-21 02:55:14 -08:00 |
| Yineng Zhang | ec1c21cdc4 | upgrade torch version for sgl-kernel (#3026) | 2025-01-21 14:32:08 +08:00 |
| Yineng Zhang | 6c856b4f3a | minor: update Makefile for sgl-kernel (#3025) | 2025-01-21 13:08:15 +08:00 |
| Ke Bao | 41a0ccd4f1 | Add clang-format check to sgl-kernel ci (#3012) | 2025-01-20 23:22:19 +08:00 |
| Lianmin Zheng | 09bcbe0123 | Update TypeBasedDispatcher and balance CI tests (#3001) | 2025-01-19 23:37:27 -08:00 |
| Lianmin Zheng | cd493b5afc | Improve metrics, logging, and importing orders (#2992) | 2025-01-19 18:36:59 -08:00 |
| Byron Hsu | 4719c1d04a | [router] Fix sgl router path for release (#2980) | 2025-01-19 01:11:06 -08:00 |
| Byron Hsu | ef18b0eda2 | [router] Allow empty worker list for sglang.launch_router (#2979) | 2025-01-19 01:05:23 -08:00 |
| Ke Bao | f3e9b4894b | Fix sgl-kernel ci (#2938) | 2025-01-17 17:26:21 +08:00 |
| Lianmin Zheng | 6a7973add8 | Update release-docs.yml (#2937) | 2025-01-17 00:36:40 -08:00 |
| saienduri | a883f0790d | Update release-docker-amd.yml to run on amd docker runner. (#2927) | 2025-01-16 12:42:29 -08:00 |
| Ke Bao | 58f3f2b840 | Add CI for sgl-kernel (#2924) | 2025-01-17 01:26:51 +08:00 |
| Yineng Zhang | 58f42b1dd8 | minor: update pr test (#2908) | 2025-01-16 05:51:49 +08:00 |
| Yineng Zhang | 80002562a8 | docs: update README (#2878) | 2025-01-14 12:48:17 +08:00 |
| Yineng Zhang | d855653bd4 | minor: fix release docs (#2868) | 2025-01-13 21:18:39 +08:00 |
| Lianmin Zheng | 67008f4b32 | Use only one GPU for MLA CI tests (#2858) | 2025-01-13 03:55:33 -08:00 |
| Yineng Zhang | 4536d72446 | minor: use ubuntu-latest instead of self-hosted runner for amd build (#2861) | 2025-01-13 18:58:56 +08:00 |
| Yineng Zhang | 20a9f5dfe0 | fix: not delete CNAME (#2860) | 2025-01-13 18:36:40 +08:00 |
| Yineng Zhang | a879c2fb4c | fix sgl-kernel build (#2850) | 2025-01-13 12:27:17 +08:00 |
| Lianmin Zheng | bdc1acf6cd | Misc fix for min_p_sampling, --cuda-graph-bs (#2761) | 2025-01-07 02:52:53 -08:00 |
| Lianmin Zheng | b0524c3789 | Eagle speculative decoding part 2: Fix cuda graph + DP attention hanging (#2684); Co-authored-by: yukavio <kavioyu@gmail.com> | 2024-12-31 02:25:05 -08:00 |
| Yineng Zhang | d49b13c6f8 | feat: use CUDA 12.4 by default (for FA3) (#2682) | 2024-12-31 15:52:09 +08:00 |
| Lianmin Zheng | 8c3b420eec | [Docs] clean up structured outputs docs (#2654) | 2024-12-29 23:57:16 -08:00 |
| Lianmin Zheng | b08c308ebc | Update the timeout in nightly-test.yml (#2649) | 2024-12-29 14:51:07 -08:00 |
| Lianmin Zheng | 855d0ba381 | [CI] Fix nightly test and raise better error message (#2626); Co-authored-by: Sangbin <rkooo567@gmail.com> | 2024-12-27 22:16:39 -08:00 |
| Lianmin Zheng | dc3bee4815 | Fix test and benchmark scripts (#2598) | 2024-12-26 07:56:26 -08:00 |
| Yineng Zhang | 8f4d04e540 | chore: bump v0.4.0.post2 (#2525) | 2024-12-21 21:16:34 +08:00 |
| Ata Fatahi | ce094a5d79 | Clean up GPU memory after killing sglang processes (#2457); Signed-off-by: Ata Fatahi <immrata@gmail.com> | 2024-12-17 03:42:40 -08:00 |
| Yineng Zhang | 7154b4b1df | minor: update flashinfer nightly (#2490) | 2024-12-16 23:02:49 +08:00 |
| Yineng Zhang | f0ed9c353e | feat: support dev image (#2469) | 2024-12-13 02:23:52 +08:00 |