Fidel González | 76285fdeea | Fix typo in README (#3190) | 2025-01-27 23:15:24 -08:00 |
Byron Hsu | 988d0a4bfc | [kernel] Use sgl_kernel rope (#3169); Co-authored-by: zhyncs <me@zhyncs.com> | 2025-01-28 14:33:11 +08:00 |
Xiaoyu Zhang | 81262c7b72 | clean up useless file (#3192) | 2025-01-28 14:29:30 +08:00 |
Byron Hsu | 27aeb4b7d8 | [test] deduplicate test_session_control (#3183) | 2025-01-28 13:17:06 +08:00 |
Jhin | 7b9b4f4426 | Docs fix about EAGLE and streaming output (#3166); Co-authored-by: Chayenne <zhaochenyang@ucla.edu>; Co-authored-by: Chayenne <zhaochen20@outlook.com>; Co-authored-by: Jhin <jhinpan@umich.edu> | 2025-01-27 18:10:45 -08:00 |
Zhiqiang Xie | 08104b56de | Sanity check to prevent performance regression (#3171); Co-authored-by: Lianmin Zheng <lianminzheng@gmail.com> | 2025-01-27 12:28:17 -08:00 |
Yineng Zhang | cf142b6eb8 | fix: update Dockerfile for cu118 (#3181) | 2025-01-27 23:46:44 +08:00 |
Yineng Zhang | 4ab43cfb3e | chore: bump v0.4.2 (#3180) | 2025-01-27 21:42:05 +08:00 |
Yineng Zhang | 2f79f58873 | feat: use sgl-kernel 0.0.3 in sglang (#3179) | 2025-01-27 21:39:52 +08:00 |
Yineng Zhang | 8a96f74988 | chore: bump 0.0.3 for sgl-kernel (#3178); Co-authored-by: ispobock <ispobaoke@hotmail.com>; Co-authored-by: BBuf <35585791+BBuf@users.noreply.github.com>; Co-authored-by: HandH1998 <007aabbcc411@gmail.com>; Co-authored-by: yizhang2077 <1109276519@qq.com>; Co-authored-by: ByronHsu <byronhsu1230@gmail.com> | 2025-01-27 20:29:28 +08:00 |
Yineng Zhang | 827aa8730b | cleanup sgl-kernel kernels (#3175) | 2025-01-27 19:11:01 +08:00 |
Lianmin Zheng | f8ca66fb49 | Update thresholds in test_nightly_gsm8k_eval.py (#3176) | 2025-01-27 03:02:09 -08:00 |
Lianmin Zheng | 53cef81587 | Improve weight loading and code style (#3174) | 2025-01-27 03:00:41 -08:00 |
yigex | 351a72d40b | add dsv3 mi300 triton config for block scale (#3146) | 2025-01-27 17:25:53 +08:00 |
Byron Hsu | 514f37c32b | [kernel] Fix position ids in rope (#3173) | 2025-01-27 17:09:51 +08:00 |
Lianmin Zheng | 52c03f16b9 | Add activation parameters to fused_moe (#3170) | 2025-01-27 00:23:37 -08:00 |
Byron Hsu | 741fccd7bf | Bump sgl kernel to 0.0.2.post19 (#3167) | 2025-01-27 15:36:07 +08:00 |
yizhang2077 | 1e3e521544 | add unit test for block wise fp8 (#3156) | 2025-01-27 15:32:04 +08:00 |
Byron Hsu | fb11a43981 | [kernel] Integrate flashinfer's rope with higher precision and better perf (#3134) | 2025-01-27 15:28:00 +08:00 |
Lianmin Zheng | af02f99b7c | Add more logprob tests (#3162) | 2025-01-26 22:24:55 -08:00 |
Jhin | 9472e69963 | Doc: Add Docs about EAGLE speculative decoding (#3144); Co-authored-by: Chayenne <zhaochenyang@ucla.edu>; Co-authored-by: zhaochenyang20 <zhaochen20@outlook.com> | 2025-01-26 17:49:13 -08:00 |
Chayenne | 1acc1f561a | [Docs]: Add function calling in index.rst (#3155) | 2025-01-26 11:11:27 -08:00 |
YAMY | b045841bae | Feature/function calling update (#2700); Co-authored-by: Mingyuan Ma <mamingyuan2001@berkeley.edu>; Co-authored-by: Chayenne <zhaochen20@outlook.com>; Co-authored-by: shuaills <shishuaiuoe@gmail.com> | 2025-01-26 09:57:51 -08:00 |
Yineng Zhang | f265d15b96 | use self-hosted to build sgl-kernel (#3154) | 2025-01-26 23:02:57 +08:00 |
Yineng Zhang | 02431b9ad2 | fix link in README (#3153) | 2025-01-26 21:30:00 +08:00 |
Lianmin Zheng | 1dda8c5e4c | Return more infos for computing average acceptance length (#3152) | 2025-01-26 04:51:54 -08:00 |
Yineng Zhang | 7e0976133c | udpate sgl-kernel version for srt (#3150) | 2025-01-26 20:22:34 +08:00 |
Lianmin Zheng | f4a92f4b56 | Temporarily skip the openai frontend tests (#3151) | 2025-01-26 04:17:35 -08:00 |
Yineng Zhang | 318260c0fa | chore: bump 0.0.2.post18 for sgl-kernel (#3149) | 2025-01-26 19:00:34 +08:00 |
Lianmin Zheng | 4a61253123 | Do not load OPENAI_KEY from secrets (#3147) | 2025-01-26 01:54:03 -08:00 |
Lianmin Zheng | d1a0863251 | Add a test case for cached_tokens (#3145) | 2025-01-26 01:39:28 -08:00 |
Hubert Lu | f8b28e461a | Add CPU affinity setting to latency benchmark (#3085) | 2025-01-25 23:52:05 -08:00 |
HandH1998 | 82392da830 | support w8a8 fp8 kernel with CUTLASS (#3047); Co-authored-by: yych0745 <1398089567@qq.com> | 2025-01-26 15:46:51 +08:00 |
Yineng Zhang | 95f789adb0 | minor: cleanup sgl-kernel (#3143) | 2025-01-26 14:29:58 +08:00 |
Lianmin Zheng | 4f118a39d7 | Fix repetition penalty (#3139) | 2025-01-25 21:48:58 -08:00 |
yigex | 66283dbc0c | [Fix] Not skip NVML Check on AMD Platform (#3135) | 2025-01-25 21:33:51 -08:00 |
Yineng Zhang | 822bae8c00 | feat: cross python wheel for sgl-kernel (#3138) | 2025-01-26 13:21:34 +08:00 |
Hui Liu | 8e48ca8cc1 | enable kv_scale for Gemma2 (#3113) | 2025-01-25 18:29:14 -08:00 |
Lianmin Zheng | 27acf63bbd | Use torch.compile for scaling penalty (#3133) | 2025-01-25 18:27:33 -08:00 |
Lianmin Zheng | da6f8081f6 | Fix CI tests (#3132) | 2025-01-25 17:43:39 -08:00 |
yinfan98 | 9286740eff | feat: refactor sgl-kernel and use TORCH_LIBRARY instead of PYBIND11_MODULE for custom ops (#3130); Co-authored-by: yinfan.1024 <yinfan.1024@bytedance.com>; Co-authored-by: yinfan98 <1106110035@qq.com>; Co-authored-by: Yineng Zhang <me@zhyncs.com> | 2025-01-26 02:55:08 +08:00 |
Yineng Zhang | 896c07441e | update installation doc for sgl-kernel (#3129) | 2025-01-26 00:00:13 +08:00 |
Ke Bao | c23d5706f4 | Update whl index path (#3128) | 2025-01-25 23:57:09 +08:00 |
Ke Bao | 67ad4338e1 | Update tag name for whl release (#3127) | 2025-01-25 23:14:35 +08:00 |
Yineng Zhang | 3cab5f71ea | speedup pr test for sgl-kernel (#3126) | 2025-01-25 21:37:48 +08:00 |
Yineng Zhang | 14e754a868 | chore: bump v0.0.2.post17 for sgl-kernel (#3125) | 2025-01-25 20:43:02 +08:00 |
yizhang2077 | 98522149ff | mirror fix for custom allreduce (#3124) | 2025-01-25 18:26:41 +08:00 |
Xiaoyu Zhang | 5d9d15e70f | support fp32 in sampling_scaling_penalties kernel (#3121) | 2025-01-25 16:52:17 +08:00 |
Ke Bao | 665e5e85f6 | Add step to update sgl-kernel whl index (#3110) | 2025-01-25 02:03:01 +08:00 |
Ke Bao | a22f60a313 | Add workflow for sgl-kernel cu118 release (#3109) | 2025-01-24 22:30:30 +08:00 |