Commit Graph

1861 Commits

Author SHA1 Message Date
Ke Bao
665e5e85f6 Add step to update sgl-kernel whl index (#3110) 2025-01-25 02:03:01 +08:00
Ke Bao
a22f60a313 Add workflow for sgl-kernel cu118 release (#3109) 2025-01-24 22:30:30 +08:00
Yineng Zhang
04f0b4cbef minor: update sgl-kernel setup (#3107) 2025-01-24 20:10:35 +08:00
Adarsh Shirawalmath
4505a43614 [Docs] minor update for phi-3 and phi-4 (#3096) 2025-01-24 04:00:20 -08:00
Trevor Morris
685a5738a7 Allow local cutlass directory to be used in sgl-kernel build (#3037) 2025-01-24 03:59:47 -08:00
Yineng Zhang
153b414e83 minor: sync flashinfer and add turbomind as 3rdparty (#3105) 2025-01-24 19:22:39 +08:00
Ke Bao
6619f48e18 Fix cu118 group gemm compile issue (#3097) 2025-01-24 15:19:09 +08:00
Byron Hsu
3ed0a547b2 [router] Fix twine uploading (#3095) 2025-01-23 21:01:01 -08:00
Byron Hsu
8d8ef8497e bump router to 0.1.4 (#3094) 2025-01-23 20:32:43 -08:00
Byron Hsu
9a0cc2e90e [router] Forward all request headers from router to workers (#3070) 2025-01-23 20:30:31 -08:00
Ke Bao
7bad7e75bf Add shapes for int8 gemm benchmark (#3093) 2025-01-24 12:27:30 +08:00
simveit
1c4e0d2445 Docs: Update doc for server arguments (#2742) 2025-01-23 11:32:05 -08:00
Co-authored-by: Chayenne <zhaochen20@outlook.com>
Co-authored-by: Yineng Zhang <me@zhyncs.com>
Yineng Zhang
54bac8af0b chore: bump sgl-kernel 0.0.2.post16 (#3087) 2025-01-24 01:57:48 +08:00
Yineng Zhang
5de4051bcf feat: integrate sampling kernels into sgl-kernel (#3086) 2025-01-24 01:54:47 +08:00
Co-authored-by: Zihao Ye <expye@outlook.com>
Xiaoyu Zhang
e0cd65c2b6 [hotfix] fix test_sampling_scaling_penalties.py ci test (#3084) 2025-01-24 00:33:59 +08:00
Xiaoyu Zhang
f1b6861828 use flashinfer vec_dtypes in sgl_kernel (#3083) 2025-01-23 22:19:04 +08:00
Yineng Zhang
0da0989ad4 sync flashinfer and update sgl-kernel tests (#3081) 2025-01-23 21:13:55 +08:00
Yineng Zhang
07a22cbba3 use env variable to control the build conf on the CPU build node (#3080) 2025-01-23 20:46:49 +08:00
Yineng Zhang
3d0bfa3e17 update version setup for sgl-kernel (#3079) 2025-01-23 19:45:25 +08:00
Yineng Zhang
1f6cf0d4b9 fix build error for sgl-kernel (#3078) 2025-01-23 19:16:35 +08:00
Lianmin Zheng
553f5a3ffe Remove torch dependency in sgl-kernel (#3074) 2025-01-23 17:23:37 +08:00
Xiaoyu Zhang
ac2dc35d0e support lightning_attention_decode in sgl-kernel for MiniMax-Text-01 (#3030) 2025-01-23 15:29:20 +08:00
Yineng Zhang
3e032c07cc use v0.6.4.post1 for sgl-kernel ci (#3071) 2025-01-23 14:19:38 +08:00
Yineng Zhang
44e12ce463 docs: update developer guide for sgl-kernel (#3069) 2025-01-23 14:08:25 +08:00
Yineng Zhang
a547aad61f docs: add developer guide for sgl-kernel (#3068) 2025-01-23 13:47:53 +08:00
Lianmin Zheng
ea535dc574 Revert "disable custom allreduce on HIP" (#3067) 2025-01-22 21:33:35 -08:00
Ke Wen
862bcff833 Support loading of larger models with on-the-fly quantization (#3061) 2025-01-22 21:33:17 -08:00
Lianmin Zheng
8b84e69f25 Fix tp token sync for dp attention (#3062) 2025-01-22 18:51:40 -08:00
Byron Hsu
5de50653cd [router] make error actionable (#3063) 2025-01-22 17:56:21 -08:00
Byron Hsu
c0bf9bf15c [devcontainer] add non-root user (#2989) 2025-01-22 17:47:54 -08:00
Lianmin Zheng
022614d26e Add some flags to allow sync token ids across TP ranks (#3060) 2025-01-22 15:05:51 -08:00
lukec
b8ab989ff4 Fix the FP8 E4M3 parsing offline scales failure bug (#3045) 2025-01-22 14:19:33 -08:00
Baizhou Zhang
b3393e941f [Doc] Update doc of profiling with PyTorch Profiler (#3038) 2025-01-22 14:17:26 -08:00
Hui Liu
ddc2001fb0 disable custom allreduce on HIP (#3058) 2025-01-22 13:57:22 -08:00
Yineng Zhang
806a3002c1 add notice about flashinfer in sgl-kernel (#3057) 2025-01-23 02:47:36 +08:00
nstream-ai-devx
0d2148efaa fix rotary_embedding rope_scaling for phi (#3055) 2025-01-23 02:15:32 +08:00
Yineng Zhang
bf669606eb feat: integrate bmm_fp8 kernel into sgl-kernel (#3056) 2025-01-23 00:39:38 +08:00
Yineng Zhang
b2bd8f444c minor: update header and use pytest (#3054) 2025-01-22 23:45:18 +08:00
Yineng Zhang
9d9b482a39 feat: integrate activation kernels into sgl-kernel (#3053) 2025-01-22 23:25:45 +08:00
Yineng Zhang
7353fb9b97 feat: integrate norm kernels into sgl-kernel (#3052) 2025-01-22 21:32:48 +08:00
Yineng Zhang
bcda0c9ee6 sync the upstream updates of flashinfer (#3051) 2025-01-22 20:33:13 +08:00
Yineng Zhang
9f8f2c7f74 update norm cu (#3048) 2025-01-22 18:58:44 +08:00
Ke Bao
6fc37bd8ee Fix sgl-kernel compile for sm80 (#3046) 2025-01-22 16:49:08 +08:00
Lianmin Zheng
3d8f1c9bcf Use int64 as indices for set_kv_buffer (#3039) 2025-01-21 19:46:09 -08:00
Yineng Zhang
a42213dbd4 fix pr-test-sgl-kernel (#3036) 2025-01-22 00:56:42 +08:00
Ke Bao
0ac019f171 Support sm90 Int8 gemm (#3035) 2025-01-21 22:21:54 +08:00
Yineng Zhang
5a0d680a14 feat: add flashinfer as 3rdparty and use rmsnorm as example (#3033) 2025-01-21 20:44:49 +08:00
Lianmin Zheng
a4331cd260 Add accuracy and latency tests of eagle into CI (#3027) 2025-01-21 02:55:14 -08:00
Yineng Zhang
ec1c21cdc4 upgrade torch version for sgl-kernel (#3026) 2025-01-21 14:32:08 +08:00
Yineng Zhang
6c856b4f3a minor: update Makefile for sgl-kernel (#3025) 2025-01-21 13:08:15 +08:00