Commit Graph

  • 3ee62235c6 revert the MoE dependence (#3230) Yineng Zhang 2025-01-31 16:51:41 +08:00
  • 9829e77e3f Docs: Update supported models with Mistral 3 (#3229) Ravi Theja 2025-01-31 13:31:46 +05:30
  • cde4bbd5cc docs: add Novita for adoption and sponsorship (#3227) Ying Sheng 2025-01-30 18:28:22 -08:00
  • 9602c2aac7 keep the parts needed for moe_kernels (#3218) Yineng Zhang 2025-01-31 00:39:47 +08:00
  • e81d7f11de add tensorrt_llm moe_gemm as 3rdparty (#3217) Yineng Zhang 2025-01-30 23:49:14 +08:00
  • 222ce6f1da add tensorrt_llm common and cutlass_extensions as 3rdparty (#3216) Yineng Zhang 2025-01-30 23:04:41 +08:00
  • 468d23cff9 update setup for sgl-kernel (#3214) Yineng Zhang 2025-01-30 19:47:50 +08:00
  • c38b5fb4f4 update 3rdparty and rms norm for sgl-kernel (#3213) Yineng Zhang 2025-01-30 19:32:21 +08:00
  • 20453cef62 [test] Lower number of top logprobs to get rid of -inf (#3212) Byron Hsu 2025-01-30 02:01:23 -08:00
  • 9f635ea50d [Fix] Address remaining issues of supporting MiniCPMV (#2977) Mick 2025-01-28 16:22:13 +08:00
  • 76285fdeea Fix typo in README (#3190) Fidel González 2025-01-28 02:15:24 -05:00
  • 988d0a4bfc [kernel] Use sgl_kernel rope (#3169) Byron Hsu 2025-01-27 22:33:11 -08:00
  • 81262c7b72 clean up useless file (#3192) Xiaoyu Zhang 2025-01-28 14:29:30 +08:00
  • 27aeb4b7d8 [test] deduplicate test_session_control (#3183) Byron Hsu 2025-01-27 21:17:06 -08:00
  • 7b9b4f4426 Docs fix about EAGLE and streaming output (#3166) Jhin 2025-01-27 20:10:45 -06:00
  • 08104b56de Sanity check to prevent performance regression (#3171) Zhiqiang Xie 2025-01-27 12:28:17 -08:00
  • cf142b6eb8 fix: update Dockerfile for cu118 (#3181) Yineng Zhang 2025-01-27 23:46:44 +08:00
  • 4ab43cfb3e chore: bump v0.4.2 (#3180) Yineng Zhang 2025-01-27 21:42:05 +08:00
  • 2f79f58873 feat: use sgl-kernel 0.0.3 in sglang (#3179) Yineng Zhang 2025-01-27 21:39:52 +08:00
  • 8a96f74988 chore: bump 0.0.3 for sgl-kernel (#3178) Yineng Zhang 2025-01-27 20:29:28 +08:00
  • 827aa8730b cleanup sgl-kernel kernels (#3175) Yineng Zhang 2025-01-27 19:11:01 +08:00
  • f8ca66fb49 Update thresholds in test_nightly_gsm8k_eval.py (#3176) Lianmin Zheng 2025-01-27 03:02:09 -08:00
  • 53cef81587 Improve weight loading and code style (#3174) Lianmin Zheng 2025-01-27 03:00:41 -08:00
  • 351a72d40b add dsv3 mi300 triton config for block scale (#3146) yigex 2025-01-27 17:25:53 +08:00
  • 514f37c32b [kernel] Fix position ids in rope (#3173) Byron Hsu 2025-01-27 01:09:51 -08:00
  • 52c03f16b9 Add activation parameters to fused_moe (#3170) Lianmin Zheng 2025-01-27 00:23:37 -08:00
  • 741fccd7bf Bump sgl kernel to 0.0.2.post19 (#3167) Byron Hsu 2025-01-26 23:36:07 -08:00
  • 1e3e521544 add unit test for block wise fp8 (#3156) yizhang2077 2025-01-27 15:32:04 +08:00
  • fb11a43981 [kernel] Integrate flashinfer's rope with higher precision and better perf (#3134) Byron Hsu 2025-01-26 23:28:00 -08:00
  • af02f99b7c Add more logprob tests (#3162) Lianmin Zheng 2025-01-26 22:24:55 -08:00
  • 9472e69963 Doc: Add Docs about EAGLE speculative decoding (#3144) Jhin 2025-01-26 19:49:13 -06:00
  • 1acc1f561a [Docs]: Add function calling in index.rst (#3155) Chayenne 2025-01-26 11:11:27 -08:00
  • b045841bae Feature/function calling update (#2700) YAMY 2025-01-26 09:57:51 -08:00
  • f265d15b96 use self-hosted to build sgl-kernel (#3154) Yineng Zhang 2025-01-26 23:02:57 +08:00
  • 02431b9ad2 fix link in README (#3153) Yineng Zhang 2025-01-26 21:30:00 +08:00
  • 1dda8c5e4c Return more infos for computing average acceptance length (#3152) Lianmin Zheng 2025-01-26 04:51:54 -08:00
  • 7e0976133c udpate sgl-kernel version for srt (#3150) Yineng Zhang 2025-01-26 20:22:34 +08:00
  • f4a92f4b56 Temporarily skip the openai frontend tests (#3151) Lianmin Zheng 2025-01-26 04:17:35 -08:00
  • 318260c0fa chore: bump 0.0.2.post18 for sgl-kernel (#3149) Yineng Zhang 2025-01-26 19:00:34 +08:00
  • 4a61253123 Do not load OPENAI_KEY from secrets (#3147) Lianmin Zheng 2025-01-26 01:54:03 -08:00
  • d1a0863251 Add a test case for cached_tokens (#3145) Lianmin Zheng 2025-01-26 01:39:28 -08:00
  • f8b28e461a Add CPU affinity setting to latency benchmark (#3085) Hubert Lu 2025-01-25 23:52:05 -08:00
  • 82392da830 support w8a8 fp8 kernel with CUTLASS (#3047) HandH1998 2025-01-26 15:46:51 +08:00
  • 95f789adb0 minor: cleanup sgl-kernel (#3143) Yineng Zhang 2025-01-26 14:29:58 +08:00
  • 4f118a39d7 Fix repetition penalty (#3139) Lianmin Zheng 2025-01-25 21:48:58 -08:00
  • 66283dbc0c [Fix] Not skip NVML Check on AMD Platform (#3135) yigex 2025-01-26 13:33:51 +08:00
  • 822bae8c00 feat: cross python wheel for sgl-kernel (#3138) Yineng Zhang 2025-01-26 13:21:34 +08:00
  • 8e48ca8cc1 enable kv_scale for Gemma2 (#3113) Hui Liu 2025-01-25 18:29:14 -08:00
  • 27acf63bbd Use torch.compile for scaling penalty (#3133) Lianmin Zheng 2025-01-25 18:27:33 -08:00
  • da6f8081f6 Fix CI tests (#3132) Lianmin Zheng 2025-01-25 17:43:39 -08:00
  • 9286740eff feat: refactor sgl-kernel and use TORCH_LIBRARY instead of PYBIND11_MODULE for custom ops (#3130) yinfan98 2025-01-26 02:55:08 +08:00
  • 896c07441e update installation doc for sgl-kernel (#3129) Yineng Zhang 2025-01-26 00:00:13 +08:00
  • c23d5706f4 Update whl index path (#3128) Ke Bao 2025-01-25 23:57:09 +08:00
  • 67ad4338e1 Update tag name for whl release (#3127) Ke Bao 2025-01-25 23:14:35 +08:00
  • 3cab5f71ea speedup pr test for sgl-kernel (#3126) Yineng Zhang 2025-01-25 21:37:48 +08:00
  • 14e754a868 chore: bump v0.0.2.post17 for sgl-kernel (#3125) Yineng Zhang 2025-01-25 20:43:02 +08:00
  • 98522149ff mirror fix for custom allreduce (#3124) yizhang2077 2025-01-25 18:26:41 +08:00
  • 5d9d15e70f support fp32 in sampling_scaling_penalties kernel (#3121) Xiaoyu Zhang 2025-01-25 16:52:17 +08:00
  • 665e5e85f6 Add step to update sgl-kernel whl index (#3110) Ke Bao 2025-01-25 02:03:01 +08:00
  • a22f60a313 Add workflow for sgl-kernel cu118 release (#3109) Ke Bao 2025-01-24 22:30:30 +08:00
  • 04f0b4cbef minor: update sgl-kernel setup (#3107) Yineng Zhang 2025-01-24 20:10:35 +08:00
  • 4505a43614 [Docs] minor update for phi-3 and phi-4 (#3096) Adarsh Shirawalmath 2025-01-24 17:30:20 +05:30
  • 685a5738a7 Allow local cutlass directory to be used in sgl-kernel build (#3037) Trevor Morris 2025-01-24 03:59:47 -08:00
  • 153b414e83 minor: sync flashinfer and add turbomind as 3rdparty (#3105) Yineng Zhang 2025-01-24 19:22:39 +08:00
  • 6619f48e18 Fix cu118 group gemm compile issue (#3097) Ke Bao 2025-01-24 15:19:09 +08:00
  • 3ed0a547b2 [router] Fix twine uploading (#3095) Byron Hsu 2025-01-23 21:01:01 -08:00
  • 8d8ef8497e bump router to 0.1.4 (#3094) Byron Hsu 2025-01-23 20:32:43 -08:00
  • 9a0cc2e90e [router] Forward all request headers from router to workers (#3070) Byron Hsu 2025-01-23 20:30:31 -08:00
  • 7bad7e75bf Add shapes for int8 gemm benchmark (#3093) Ke Bao 2025-01-24 12:27:30 +08:00
  • 1c4e0d2445 Docs: Update doc for server arguments (#2742) simveit 2025-01-23 20:32:05 +01:00
  • 54bac8af0b chore: bump sgl-kernel 0.0.2.post16 (#3087) Yineng Zhang 2025-01-24 01:57:48 +08:00
  • 5de4051bcf feat: integrate sampling kernels into sgl-kernel (#3086) Yineng Zhang 2025-01-24 01:54:47 +08:00
  • e0cd65c2b6 [hotfix] fix test_sampling_scaling_penalties.py ci test (#3084) Xiaoyu Zhang 2025-01-24 00:33:59 +08:00
  • f1b6861828 use flashinfer vec_dtypes in sgl_kernel (#3083) Xiaoyu Zhang 2025-01-23 22:19:04 +08:00
  • 0da0989ad4 sync flashinfer and update sgl-kernel tests (#3081) Yineng Zhang 2025-01-23 21:13:55 +08:00
  • 07a22cbba3 use env variable to control the build conf on the CPU build node (#3080) Yineng Zhang 2025-01-23 20:46:49 +08:00
  • 3d0bfa3e17 update version setup for sgl-kernel (#3079) Yineng Zhang 2025-01-23 19:45:25 +08:00
  • 1f6cf0d4b9 fix build error for sgl-kernel (#3078) Yineng Zhang 2025-01-23 19:16:35 +08:00
  • 553f5a3ffe Remove torch dependency in sgl-kernel (#3074) Lianmin Zheng 2025-01-23 01:23:37 -08:00
  • ac2dc35d0e support lightning_attention_decode in sgl-kernel for MiniMax-Text-01 (#3030) Xiaoyu Zhang 2025-01-23 15:29:20 +08:00
  • 3e032c07cc use v0.6.4.post1 for sgl-kernel ci (#3071) Yineng Zhang 2025-01-23 14:19:38 +08:00
  • 44e12ce463 docs: update developer guide for sgl-kernel (#3069) Yineng Zhang 2025-01-23 14:08:25 +08:00
  • a547aad61f docs: add developer guide for sgl-kernel (#3068) Yineng Zhang 2025-01-23 13:47:53 +08:00
  • ea535dc574 Revert "disable custom allreduce on HIP" (#3067) Lianmin Zheng 2025-01-22 21:33:35 -08:00
  • 862bcff833 Support loading of larger models with on-the-fly quantization (#3061) Ke Wen 2025-01-22 21:33:17 -08:00
  • 8b84e69f25 Fix tp token sync for dp attention (#3062) Lianmin Zheng 2025-01-22 18:51:40 -08:00
  • 5de50653cd [router] make error actionable (#3063) Byron Hsu 2025-01-22 17:56:21 -08:00
  • c0bf9bf15c [devcontainer] add non-root user (#2989) Byron Hsu 2025-01-22 17:47:54 -08:00
  • 022614d26e Add some flags to allow sync token ids across TP ranks (#3060) Lianmin Zheng 2025-01-22 15:05:51 -08:00
  • b8ab989ff4 Fix the FP8 E4M3 parsing offline scales failure bug (#3045) lukec 2025-01-23 06:19:33 +08:00
  • b3393e941f [Doc] Update doc of profiling with PyTorch Profiler (#3038) Baizhou Zhang 2025-01-22 14:17:26 -08:00
  • ddc2001fb0 disable custom allreduce on HIP (#3058) Hui Liu 2025-01-22 13:57:22 -08:00
  • 806a3002c1 add notice about flashinfer in sgl-kernel (#3057) Yineng Zhang 2025-01-23 02:47:36 +08:00
  • 0d2148efaa fix rotary_embedding rope_scaling for phi (#3055) nstream-ai-devx 2025-01-22 23:45:32 +05:30
  • bf669606eb feat: integrate bmm_fp8 kernel into sgl-kernel (#3056) Yineng Zhang 2025-01-23 00:39:38 +08:00
  • b2bd8f444c minor: update header and use pytest (#3054) Yineng Zhang 2025-01-22 23:45:18 +08:00
  • 9d9b482a39 feat: integrate activation kernels into sgl-kernel (#3053) Yineng Zhang 2025-01-22 23:25:45 +08:00
  • 7353fb9b97 feat: integrate norm kernels into sgl-kernel (#3052) Yineng Zhang 2025-01-22 21:32:48 +08:00
  • bcda0c9ee6 sync the upstream updates of flashinfer (#3051) Yineng Zhang 2025-01-22 20:33:13 +08:00
  • 9f8f2c7f74 update norm cu (#3048) Yineng Zhang 2025-01-22 18:58:44 +08:00