1948 Commits

Author SHA1 Message Date
Yineng Zhang
897e2e253a add Nebius for Adoption and Sponsorship (#3274) 2025-02-04 04:41:26 +08:00
kushanam
d54cee1441 adding Triton configs for DeepSeekV3 on Blackwell (#3272) 2025-02-04 04:12:09 +08:00
Yineng Zhang
00fa7d0417 add copyright for sgl-kernel (#3270) 2025-02-03 21:34:44 +08:00
Yineng Zhang
013021b6a1 refactor EAGLE 2 (#3269)
Co-authored-by: Ying Sheng <sqy1415@gmail.com>
Co-authored-by: merrymercy <lianminzheng@gmail.com>
Co-authored-by: Ying1123 <sqy1415@gmail.com>
2025-02-03 20:52:30 +08:00
Xiaoyu Zhang
3c8ac78dc1 optimize test_fused_moe style (#3268) 2025-02-03 18:56:18 +08:00
Liangjun Song
455bfe8dd3 Add a Doc about guide on nvidia jetson #3182 (#3205)
Co-authored-by: Shi Shuai <126407087+shuaills@users.noreply.github.com>
Co-authored-by: zhaochenyang20 <zhaochen20@outlook.com>
2025-02-02 20:29:10 -08:00
zifeitong
28b0a62bb3 Bug: Fix min_p sampling crash when using flashinfer backend (#3207)
Co-authored-by: zhaochenyang20 <zhaochen20@outlook.com>
2025-02-02 15:36:07 -08:00
HAI
566d61d90f ROCm: bump 6.3.0 (#3259) 2025-02-03 04:13:40 +08:00
Chayenne
55f5fc68ac Docs: Update accuracy evaluation (#3261) 2025-02-02 11:14:59 -08:00
simveit
c27c378a19 docs/accuracy evaluation (#3114)
Co-authored-by: Shi Shuai <126407087+shuaills@users.noreply.github.com>
Co-authored-by: zhaochenyang20 <zhaochen20@outlook.com>
2025-02-02 11:01:39 -08:00
Wen-Heng (Jack) Chung
d9eb9358cc Tune paged attention parameters for AMD GPU. (#3255) 2025-02-01 17:29:45 -08:00
Yineng Zhang
959dca4fc7 use srt VocabParallelEmbedding (#3252) 2025-02-01 22:23:09 +08:00
Yineng Zhang
f2b3a3188e Update README 2025-02-01 21:19:15 +08:00
Yineng Zhang
ad6740977b add contact us in README (#3251) 2025-02-01 19:47:44 +08:00
Yineng Zhang
8db776f049 support QuickGELU (#3250) 2025-02-01 19:31:47 +08:00
Yineng Zhang
4eb4b401cc update and simplify CustomOp (#3249) 2025-02-01 18:56:44 +08:00
HAI
17dbf976c5 update ENV to ROCm dockers (#3248) 2025-02-01 17:27:43 +08:00
Ke Bao
5317902670 Add test for fp8 torch compile (#3246) 2025-02-01 16:07:54 +08:00
Wenxuan Tan
d7c0b32f4d [Docs] Add more details to profiling docs (#3221) 2025-01-31 15:59:28 -08:00
Yineng Zhang
7b020cca2d add tuning block wise fp8 (#3242)
Co-authored-by: HandH1998 <007aabbcc411@gmail.com>
2025-02-01 03:58:18 +08:00
Yineng Zhang
7876279ea7 update cutlass dependency (#3240) 2025-02-01 03:13:44 +08:00
Yineng Zhang
34e405e01f update sgl-kernel version for sglang (#3238) 2025-02-01 02:14:41 +08:00
Ke Bao
1ebe1d6de5 Optimize MoE topk with torch compile (#3236) 2025-02-01 01:36:50 +08:00
Yineng Zhang
7811bfdaa7 compatible with flashinfer v0.2 (#3235) 2025-02-01 01:32:18 +08:00
Jhin
656f7fc1bc Docs: Quick fix for Speculative_decoding doc (#3228)
Co-authored-by: Chayenne <zhaochenyang@ucla.edu>
Co-authored-by: Chayenne <zhaochen20@outlook.com>
2025-01-31 08:30:40 -08:00
Yineng Zhang
cf0f7eafe6 chore: bump v0.4.2.post1 (#3233) 2025-01-31 20:35:55 +08:00
Yineng Zhang
b49d6d0fee support 12.5 CUDA runtime (#3231) 2025-01-31 20:31:38 +08:00
Ke Bao
c02e313914 Fix block wise fp8 torch compile (#3232) 2025-01-31 19:56:02 +08:00
Byron Hsu
734daedd8f [fix] Clamp logprob with dtype min to prevent -inf (#3224) 2025-01-31 17:04:04 +08:00
Yineng Zhang
3ee62235c6 revert the MoE dependence (#3230) 2025-01-31 16:51:41 +08:00
Ravi Theja
9829e77e3f Docs: Update supported models with Mistral 3 (#3229)
Co-authored-by: Ravi Theja Desetty <ravitheja@Ravis-MacBook-Pro.local>
2025-01-31 00:01:46 -08:00
Ying Sheng
cde4bbd5cc docs: add Novita for adoption and sponsorship (#3227) 2025-01-30 18:28:22 -08:00
Yineng Zhang
9602c2aac7 keep the parts needed for moe_kernels (#3218) 2025-01-31 00:39:47 +08:00
Yineng Zhang
e81d7f11de add tensorrt_llm moe_gemm as 3rdparty (#3217) 2025-01-30 23:49:14 +08:00
Yineng Zhang
222ce6f1da add tensorrt_llm common and cutlass_extensions as 3rdparty (#3216)
Co-authored-by: BBuf <35585791+BBuf@users.noreply.github.com>
2025-01-30 23:04:41 +08:00
Yineng Zhang
468d23cff9 update setup for sgl-kernel (#3214) 2025-01-30 19:47:50 +08:00
Yineng Zhang
c38b5fb4f4 update 3rdparty and rms norm for sgl-kernel (#3213) 2025-01-30 19:32:21 +08:00
Byron Hsu
20453cef62 [test] Lower number of top logprobs to get rid of -inf (#3212) 2025-01-30 18:01:23 +08:00
Mick
9f635ea50d [Fix] Address remaining issues of supporting MiniCPMV (#2977) 2025-01-28 00:22:13 -08:00
Fidel González
76285fdeea Fix typo in README (#3190) 2025-01-27 23:15:24 -08:00
Byron Hsu
988d0a4bfc [kernel] Use sgl_kernel rope (#3169)
Co-authored-by: zhyncs <me@zhyncs.com>
2025-01-28 14:33:11 +08:00
Xiaoyu Zhang
81262c7b72 clean up useless file (#3192) 2025-01-28 14:29:30 +08:00
Byron Hsu
27aeb4b7d8 [test] deduplicate test_session_control (#3183) 2025-01-28 13:17:06 +08:00
Jhin
7b9b4f4426 Docs fix about EAGLE and streaming output (#3166)
Co-authored-by: Chayenne <zhaochenyang@ucla.edu>
Co-authored-by: Chayenne <zhaochen20@outlook.com>
Co-authored-by: Jhin <jhinpan@umich.edu>
2025-01-27 18:10:45 -08:00
Zhiqiang Xie
08104b56de Sanity check to prevent performance regression (#3171)
Co-authored-by: Lianmin Zheng <lianminzheng@gmail.com>
2025-01-27 12:28:17 -08:00
Yineng Zhang
cf142b6eb8 fix: update Dockerfile for cu118 (#3181) 2025-01-27 23:46:44 +08:00
Yineng Zhang
4ab43cfb3e chore: bump v0.4.2 (#3180) 2025-01-27 21:42:05 +08:00
Yineng Zhang
2f79f58873 feat: use sgl-kernel 0.0.3 in sglang (#3179) 2025-01-27 21:39:52 +08:00
Yineng Zhang
8a96f74988 chore: bump 0.0.3 for sgl-kernel (#3178)
Co-authored-by: ispobock <ispobaoke@hotmail.com>
Co-authored-by: BBuf <35585791+BBuf@users.noreply.github.com>
Co-authored-by: HandH1998 <007aabbcc411@gmail.com>
Co-authored-by: yizhang2077 <1109276519@qq.com>
Co-authored-by: ByronHsu <byronhsu1230@gmail.com>
2025-01-27 20:29:28 +08:00
Yineng Zhang
827aa8730b cleanup sgl-kernel kernels (#3175) 2025-01-27 19:11:01 +08:00