Yineng Zhang
|
897e2e253a
|
add Nebius for Adoption and Sponsorship (#3274)
|
2025-02-04 04:41:26 +08:00 |
|
kushanam
|
d54cee1441
|
adding Triton configs for DeepSeekV3 on Blackwell (#3272)
|
2025-02-04 04:12:09 +08:00 |
|
Yineng Zhang
|
00fa7d0417
|
add copyright for sgl-kernel (#3270)
|
2025-02-03 21:34:44 +08:00 |
|
Yineng Zhang
|
013021b6a1
|
refactor EAGLE 2 (#3269)
Co-authored-by: Ying Sheng <sqy1415@gmail.com>
Co-authored-by: merrymercy <lianminzheng@gmail.com>
Co-authored-by: Ying1123 <sqy1415@gmail.com>
|
2025-02-03 20:52:30 +08:00 |
|
Xiaoyu Zhang
|
3c8ac78dc1
|
optimize test_fused_moe style (#3268)
|
2025-02-03 18:56:18 +08:00 |
|
Liangjun Song
|
455bfe8dd3
|
Add a Doc about guide on nvidia jetson #3182 (#3205)
Co-authored-by: Shi Shuai <126407087+shuaills@users.noreply.github.com>
Co-authored-by: zhaochenyang20 <zhaochen20@outlook.com>
|
2025-02-02 20:29:10 -08:00 |
|
zifeitong
|
28b0a62bb3
|
Bug: Fix min_p sampling crash when using flashinfer backend (#3207)
Co-authored-by: zhaochenyang20 <zhaochen20@outlook.com>
|
2025-02-02 15:36:07 -08:00 |
|
HAI
|
566d61d90f
|
ROCm: bump 6.3.0 (#3259)
|
2025-02-03 04:13:40 +08:00 |
|
Chayenne
|
55f5fc68ac
|
Docs: Update accuracy evaluation (#3261)
|
2025-02-02 11:14:59 -08:00 |
|
simveit
|
c27c378a19
|
docs/accuracy evaluation (#3114)
Co-authored-by: Shi Shuai <126407087+shuaills@users.noreply.github.com>
Co-authored-by: zhaochenyang20 <zhaochen20@outlook.com>
|
2025-02-02 11:01:39 -08:00 |
|
Wen-Heng (Jack) Chung
|
d9eb9358cc
|
Tune paged attention parameters for AMD GPU. (#3255)
|
2025-02-01 17:29:45 -08:00 |
|
Yineng Zhang
|
959dca4fc7
|
use srt VocabParallelEmbedding (#3252)
|
2025-02-01 22:23:09 +08:00 |
|
Yineng Zhang
|
f2b3a3188e
|
Update README
|
2025-02-01 21:19:15 +08:00 |
|
Yineng Zhang
|
ad6740977b
|
add contact us in README (#3251)
|
2025-02-01 19:47:44 +08:00 |
|
Yineng Zhang
|
8db776f049
|
support QuickGELU (#3250)
|
2025-02-01 19:31:47 +08:00 |
|
Yineng Zhang
|
4eb4b401cc
|
update and simplify CustomOp (#3249)
|
2025-02-01 18:56:44 +08:00 |
|
HAI
|
17dbf976c5
|
update ENV to ROCm dockers (#3248)
|
2025-02-01 17:27:43 +08:00 |
|
Ke Bao
|
5317902670
|
Add test for fp8 torch compile (#3246)
|
2025-02-01 16:07:54 +08:00 |
|
Wenxuan Tan
|
d7c0b32f4d
|
[Docs] Add more details to profiling docs (#3221)
|
2025-01-31 15:59:28 -08:00 |
|
Yineng Zhang
|
7b020cca2d
|
add tuning block wise fp8 (#3242)
Co-authored-by: HandH1998 <007aabbcc411@gmail.com>
|
2025-02-01 03:58:18 +08:00 |
|
Yineng Zhang
|
7876279ea7
|
update cutlass dependency (#3240)
|
2025-02-01 03:13:44 +08:00 |
|
Yineng Zhang
|
34e405e01f
|
update sgl-kernel version for sglang (#3238)
|
2025-02-01 02:14:41 +08:00 |
|
Ke Bao
|
1ebe1d6de5
|
Optimize MoE topk with torch compile (#3236)
|
2025-02-01 01:36:50 +08:00 |
|
Yineng Zhang
|
7811bfdaa7
|
compatible with flashinfer v0.2 (#3235)
|
2025-02-01 01:32:18 +08:00 |
|
Jhin
|
656f7fc1bc
|
Docs: Quick fix for Speculative_decoding doc (#3228)
Co-authored-by: Chayenne <zhaochenyang@ucla.edu>
Co-authored-by: Chayenne <zhaochen20@outlook.com>
|
2025-01-31 08:30:40 -08:00 |
|
Yineng Zhang
|
cf0f7eafe6
|
chore: bump v0.4.2.post1 (#3233)
|
2025-01-31 20:35:55 +08:00 |
|
Yineng Zhang
|
b49d6d0fee
|
support 12.5 CUDA runtime (#3231)
|
2025-01-31 20:31:38 +08:00 |
|
Ke Bao
|
c02e313914
|
Fix block wise fp8 torch compile (#3232)
|
2025-01-31 19:56:02 +08:00 |
|
Byron Hsu
|
734daedd8f
|
[fix] Clamp logprob with dtype min to prevent -inf (#3224)
|
2025-01-31 17:04:04 +08:00 |
|
Yineng Zhang
|
3ee62235c6
|
revert the MoE dependence (#3230)
|
2025-01-31 16:51:41 +08:00 |
|
Ravi Theja
|
9829e77e3f
|
Docs: Update supported models with Mistral 3 (#3229)
Co-authored-by: Ravi Theja Desetty <ravitheja@Ravis-MacBook-Pro.local>
|
2025-01-31 00:01:46 -08:00 |
|
Ying Sheng
|
cde4bbd5cc
|
docs: add Novita for adoption and sponsorship (#3227)
|
2025-01-30 18:28:22 -08:00 |
|
Yineng Zhang
|
9602c2aac7
|
keep the parts needed for moe_kernels (#3218)
|
2025-01-31 00:39:47 +08:00 |
|
Yineng Zhang
|
e81d7f11de
|
add tensorrt_llm moe_gemm as 3rdparty (#3217)
|
2025-01-30 23:49:14 +08:00 |
|
Yineng Zhang
|
222ce6f1da
|
add tensorrt_llm common and cutlass_extensions as 3rdparty (#3216)
Co-authored-by: BBuf <35585791+BBuf@users.noreply.github.com>
|
2025-01-30 23:04:41 +08:00 |
|
Yineng Zhang
|
468d23cff9
|
update setup for sgl-kernel (#3214)
|
2025-01-30 19:47:50 +08:00 |
|
Yineng Zhang
|
c38b5fb4f4
|
update 3rdparty and rms norm for sgl-kernel (#3213)
|
2025-01-30 19:32:21 +08:00 |
|
Byron Hsu
|
20453cef62
|
[test] Lower number of top logprobs to get rid of -inf (#3212)
|
2025-01-30 18:01:23 +08:00 |
|
Mick
|
9f635ea50d
|
[Fix] Address remaining issues of supporting MiniCPMV (#2977)
|
2025-01-28 00:22:13 -08:00 |
|
Fidel González
|
76285fdeea
|
Fix typo in README (#3190)
|
2025-01-27 23:15:24 -08:00 |
|
Byron Hsu
|
988d0a4bfc
|
[kernel] Use sgl_kernel rope (#3169)
Co-authored-by: zhyncs <me@zhyncs.com>
|
2025-01-28 14:33:11 +08:00 |
|
Xiaoyu Zhang
|
81262c7b72
|
clean up useless file (#3192)
|
2025-01-28 14:29:30 +08:00 |
|
Byron Hsu
|
27aeb4b7d8
|
[test] deduplicate test_session_control (#3183)
|
2025-01-28 13:17:06 +08:00 |
|
Jhin
|
7b9b4f4426
|
Docs fix about EAGLE and streaming output (#3166)
Co-authored-by: Chayenne <zhaochenyang@ucla.edu>
Co-authored-by: Chayenne <zhaochen20@outlook.com>
Co-authored-by: Jhin <jhinpan@umich.edu>
|
2025-01-27 18:10:45 -08:00 |
|
Zhiqiang Xie
|
08104b56de
|
Sanity check to prevent performance regression (#3171)
Co-authored-by: Lianmin Zheng <lianminzheng@gmail.com>
|
2025-01-27 12:28:17 -08:00 |
|
Yineng Zhang
|
cf142b6eb8
|
fix: update Dockerfile for cu118 (#3181)
|
2025-01-27 23:46:44 +08:00 |
|
Yineng Zhang
|
4ab43cfb3e
|
chore: bump v0.4.2 (#3180)
|
2025-01-27 21:42:05 +08:00 |
|
Yineng Zhang
|
2f79f58873
|
feat: use sgl-kernel 0.0.3 in sglang (#3179)
|
2025-01-27 21:39:52 +08:00 |
|
Yineng Zhang
|
8a96f74988
|
chore: bump 0.0.3 for sgl-kernel (#3178)
Co-authored-by: ispobock <ispobaoke@hotmail.com>
Co-authored-by: BBuf <35585791+BBuf@users.noreply.github.com>
Co-authored-by: HandH1998 <007aabbcc411@gmail.com>
Co-authored-by: yizhang2077 <1109276519@qq.com>
Co-authored-by: ByronHsu <byronhsu1230@gmail.com>
|
2025-01-27 20:29:28 +08:00 |
|
Yineng Zhang
|
827aa8730b
|
cleanup sgl-kernel kernels (#3175)
|
2025-01-27 19:11:01 +08:00 |
|