Yineng Zhang | 4fe92bfca5 | fix mla test (#3469) | 2025-02-10 21:12:00 +08:00
Ying Sheng | d23cb9a01e | [Eagle] reduce one draft forward (#3468) | 2025-02-10 20:21:49 +08:00
Ke Bao | 2d61132374 | Support Eagle2 for Triton backend (#3466) | 2025-02-10 20:00:42 +08:00
Yineng Zhang | cddb1cdf8f | chore: bump v0.4.2.post4 (#3459) | 2025-02-10 14:12:16 +08:00
Yineng Zhang | fa1b40e00d | use nvcr.io/nvidia/tritonserver:24.04-py3-min as base image (#3457) | 2025-02-10 13:52:33 +08:00
Baizhou Zhang | c45cab1c00 | [Fix] Fix accuracy bug and refactor codes for lora (#3413) | 2025-02-10 13:29:00 +08:00
Yineng Zhang | 27c4c9cf52 | remove _grouped_size_compiled_for_decode_kernels (#3453) | 2025-02-10 13:01:21 +08:00
Ying Sheng | 52a492a16e | Update contribution_guide.md (#3452) | 2025-02-10 12:53:47 +08:00
Yineng Zhang | 36f6fc5093 | feat: enable ragged fa3 by default on hopper 12.4+ (#3442) | 2025-02-10 07:43:01 +08:00
Yineng Zhang | d87272750b | fix ci (#3441) | 2025-02-10 04:22:28 +08:00
Yineng Zhang | 6239d0b2e7 | chore: bump sgl-kernel v0.0.3.post3 (#3440) | 2025-02-10 04:00:52 +08:00
Yineng Zhang | 4cfd3add6d | support version in sgl-kernel (#3439) | 2025-02-10 03:49:52 +08:00
Shi Shuai | 20cf910d8f | [docs] Update quantization documentation (#3437); Co-authored-by: zhaochenyang20 <zhaochenyang20@gmail.com>; Co-authored-by: jamessand <shazhizhou0@gmail.com> | 2025-02-09 10:39:49 -08:00
Wenxuan Tan | 0af1d239cb | [Docs] Add quantization docs (#3410); Co-authored-by: yinfan98 <1106310035@qq.com> | 2025-02-10 02:16:21 +08:00
Yineng Zhang | 85986bb978 | compatible with new outlines (#3435) | 2025-02-10 01:51:30 +08:00
Yineng Zhang | 64c8713573 | remove activation dependency in fused_moe (#3433) | 2025-02-10 01:18:57 +08:00
Yineng Zhang | 1646149a83 | fix draft cuda graph capture failure (#3431) | 2025-02-09 23:16:20 +08:00
Yineng Zhang | bc72e5bd32 | add cuda graph capture failure possible solution (#3430) | 2025-02-09 22:57:11 +08:00
Yineng Zhang | 014cab4dd2 | update forward_return_lse (#3425) | 2025-02-09 20:18:44 +08:00
Yineng Zhang | 4d2dbeaca7 | remove cutex dependency (#3422) | 2025-02-09 18:33:20 +08:00
Yineng Zhang | 29daf498cd | fix cu118 link issue (#3421) | 2025-02-09 18:16:44 +08:00
Shi Shuai | 6702592d0e | [docs] Add multi-node inference example for SLURM in documentation (#3408); Co-authored-by: zhaochenyang20 <zhaochen20@outlook.com>; Co-authored-by: aflah02 <aflah20082@iiitd.ac.in> | 2025-02-08 21:45:14 -08:00
Yineng Zhang | 60abdb3e7c | minor: cleanup test_eagle_infer (#3415) | 2025-02-09 09:34:30 +08:00
Ying Sheng | 7b4e61fff3 | [Fix] Fix eagle with disable cuda graph (#3411) | 2025-02-09 08:40:00 +08:00
Yineng Zhang | 6222e1c228 | add disable cuda graph unit test for eagle 2 (#3412) | 2025-02-09 08:02:56 +08:00
Yineng Zhang | fad315cb8e | fix EAGLE 2 non greedy case (#3407); Co-authored-by: Ying Sheng <sqy1415@gmail.com> | 2025-02-09 07:28:34 +08:00
Yineng Zhang | f90db8bc07 | fix typo | 2025-02-08 22:16:42 +08:00
Ke Bao | d8ad597048 | Add deepseek-v3 a100 serving example (#3404) | 2025-02-08 22:13:52 +08:00
GaoYuYang | 849f58d617 | Update fused_moe's benchmark (#3346) | 2025-02-08 21:58:21 +08:00
yiakwy-xpu-ml-framework-team | 64480df495 | [BUG] fix moe benchmark when bs*seq is small (#3382) | 2025-02-08 15:39:44 +08:00
lukec | 4530136e61 | Add H20 fp8 w8a8 gemm config (#3386) | 2025-02-08 15:36:31 +08:00
Zachary Streeter | 0a6f18f068 | added amd_configure.md to references (#3275); Co-authored-by: HAI <hixiao@gmail.com>; Co-authored-by: Yineng Zhang <me@zhyncs.com>; Co-authored-by: zhaochenyang20 <zhaochen20@outlook.com> | 2025-02-07 08:50:49 -08:00
Yineng Zhang | c1f5f99f60 | chore: bump v0.4.2.post3 (#3369) | 2025-02-07 08:20:03 -08:00
Yineng Zhang | fa82dfccdd | fix EagleVerifyInput (#3378) | 2025-02-07 22:30:43 +08:00
Yineng Zhang | 5da3d21c8b | update pr-test ci (#3376) | 2025-02-07 21:08:35 +08:00
Yineng Zhang | f287037673 | update sgl-kernel version (#3374) | 2025-02-07 20:51:06 +08:00
Yineng Zhang | f9905d59a8 | support speculative decoding kernel in sgl-kernel (#3373); Co-authored-by: Ying Sheng <sqy1415@gmail.com> | 2025-02-07 20:29:51 +08:00
Yineng Zhang | 45c87e083f | fix undefined symbol cudaGetDriverEntryPointByVersion (#3372) | 2025-02-07 19:32:45 +08:00
Yineng Zhang | 2b1808cec4 | update unit test in AMD CI (#3366) | 2025-02-07 17:25:16 +08:00
lizamd | e868d0b60e | update waves_per_eu to 1 (#3356) | 2025-02-07 13:08:06 +08:00
Shi Shuai | 591e751e07 | Fix: Runtime error for function calling (#3300) | 2025-02-06 20:52:01 -08:00
Chayenne | 40022d075a | Feature: Fix the binding error in Llama (#3355) | 2025-02-06 20:19:24 -08:00
Liangjun Song | 823148e7f0 | Docs: Add deepseek usage and add multi-node, credit to lycanlancelot (#3314); Co-authored-by: Shi Shuai <126407087+shuaills@users.noreply.github.com>; Co-authored-by: Chayenne <zhaochen20@outlook.com>; Co-authored-by: Yineng Zhang <me@zhyncs.com> | 2025-02-06 19:45:00 -08:00
Chayenne | 76ca91dff2 | Docs/CI: Enable Fake Finish for Docs Only PR (#3350) | 2025-02-06 19:33:31 -08:00
Xiaoyu Zhang | cdae77b03d | optimize moe_align_kernel cuda (#3347) | 2025-02-07 00:53:46 +08:00
Yineng Zhang | adeee15204 | fix sgl-kernel build failure on AMD (#3352) | 2025-02-07 00:35:59 +08:00
Ke Bao | 6792411e7f | [Doc] Add optimization option guide for deepseek v3 (#3349) | 2025-02-06 23:28:09 +08:00
Yineng Zhang | 7348d9627e | add AMD guide for DeepSeek-R1 (#3338) | 2025-02-06 16:54:40 +08:00
Yineng Zhang | 25ed22b685 | update pull request template (#3337) | 2025-02-06 16:48:02 +08:00
saienduri | 200d3b1608 | Add sgl-kernel to MI300 CI paths tested. (#3335); Co-authored-by: HAI <hixiao@gmail.com> | 2025-02-06 00:45:38 -08:00