sglang

EngineX-Hygon/sglang

Commit Graph

0.5.3rc0

v0.5.2

v0.5.2rc1

v0.5.3_dev

v0.5.4

v0.5.4_dev

v0.5.4_dev_liucong

v0.5.4_dev_maxiao

2f47d710ae refine some typo (#3473) Xiaoyu Zhang 2025-02-10 23:35:44 +08:00
4fe92bfca5 fix mla test (#3469) Yineng Zhang 2025-02-10 21:12:00 +08:00
d23cb9a01e [Eagle] reduce one draft forward (#3468) Ying Sheng 2025-02-10 04:21:49 -08:00
2d61132374 Support Eagle2 for Triton backend (#3466) Ke Bao 2025-02-10 20:00:42 +08:00
cddb1cdf8f chore: bump v0.4.2.post4 (#3459) Yineng Zhang 2025-02-10 14:12:16 +08:00
fa1b40e00d use nvcr.io/nvidia/tritonserver:24.04-py3-min as base image (#3457) Yineng Zhang 2025-02-10 13:52:33 +08:00
c45cab1c00 [Fix] Fix accuracy bug and refactor codes for lora (#3413) Baizhou Zhang 2025-02-09 21:29:00 -08:00
27c4c9cf52 remove _grouped_size_compiled_for_decode_kernels (#3453) Yineng Zhang 2025-02-10 13:01:21 +08:00
52a492a16e Update contribution_guide.md (#3452) Ying Sheng 2025-02-09 20:53:47 -08:00
36f6fc5093 feat: enable ragged fa3 by default on hopper 12.4+ (#3442) Yineng Zhang 2025-02-10 07:43:01 +08:00
d87272750b fix ci (#3441) Yineng Zhang 2025-02-10 04:22:28 +08:00
6239d0b2e7 chore: bump sgl-kernel v0.0.3.post3 (#3440) Yineng Zhang 2025-02-10 04:00:52 +08:00
4cfd3add6d support version in sgl-kernel (#3439) Yineng Zhang 2025-02-10 03:49:52 +08:00
20cf910d8f [docs] Update quantization documentation (#3437) Shi Shuai 2025-02-09 18:39:49 +00:00
0af1d239cb [Docs] Add quantization docs (#3410) Wenxuan Tan 2025-02-09 12:16:21 -06:00
85986bb978 compatible with new outlines (#3435) Yineng Zhang 2025-02-10 01:51:30 +08:00
64c8713573 remove activation dependency in fused_moe (#3433) Yineng Zhang 2025-02-10 01:18:57 +08:00
1646149a83 fix draft cuda graph capture failure (#3431) Yineng Zhang 2025-02-09 23:16:20 +08:00
bc72e5bd32 add cuda graph capture failure possible solution (#3430) Yineng Zhang 2025-02-09 22:57:11 +08:00
014cab4dd2 update forward_return_lse (#3425) Yineng Zhang 2025-02-09 20:18:44 +08:00
4d2dbeaca7 remove cutex dependency (#3422) Yineng Zhang 2025-02-09 18:33:20 +08:00
29daf498cd fix cu118 link issue (#3421) Yineng Zhang 2025-02-09 18:16:44 +08:00
6702592d0e [docs] Add multi-node inference example for SLURM in documentation (#3408) Shi Shuai 2025-02-09 05:45:14 +00:00
60abdb3e7c minor: cleanup test_eagle_infer (#3415) Yineng Zhang 2025-02-09 09:34:30 +08:00
7b4e61fff3 [Fix] Fix eagle with disable cuda graph (#3411) Ying Sheng 2025-02-08 16:40:00 -08:00
6222e1c228 add disable cuda graph unit test for eagle 2 (#3412) Yineng Zhang 2025-02-09 08:02:56 +08:00
fad315cb8e fix EAGLE 2 non greedy case (#3407) Yineng Zhang 2025-02-09 07:28:34 +08:00
f90db8bc07 fix typo Yineng Zhang 2025-02-08 22:16:42 +08:00
d8ad597048 Add deepseek-v3 a100 serving example (#3404) Ke Bao 2025-02-08 22:13:52 +08:00
849f58d617 Update fused_moe's benchmark (#3346) GaoYuYang 2025-02-08 21:58:21 +08:00
64480df495 [BUG] fix moe benchmark when bs*seq is small (#3382) yiakwy-xpu-ml-framework-team 2025-02-08 15:39:44 +08:00
4530136e61 Add H20 fp8 w8a8 gemm config (#3386) lukec 2025-02-08 15:36:31 +08:00
0a6f18f068 added amd_configure.md to references (#3275) Zachary Streeter 2025-02-07 10:50:49 -06:00
c1f5f99f60 chore: bump v0.4.2.post3 (#3369) Yineng Zhang 2025-02-08 00:20:03 +08:00
fa82dfccdd fix EagleVerifyInput (#3378) Yineng Zhang 2025-02-07 22:30:43 +08:00
5da3d21c8b update pr-test ci (#3376) Yineng Zhang 2025-02-07 21:08:35 +08:00
f287037673 update sgl-kernel version (#3374) Yineng Zhang 2025-02-07 20:51:06 +08:00
f9905d59a8 support speculative decoding kernel in sgl-kernel (#3373) Yineng Zhang 2025-02-07 20:29:51 +08:00
45c87e083f fix undefined symbol cudaGetDriverEntryPointByVersion (#3372) Yineng Zhang 2025-02-07 19:32:45 +08:00
2b1808cec4 update unit test in AMD CI (#3366) Yineng Zhang 2025-02-07 17:25:16 +08:00
e868d0b60e update waves_per_eu to 1 (#3356) lizamd 2025-02-06 21:08:06 -08:00
591e751e07 Fix: Runtime error for function calling (#3300) Shi Shuai 2025-02-07 04:52:01 +00:00
40022d075a Feature: Fix the binding error in Llama (#3355) Chayenne 2025-02-06 20:19:24 -08:00
823148e7f0 Docs: Add deepseek usage and add multi-node, credit to lycanlancelot (#3314) Liangjun Song 2025-02-07 14:45:00 +11:00
76ca91dff2 Docs/CI: Enable Fake Finish for Docs Only PR (#3350) Chayenne 2025-02-06 19:33:31 -08:00
cdae77b03d optimize moe_align_kernel cuda (#3347) Xiaoyu Zhang 2025-02-07 00:53:46 +08:00
adeee15204 fix sgl-kernel build failure on AMD (#3352) Yineng Zhang 2025-02-07 00:35:59 +08:00
6792411e7f [Doc] Add optimization option guide for deepseek v3 (#3349) Ke Bao 2025-02-06 23:28:09 +08:00
7348d9627e add AMD guide for DeepSeek-R1 (#3338) Yineng Zhang 2025-02-06 16:54:40 +08:00
25ed22b685 update pull request template (#3337) Yineng Zhang 2025-02-06 16:48:02 +08:00
200d3b1608 Add sgl-kernel to MI300 CI paths tested. (#3335) saienduri 2025-02-06 00:45:38 -08:00
ad3499858e clean moe align block kernel code and add acc test (#3332) Xiaoyu Zhang 2025-02-06 16:42:36 +08:00
32de54ed1a [ROCm] Fix fp8 unrolledx4 matmul kernel. (#3325) Wen-Heng (Jack) Chung 2025-02-05 20:44:15 -06:00
2d9c319594 Docker switch (#3327) saienduri 2025-02-05 18:06:50 -08:00
07e58a2dcb update README (#3324) Yineng Zhang 2025-02-06 07:13:05 +08:00
04d8cd2088 Initial Enablement of CI on MI300 (#3168) saienduri 2025-02-05 10:45:12 -08:00
a322051e31 Support custom mask for Triton attention (#3317) Ke Bao 2025-02-06 01:16:02 +08:00
de5533341e Update Triton extend backend interface (#3309) Ke Bao 2025-02-05 18:12:22 +08:00
7aad8d1854 chore: bump v0.4.2.post2 (#3313) Yineng Zhang 2025-02-05 17:35:02 +08:00
76fa2d152c Fix lora flashinfer import bug on ROCM (#3312) Baizhou Zhang 2025-02-05 00:36:49 -08:00
7ab84948d8 [ROCm] Logic to decide whether to used manually unrolled kernel. (#3306) Wen-Heng (Jack) Chung 2025-02-04 21:12:20 -06:00
4885b90802 Use forward_cuda to execute custom op for hip platform (#3305) kk 2025-02-05 10:58:17 +08:00
c2723a42a5 [ROCm] Manually unroll _w8a8_block_fp8_matmul kernel on AMD GPU. (#3299) Wen-Heng (Jack) Chung 2025-02-04 17:15:40 -06:00
c7256ca836 [ROCm] Add tuning configs for AMD Radeon Graphics. (#3294) Wen-Heng (Jack) Chung 2025-02-04 12:34:57 -06:00
6186a8f889 update flashinfer install index url (#3293) Yineng Zhang 2025-02-05 00:44:35 +08:00
a07364ccc5 Update Triton decode backend interface (#3292) Ke Bao 2025-02-04 23:26:04 +08:00
2c1a695ff1 ROCm: sgl-kernel enablement starting with sgl_moe_align_block (#3287) HAI 2025-02-04 05:44:44 -08:00
d39899e85c upgrade flashinfer v0.2.0.post2 (#3288) Yineng Zhang 2025-02-04 21:41:40 +08:00
70817a7eae [Feature] Define backends and add Triton backend for Lora (#3161) Baizhou Zhang 2025-02-03 22:09:13 -08:00
7b5a374114 Update server args doc (#3273) simveit 2025-02-04 00:39:41 +01:00
4b6f62e2bc add Atlas Cloud for Adoption and Sponsorship (#3276) Yineng Zhang 2025-02-04 05:31:30 +08:00
897e2e253a add Nebius for Adoption and Sponsorship (#3274) Yineng Zhang 2025-02-04 04:41:26 +08:00
d54cee1441 adding Triton configs for DeepSeekV3 on Blackwell (#3272) kushanam 2025-02-03 12:12:09 -08:00
00fa7d0417 add copyright for sgl-kernel (#3270) Yineng Zhang 2025-02-03 21:34:44 +08:00
013021b6a1 refactor EAGLE 2 (#3269) Yineng Zhang 2025-02-03 20:52:30 +08:00
3c8ac78dc1 optimize test_fused_moe style (#3268) Xiaoyu Zhang 2025-02-03 18:56:18 +08:00
455bfe8dd3 Add a Doc about guide on nvidia jetson #3182 (#3205) Liangjun Song 2025-02-03 15:29:10 +11:00
28b0a62bb3 Bug: Fix min_p sampling crash when using flashinfer backend (#3207) zifeitong 2025-02-02 15:36:07 -08:00
566d61d90f ROCm: bump 6.3.0 (#3259) HAI 2025-02-02 12:13:40 -08:00
55f5fc68ac Docs: Update accuracy evaluation (#3261) Chayenne 2025-02-02 11:14:59 -08:00
c27c378a19 docs/accuracy evaluation (#3114) simveit 2025-02-02 20:01:39 +01:00
d9eb9358cc Tune paged attention parameters for AMD GPU. (#3255) Wen-Heng (Jack) Chung 2025-02-01 19:29:45 -06:00
959dca4fc7 use srt VocabParallelEmbedding (#3252) Yineng Zhang 2025-02-01 22:23:09 +08:00
f2b3a3188e Update README Yineng Zhang 2025-02-01 21:19:15 +08:00
ad6740977b add contact us in README (#3251) Yineng Zhang 2025-02-01 19:47:44 +08:00
8db776f049 support QuickGELU (#3250) Yineng Zhang 2025-02-01 19:31:47 +08:00
4eb4b401cc update and simplify CustomOp (#3249) Yineng Zhang 2025-02-01 18:56:44 +08:00
17dbf976c5 update ENV to ROCm dockers (#3248) HAI 2025-02-01 01:27:43 -08:00
5317902670 Add test for fp8 torch compile (#3246) Ke Bao 2025-02-01 16:07:54 +08:00
d7c0b32f4d [Docs] Add more details to profiling docs (#3221) Wenxuan Tan 2025-01-31 17:59:28 -06:00
7b020cca2d add tuning block wise fp8 (#3242) Yineng Zhang 2025-02-01 03:58:18 +08:00
7876279ea7 update cutlass dependency (#3240) Yineng Zhang 2025-02-01 03:13:44 +08:00
34e405e01f update sgl-kernel version for sglang (#3238) Yineng Zhang 2025-02-01 02:14:41 +08:00
1ebe1d6de5 Optimize MoE topk with torch compile (#3236) Ke Bao 2025-02-01 01:36:50 +08:00
7811bfdaa7 compatible with flashinfer v0.2 (#3235) Yineng Zhang 2025-02-01 01:32:18 +08:00
656f7fc1bc Docs: Quick fix for Speculative_decoding doc (#3228) Jhin 2025-01-31 10:30:40 -06:00
cf0f7eafe6 chore: bump v0.4.2.post1 (#3233) Yineng Zhang 2025-01-31 20:35:55 +08:00
b49d6d0fee support 12.5 CUDA runtime (#3231) Yineng Zhang 2025-01-31 20:31:38 +08:00
c02e313914 Fix block wise fp8 torch compile (#3232) Ke Bao 2025-01-31 19:56:02 +08:00
734daedd8f [fix] Clamp logprob with dtype min to prevent -inf (#3224) Byron Hsu 2025-01-31 01:04:04 -08:00
