sglang

EngineX-Hygon/sglang

Commit Graph

Branches: 0.5.3rc0, v0.5.2, v0.5.2rc1, v0.5.3_dev, v0.5.4, v0.5.4_dev, v0.5.4_dev_liucong, v0.5.4_dev_maxiao

006ead9dcb [FA][Test] Fix Sparse FA test (#6306) Brayden Zhong 2025-05-26 04:27:48 -04:00
0d503090aa Supported precomputed feature for Kimi VL (#6599) Lifu Huang 2025-05-26 01:24:13 -07:00
501efc3d36 Tiny fix CI (#6611) fzyzcjy 2025-05-26 14:36:34 +08:00
f9bab3d591 qwen3moe support two batch overlap (#6598) Yi Zhang 2025-05-26 14:08:16 +08:00
16f69b1f65 feat: Improve Mistral and Qwen25 function call parsing (#6597) Chang Su 2025-05-25 23:07:23 -07:00
65f091310c refactor qwen moe code, use communicator to support tp+dp (#6581) Yi Zhang 2025-05-26 14:01:10 +08:00
fc419b62e8 Revert "Tiny fix lint CI does not trigger on master (#6609)" (#6610) Yineng Zhang 2025-05-25 22:52:34 -07:00
7eb9d8e594 chore: upgrade transformers 4.52.3 (#6575) Yineng Zhang 2025-05-25 22:49:58 -07:00
84147254c9 Tiny fix lint CI does not trigger on master (#6609) fzyzcjy 2025-05-26 13:47:03 +08:00
6bebef60a7 Support accurate length control for bench serving (#6594) fzyzcjy 2025-05-26 13:46:23 +08:00
25be63d0b2 Auto handle PD disaggregation in bench_serving (#6587) fzyzcjy 2025-05-26 13:41:27 +08:00
d502dae0f0 Tiny change killall_sglang.sh (#6596) fzyzcjy 2025-05-26 13:36:51 +08:00
93e53f6e0b Logging and minor fixes to two batch overlap and EPLB (#6595) fzyzcjy 2025-05-26 13:36:40 +08:00
a191a0e47c Improve performance of two batch overlap in some imbalanced cases (#6593) fzyzcjy 2025-05-26 13:36:18 +08:00
8c7279c24e Fix profiling will crash the server when using num_steps (#6586) fzyzcjy 2025-05-26 13:36:02 +08:00
0ca1811715 Support fake perfectly balanced EP dispatch algorithm (#6571) fzyzcjy 2025-05-26 13:35:51 +08:00
2c3a6fe1de Fix bench_serving does not support changing warmup requests (#6439) fzyzcjy 2025-05-26 13:35:36 +08:00
8b33d8df90 [PD] Fix prefill_servers in mini_lb (#6527) wangxiyu191 2025-05-26 10:38:41 +08:00
e235be16fe Fix some issues with current docs. (#6588) simveit 2025-05-25 19:04:34 +02:00
5ccf8fe1a0 Hint users when weight update timeouts (#6570) fzyzcjy 2025-05-26 00:13:17 +08:00
3f23d8cdf1 added support for tied weights in qwen pipeline parallelism (#6546) Shenggui Li 2025-05-25 15:00:56 +08:00
1a39979993 Sgl-router Prometheus metrics endpoint and usage track metrics (#6537) Chao Yang 2025-05-24 22:28:15 -07:00
022012aae8 Support Phi-4 Multi-Modal (text + vision only) (#6494) Lifu Huang 2025-05-24 21:43:38 -07:00
681e7af32b [OAI] Support non-normalized logprobs in OpenAI server (#5961) Chang Su 2025-05-24 21:35:55 -07:00
681fdc264b Refactor vlm embedding routine to use precomputed feature (#6543) Xinyuan Tong 2025-05-24 18:39:21 -07:00
0d47788025 Support overlapping two batches (#4068) fzyzcjy 2025-05-25 08:39:07 +08:00
f456037396 Utilize static dispatching for communicator (#6577) fzyzcjy 2025-05-25 08:34:35 +08:00
b2388433be Add back DeepSeek non-TBO branches (#6578) fzyzcjy 2025-05-25 08:34:00 +08:00
a38376fa99 Refactor attention into multiple stages (#6477) fzyzcjy 2025-05-25 08:33:25 +08:00
7a5e6ce1cb Fix GPU OOM (#6564) kk 2025-05-25 07:38:39 +08:00
24c035f2e3 Temporarily disable MI325x 8 gpu testing. (#6576) Sai Enduri 2025-05-24 16:37:22 -07:00
7e257cd666 chore: bump v0.4.6.post5 (#6566) Yineng Zhang 2025-05-24 00:48:05 -07:00
c4831e2fcf Fix accuracy is zero when enabling moe-dense-tp-size as in large scale EP (#6567) fzyzcjy 2025-05-24 15:27:10 +08:00
2e37fa07ba [FIX]remove ServerArgs duplicate code (#6485) Neo 2025-05-24 13:54:41 +08:00
2d831c6ef9 [PD] Support structured output (#6560) Byron Hsu 2025-05-23 21:49:00 -07:00
ed0c3035cd feat(Tool Calling): Support required and specific function mode (#6550) Chang Su 2025-05-23 21:00:37 -07:00
e6f113569e support eplb for qwen3 (#6533) Yi Zhang 2025-05-24 09:31:30 +08:00
7b02c32679 [Bugfix](gemma3_mm): handle flatten_batch constraint for multiple images (#6562) Chang Su 2025-05-23 18:11:54 -07:00
fefa19fec0 Update cmdline --enable-dp-attention help string for Qwen 2/3 Moe models. (#6524) miter 2025-05-24 06:20:21 +08:00
9c574585b3 fix: remove content=none test when tool called (#6347) Shi Shuai 2025-05-24 06:12:55 +08:00
8233cc10fd [PD] Support logprob & Add failure test (#6558) Byron Hsu 2025-05-23 14:29:20 -07:00
1b2e8f76d9 [2/2] Support Qserve (#6521) HandH1998 2025-05-24 03:39:18 +08:00
d2e0881a34 [PD] support spec decode (#6507) Byron Hsu 2025-05-23 12:03:05 -07:00
2f42749184 Fix topk inference performance reduce (#6474) Li Hui 2025-05-23 17:58:31 +08:00
d8189660a9 Update sgl-kernel UTs for activation/topk/norm/rope kernels (#6452) YanbingJiang 2025-05-23 17:03:15 +08:00
3ded6235c9 Add fp8 fused_experts kernel for CPU in sgl-kernel and add UT (#6404) Chunyuan WU 2025-05-23 17:01:55 +08:00
4ba1eea83f Add fp8 qkv_proj_with_rope kernel for CPU in sgl-kernel and add UT (#6493) blzheng 2025-05-23 15:14:46 +08:00
4685fbb888 [VLM] Support chunk prefill for VLM (#6355) Chang Su 2025-05-22 20:32:41 -07:00
0a4fc73b48 [PD] Fix failure abort (#6535) Byron Hsu 2025-05-22 20:32:03 -07:00
a6970a17f3 misc: fix accept_length (#6536) Yineng Zhang 2025-05-22 14:27:10 -07:00
a6ae3af15e Support XiaomiMiMo inference with mtp (#6059) ryang 2025-05-23 05:14:49 +08:00
0b07c4a99f chore: upgrade sgl-kernel v0.1.4 (#6532) Yineng Zhang 2025-05-22 13:28:16 -07:00
fc0e3b9174 Support qwen3 deepep (#6120) lukec 2025-05-23 02:04:45 +08:00
d71f3f0a2a chore: bump sgl-kernel v0.1.4 (#6522) Yineng Zhang 2025-05-22 09:47:42 -07:00
58f10679e1 Fix missing http status import for PD failure handler (#6520) shangmingc 2025-05-22 15:23:54 +08:00
7a80f56513 Support dynamically rebalancing experts using EPLB (#6469) fzyzcjy 2025-05-22 14:13:21 +08:00
9484eba4ad Support logging expert balancedness metrics (#6482) fzyzcjy 2025-05-22 14:05:33 +08:00
e9feb48838 [RL] Remove the w13 weight_scale and input_scale for UnquantizedEPMoE… (#6308) Zilin Zhu 2025-05-22 13:03:15 +08:00
fc992a09f9 Support updating expert locations dynamically (#6388) fzyzcjy 2025-05-22 12:59:33 +08:00
121f92c583 Add main for merge state tests (#6492) Yuan Luo 2025-05-22 12:56:25 +08:00
3bde101099 [PD] Abort request if transfer fails (#6504) Byron Hsu 2025-05-21 21:44:25 -07:00
7513558074 [PD] Add doc and simplify sender.send (#6019) Byron Hsu 2025-05-21 21:22:21 -07:00
4d643f6c7a [1/2] Support Qserve (#6457) HandH1998 2025-05-22 10:48:59 +08:00
6ce0ed073b Apply constraint grammar to EAGLE (#6499) Ke Bao 2025-05-22 08:18:41 +08:00
969660c762 Recover from corrupted cache file in bench serving (#6510) fzyzcjy 2025-05-22 08:13:54 +08:00
16d4f6801b doc: Update README.md with adding deepwiki badge to enable weekly auto-refresh (#6508) Xinyuan Tong 2025-05-21 16:27:34 -07:00
ada268fd05 fix: EXAONE when using tie_word_embeddings (#5759) Kyungmin Lee 2025-05-22 03:30:04 +09:00
cfe48c5902 [CPU] Fix build issue (#6419) blzheng 2025-05-22 02:17:10 +08:00
d4c038daed [Fix]Fix capture fail bug for DeepSeek (#6275) Baizhou Zhang 2025-05-21 11:11:20 -07:00
55f6005f53 Fix bench_one_batch_server (#6503) fzyzcjy 2025-05-22 02:08:17 +08:00
7222e1dacc Let bench_one_batch_server use sharegpt data to make expert distribution more natural (#5573) fzyzcjy 2025-05-21 17:08:43 +08:00
505eec4dc9 Tiny make Lint CI show diff (#6445) fzyzcjy 2025-05-21 17:06:25 +08:00
ccfe5c009d Support redundant experts in expert parallel (#6461) fzyzcjy 2025-05-21 17:05:53 +08:00
a071dc4084 Tiny add stage assertions to DeepEPDispatcher to avoid misuse (#6467) fzyzcjy 2025-05-21 17:05:05 +08:00
a40aecc5a3 Fix num_qps_per_rank computation when providing custom DeepEP configuration (#6468) fzyzcjy 2025-05-21 17:04:33 +08:00
d6e1d28c8a Refactor DeepSeek attention dispatching (#6476) fzyzcjy 2025-05-21 17:03:39 +08:00
7c347259ff [RL] allow weight updation with dp attention enabled (#6311) Zilin Zhu 2025-05-21 16:58:55 +08:00
669caa0a3f [router] support http2 in router (#6487) Zilin Zhu 2025-05-21 16:42:45 +08:00
4024e1d2a8 Implement Siglip Vision model, and support BNB quantization for gemma3-mm (#5339) Jiajun Li 2025-05-20 23:53:46 -07:00
5c0b38f369 aiter attention-backend (default enabled on AMD/ROCm) (#6381) HAI 2025-05-20 22:52:41 -07:00
30ca18f423 Refactor group_concurrent_contiguous in NIXL (#6214) Yuan Luo 2025-05-21 11:55:04 +08:00
03886917bd Disable all two stream overlap on amd (#6475) Lianmin Zheng 2025-05-20 19:06:59 -07:00
66324895c6 [docs] Fix torch version (#6472) Wenxuan Tan 2025-05-20 12:53:14 -05:00
13feffd082 Fix master CI for DeepSeek (#6447) fzyzcjy 2025-05-20 15:31:42 +08:00
e98afbe042 Support dispatching logical to physical experts (#6385) fzyzcjy 2025-05-20 13:13:55 +08:00
69af3ec35f [doc] add note for get_num_kv_splits in triton_backend (#6444) JieXin Liang 2025-05-20 12:40:21 +08:00
32cc66efa5 Update extend/decode attention kernel for CPU in sgl-kernel and add UTs (#6405) YanbingJiang 2025-05-20 12:23:17 +08:00
83f2d9d4ed [QuickFix] fix gptq model initialize (#6429) PGFLMG 2025-05-20 12:17:10 +08:00
6317c5c61f Address performance regression: disable multiple streams on ROCm (#6412) HAI 2025-05-19 21:16:20 -07:00
cba1cdbc46 Support DeepSeek EPLB algorithm with static distributions (#6387) fzyzcjy 2025-05-20 12:06:21 +08:00
c471d39eb9 Support loading weights when physical experts are different from logical experts (#6386) fzyzcjy 2025-05-20 12:05:53 +08:00
d0443275f0 Refactor DeepSeek logic into atomic operations (#6326) fzyzcjy 2025-05-20 12:05:30 +08:00
17d080b7ae Remove Cargo.lock, add it into .gitignore (#6438) Liangsheng Yin 2025-05-20 12:01:32 +08:00
1b19df4b2a Refactor communication logic of DeepSeek for extensibility and understandability (#6321) fzyzcjy 2025-05-20 11:14:48 +08:00
f0653886a5 Expert distribution recording without overhead for EPLB (#4957) fzyzcjy 2025-05-20 11:07:43 +08:00
b146555749 Revert "Implement return_hidden_states for the OpenAI API (#6137)" (#6440) Yineng Zhang 2025-05-19 18:21:29 -07:00
b06215daed [BUG] fix stop_profile crash (#6431) Yi Zhang 2025-05-20 08:30:33 +08:00
7adf245ba2 [Metrics] Add KV events publishing (#6098) Trevor Morris 2025-05-19 14:19:54 -07:00
299fd22f9e Fix throughput threshold for amd ci test (#6414) Baizhou Zhang 2025-05-19 14:17:41 -07:00
506e5de8fe Improve supported models doc (#6430) simveit 2025-05-19 19:43:35 +02:00