Rui Chen
|
c33499a67b
|
fix: sgl-router remove dead code (#8257)
|
2025-07-22 08:41:23 -07:00 |
|
Hubert Lu
|
e50109f2ed
|
[AMD] Remove vllm's scaled_fp8_quant and moe_sum when SGLANG_USE_AITER=1 (#7484)
|
2025-07-21 17:33:19 -07:00 |
|
Xinyuan Tong
|
69adc4f81c
|
fix: retrieve mm token by modality, raise error if none (#8221)
Signed-off-by: Xinyuan Tong <xinyuantong.cs@gmail.com>
Co-authored-by: Xinyuan Tong <xinyuantong.cs@gmail.com>
|
2025-07-22 08:06:35 +08:00 |
|
Yineng Zhang
|
114837854f
|
docs: update 2025 h2 roadmap (#8237)
|
2025-07-21 14:02:48 -07:00 |
|
Xiaoze Fan
|
7b68d27111
|
[Feature] Add a test for Layer-wise Prefill (#8231)
Signed-off-by: jason-fxz <jason341132@qq.com>
|
2025-07-21 22:06:15 +08:00 |
|
Yineng Zhang
|
74f59ae555
|
chore: upgrade sgl-kernel 0.2.6.post1 (#8202)
|
2025-07-21 02:10:24 -07:00 |
|
Ke Bao
|
6936be3221
|
Remove router gemm output dtype conversion (#8204)
|
2025-07-21 15:37:00 +08:00 |
|
Simo Lin
|
9b5de6cb06
|
[router] upgrade router version to 0.1.6 (#8209)
|
2025-07-20 23:13:20 -07:00 |
|
Simo Lin
|
5c8365a051
|
[router] add ut for pd router (#8208)
|
2025-07-20 23:12:52 -07:00 |
|
Xinyuan Tong
|
8430bfe3e9
|
[Refactor] simplify multimodal data processing (#8107)
Signed-off-by: Xinyuan Tong <justinning0323@outlook.com>
|
2025-07-20 21:43:09 -07:00 |
|
Ke Bao
|
c9e8613c97
|
Apply fused sorted token ids padding (#8193)
|
2025-07-21 11:19:48 +08:00 |
|
Yineng Zhang
|
429bb0efa2
|
chore: bump sgl-kernel v0.2.6.post1 (#8200)
|
2025-07-20 19:50:28 -07:00 |
|
JieXin Liang
|
7eebd44047
|
[fix] fix modelopt fp4 on b200 (#8195)
|
2025-07-20 17:39:57 -07:00 |
|
ronnie_zheng
|
93d124ef5a
|
[feature] enable NPU CI (#7935)
Co-authored-by: Even Zhou <14368888+iforgetmyname@users.noreply.github.com>
|
2025-07-20 13:12:42 -07:00 |
|
Simo Lin
|
1fc455e8b6
|
[router] add ut for pd request, metrics and config (#8184)
|
2025-07-20 10:53:42 -07:00 |
|
Ke Bao
|
465968b2e3
|
Fix dtype error in CI (#8197)
|
2025-07-21 00:27:55 +08:00 |
|
GuoYipin
|
750838adc4
|
fix: fix the bug of loading Internvl3 (#8067)
Co-authored-by: Xinyuan Tong <xinyuantong.cs@gmail.com>
|
2025-07-20 22:22:54 +08:00 |
|
Jay Zhou
|
99aefa037e
|
Fix eagle3 cuda graph (#8163)
|
2025-07-20 15:28:06 +08:00 |
|
Qiaolin Yu
|
bbcfbc1a02
|
feat: add h200 tp 16 kimi k2 moe config (#8183)
|
2025-07-19 23:30:08 -07:00 |
|
Praneth Paruchuri
|
83c104b188
|
Feat: Support for Persimmon Model (#7983)
|
2025-07-19 23:07:47 -07:00 |
|
Yineng Zhang
|
2db6719cc5
|
feat: update nccl 2.27.6 (#8182)
|
2025-07-19 22:55:45 -07:00 |
|
Lianmin Zheng
|
55381a46ac
|
Revert "[Feature] Simple Improve Health Check Mechanism for Production-Grade Stability" (#8181)
|
2025-07-19 22:41:30 -07:00 |
|
Atream
|
a589a07167
|
fix moe gate dtype, fix tbo, fix fake dispatch (#7825)
|
2025-07-19 22:13:46 -07:00 |
|
Yineng Zhang
|
f62d75b6a1
|
feat: add b200 tp 16 kimi k2 moe config (#8178)
|
2025-07-19 20:04:12 -07:00 |
|
Yineng Zhang
|
0f9b11e310
|
feat: add h200 tp 16 kimi k2 moe config (#8176)
|
2025-07-19 20:04:02 -07:00 |
|
Pavel Logachev
|
877e35d775
|
Add get_hidden_dim to qwen3.py for correct lora (#7312)
|
2025-07-19 19:31:16 -07:00 |
|
Clay
|
cbdfb77123
|
Enable FlashInfer support encoder models and add head_dim padding workaround (#6230)
|
2025-07-19 19:30:16 -07:00 |
|
Baizhou Zhang
|
282eb59ff3
|
Add bf16 output option for dsv3_router_gemm kernel (#7999)
|
2025-07-20 09:49:37 +08:00 |
|
ybyang
|
4540a4666a
|
[Feature] Simple Improve Health Check Mechanism for Production-Grade Stability (#8115)
Signed-off-by: ybyang <ybyang7@iflytek.com>
|
2025-07-19 18:10:00 -07:00 |
|
Cheng Wan
|
abda2542d5
|
Fix tuning_fused_moe_triton.py (#8175)
|
2025-07-19 17:33:50 -07:00 |
|
Baizhou Zhang
|
8cddfa56a1
|
Clean warning logs for gate_proj loading in Lora (#8172)
|
2025-07-19 15:56:50 -07:00 |
|
Lifu Huang
|
4e3defe5a7
|
Support start up LoRA server without initial adapters (#8019)
|
2025-07-19 15:38:09 -07:00 |
|
Garry Fang
|
60468da4e2
|
bugfix: fix sglang crash in NVIDIA MIG container (#8167)
Signed-off-by: Garrybest <garrybest@foxmail.com>
|
2025-07-19 14:41:27 -07:00 |
|
Simo Lin
|
41d33e4736
|
[router] add ut for worker and errors (#8170)
|
2025-07-19 14:38:33 -07:00 |
|
kyleliang-nv
|
bfdd226f35
|
Fix Dockerfile.gb200 (#8169)
|
2025-07-19 14:37:53 -07:00 |
|
Lifu Huang
|
3de617a75b
|
Fix LoRA buffer contamination during adapter eviction (#8103)
|
2025-07-19 13:14:08 -07:00 |
|
Lianmin Zheng
|
bb0e8a32b5
|
Clean up server args (#8161)
|
2025-07-19 11:32:52 -07:00 |
|
Lianmin Zheng
|
1b427dae02
|
Update README.md (#8171)
|
2025-07-19 11:04:19 -07:00 |
|
Charles Chen
|
f3d9736156
|
Fix suffix mismatch for the metrics. (#8168)
Signed-off-by: Charles Chen <chenliqian@chenliqian.cn>
|
2025-07-19 10:11:24 -07:00 |
|
Yineng Zhang
|
561dd7b2ce
|
chore: upgrade sgl-kernel 0.2.6 (#8166)
|
2025-07-19 03:17:08 -07:00 |
|
Yineng Zhang
|
f98e88b9fb
|
chore: bump sgl-kernel v0.2.6 (#8165)
|
2025-07-19 00:56:18 -07:00 |
|
Cheng Wan
|
15ad6c9086
|
[1/N] MoE Refactor: refactor select_experts (#7966)
|
2025-07-19 00:51:15 -07:00 |
|
kyleliang-nv
|
cfab0ff6e2
|
Add GB200 wide-EP docker (#8157)
|
2025-07-18 22:34:29 -07:00 |
|
Simo Lin
|
b763cf7e8e
|
[router] allow router to have empty workers (#8160)
|
2025-07-18 22:09:54 -07:00 |
|
Simo Lin
|
8fcc55cfa1
|
[router] router metrics cleanup (#8158)
|
2025-07-18 22:09:17 -07:00 |
|
Yingchun Lai
|
610381b75e
|
[health_generate] fix: fix the /health_generate always success bug (#8028)
|
2025-07-18 22:08:46 -07:00 |
|
Shangming Cai
|
1403ea5694
|
[PD] Support non-MLA models PD different TP with DP attention (#7931)
Signed-off-by: Shangming Cai <caishangming@linux.alibaba.com>
|
2025-07-18 22:00:49 -07:00 |
|
Binyao Jiang
|
b7e951a6db
|
Feat: Support audio in Phi4-mm model (#8048)
|
2025-07-18 21:03:53 -07:00 |
|
Haohui Mai
|
d918ab7985
|
Support NVFP4 quantized dense models on AMD CDNA2/CDNA3 GPUs (#7302)
Co-authored-by: HAI <hixiao@gmail.com>
Co-authored-by: Sai Enduri <saimanas.enduri@amd.com>
|
2025-07-18 19:59:39 -07:00 |
|
Mick
|
3964b352c3
|
chore: tune mem fraction static for vlm (#6881)
|
2025-07-18 17:19:27 -07:00 |
|