Yineng Zhang
|
4953f4ca9a
|
chore: upgrade sgl-kernel 0.2.7 (#8304)
|
2025-07-23 15:07:27 -07:00 |
|
Xinyuan Tong
|
38000a5f44
|
Fix gemma3n with hybrid swa (#8240)
Signed-off-by: Xinyuan Tong <xinyuantong.cs@gmail.com>
Co-authored-by: Xinyuan Tong <xinyuantong.cs@gmail.com>
|
2025-07-23 13:29:18 -07:00 |
|
Xinyuan Tong
|
70251e935e
|
fix: match chat-template for internvl3 (#8262)
Signed-off-by: Xinyuan Tong <xinyuantong.cs@gmail.com>
Co-authored-by: Xinyuan Tong <xinyuantong.cs@gmail.com>
|
2025-07-23 13:29:03 -07:00 |
|
xianzhiT
|
c87d4fec99
|
Fix the issue of incorrect finish reason in final stream response chunk returned during tool call (#7708)
Co-authored-by: Xinyuan Tong <115166877+JustinTong0323@users.noreply.github.com>
|
2025-07-23 13:28:53 -07:00 |
|
YiXR
|
a99801e075
|
[Performance][PD Disaggregation] optimize TokenToKVPoolAllocator by sorting free pages (#8133)
Signed-off-by: Xingrui Yi <yixingrui@linux.alibaba.com>
Co-authored-by: Xingrui Yi <yixingrui@linux.alibaba.com>
|
2025-07-23 13:28:12 -07:00 |
|
Zhiqiang Xie
|
f39037fffb
|
HiCache Fix (#8288)
Co-authored-by: pansicheng <sicheng.pan.chn@gmail.com>
|
2025-07-23 16:51:32 +08:00 |
|
Lifu Huang
|
8abd3e77fe
|
Introduce Stable LoRA ID System for Overlapped Updates and Prefix Caching (#8261)
|
2025-07-23 00:32:16 -07:00 |
|
Ke Bao
|
e2d66f60c8
|
Skip llama4 vision module loading when multimodal disabled (#8272)
Co-authored-by: Mick <mickjagger19@icloud.com>
|
2025-07-23 12:41:25 +08:00 |
|
Yineng Zhang
|
01c000043c
|
chore: bump v0.4.9.post3 (#8265)
|
2025-07-22 15:55:48 -07:00 |
|
yhyang201
|
0dfe2491ac
|
Preliminary Support for Qwen3XMLDetector (#8260)
Co-authored-by: Chayenne <zhaochen20@outlook.com>
|
2025-07-23 06:49:38 +08:00 |
|
Hubert Lu
|
e50109f2ed
|
[AMD] Remove vllm's scaled_fp8_quant and moe_sum when SGLANG_USE_AITER=1 (#7484)
|
2025-07-21 17:33:19 -07:00 |
|
Xinyuan Tong
|
69adc4f81c
|
fix: retrieve mm token by modality, raise error if none (#8221)
Signed-off-by: Xinyuan Tong <xinyuantong.cs@gmail.com>
Co-authored-by: Xinyuan Tong <xinyuantong.cs@gmail.com>
|
2025-07-22 08:06:35 +08:00 |
|
Yineng Zhang
|
74f59ae555
|
chore: upgrade sgl-kernel 0.2.6.post1 (#8202)
|
2025-07-21 02:10:24 -07:00 |
|
Ke Bao
|
6936be3221
|
Remove router gemm output dtype conversion (#8204)
|
2025-07-21 15:37:00 +08:00 |
|
Xinyuan Tong
|
8430bfe3e9
|
[Refactor] simplify multimodal data processing (#8107)
Signed-off-by: Xinyuan Tong <justinning0323@outlook.com>
|
2025-07-20 21:43:09 -07:00 |
|
Ke Bao
|
c9e8613c97
|
Apply fused sorted token ids padding (#8193)
|
2025-07-21 11:19:48 +08:00 |
|
JieXin Liang
|
7eebd44047
|
[fix] fix modelopt fp4 on b200 (#8195)
|
2025-07-20 17:39:57 -07:00 |
|
Ke Bao
|
465968b2e3
|
Fix dtype error in CI (#8197)
|
2025-07-21 00:27:55 +08:00 |
|
GuoYipin
|
750838adc4
|
fix: fix the bug of loading Internvl3 (#8067)
Co-authored-by: Xinyuan Tong <xinyuantong.cs@gmail.com>
|
2025-07-20 22:22:54 +08:00 |
|
Jay Zhou
|
99aefa037e
|
Fix eagle3 cuda graph (#8163)
|
2025-07-20 15:28:06 +08:00 |
|
Qiaolin Yu
|
bbcfbc1a02
|
feat: add h200 tp 16 kimi k2 moe config (#8183)
|
2025-07-19 23:30:08 -07:00 |
|
Praneth Paruchuri
|
83c104b188
|
Feat: Support for Persimmon Model (#7983)
|
2025-07-19 23:07:47 -07:00 |
|
Lianmin Zheng
|
55381a46ac
|
Revert "[Feature] Simple Improve Health Check Mechanism for Production-Grade Stability" (#8181)
|
2025-07-19 22:41:30 -07:00 |
|
Atream
|
a589a07167
|
fix moe gate dtype, fix tbo, fix fake dispatch (#7825)
|
2025-07-19 22:13:46 -07:00 |
|
Yineng Zhang
|
f62d75b6a1
|
feat: add b200 tp 16 kimi k2 moe config (#8178)
|
2025-07-19 20:04:12 -07:00 |
|
Yineng Zhang
|
0f9b11e310
|
feat: add h200 tp 16 kimi k2 moe config (#8176)
|
2025-07-19 20:04:02 -07:00 |
|
Pavel Logachev
|
877e35d775
|
Add get_hidden_dim to qwen3.py for correct lora (#7312)
|
2025-07-19 19:31:16 -07:00 |
|
Clay
|
cbdfb77123
|
Enable FlashInfer support encoder models and add head_dim padding workaround (#6230)
|
2025-07-19 19:30:16 -07:00 |
|
ybyang
|
4540a4666a
|
[Feature] Simple Improve Health Check Mechanism for Production-Grade Stability (#8115)
Signed-off-by: ybyang <ybyang7@iflytek.com>
|
2025-07-19 18:10:00 -07:00 |
|
Baizhou Zhang
|
8cddfa56a1
|
Clean warning logs for gate_proj loading in Lora (#8172)
|
2025-07-19 15:56:50 -07:00 |
|
Lifu Huang
|
4e3defe5a7
|
Support start up LoRA server without initial adapters (#8019)
|
2025-07-19 15:38:09 -07:00 |
|
Garry Fang
|
60468da4e2
|
bugfix: fix sglang crash in NVIDIA MIG container (#8167)
Signed-off-by: Garrybest <garrybest@foxmail.com>
|
2025-07-19 14:41:27 -07:00 |
|
Lifu Huang
|
3de617a75b
|
Fix LoRA buffer contamination during adapter eviction (#8103)
|
2025-07-19 13:14:08 -07:00 |
|
Lianmin Zheng
|
bb0e8a32b5
|
Clean up server args (#8161)
|
2025-07-19 11:32:52 -07:00 |
|
Yineng Zhang
|
561dd7b2ce
|
chore: upgrade sgl-kernel 0.2.6 (#8166)
|
2025-07-19 03:17:08 -07:00 |
|
Cheng Wan
|
15ad6c9086
|
[1/N] MoE Refactor: refactor select_experts (#7966)
|
2025-07-19 00:51:15 -07:00 |
|
Yingchun Lai
|
610381b75e
|
[health_generate] fix: fix the /health_generate always success bug (#8028)
|
2025-07-18 22:08:46 -07:00 |
|
Shangming Cai
|
1403ea5694
|
[PD] Support non-MLA models PD different TP with DP attention (#7931)
Signed-off-by: Shangming Cai <caishangming@linux.alibaba.com>
|
2025-07-18 22:00:49 -07:00 |
|
Binyao Jiang
|
b7e951a6db
|
Feat: Support audio in Phi4-mm model (#8048)
|
2025-07-18 21:03:53 -07:00 |
|
Haohui Mai
|
d918ab7985
|
Support NVFP4 quantized dense models on AMD CDNA2/CDNA3 GPUs (#7302)
Co-authored-by: HAI <hixiao@gmail.com>
Co-authored-by: Sai Enduri <saimanas.enduri@amd.com>
|
2025-07-18 19:59:39 -07:00 |
|
Mick
|
3964b352c3
|
chore: tune mem fraction static for vlm (#6881)
|
2025-07-18 17:19:27 -07:00 |
|
Lianmin Zheng
|
9c7a46180c
|
[Doc] Steps to add a new attention backend (#8155)
|
2025-07-18 16:38:26 -07:00 |
|
Hubert Lu
|
7750b91ca8
|
[AMD] Add triton awq_dequantize kernel to support AWQ on ROCm (#7661)
|
2025-07-18 14:27:25 -07:00 |
|
Hongbo Xu
|
1f76fc8747
|
[3/n] chore: decouple AWQ implementation from vLLM dependency (#8113)
Co-authored-by: AniZpZ <zhuangsen.zp@antgroup.com>
|
2025-07-18 11:45:22 -07:00 |
|
Even Zhou
|
6737671c82
|
[Bugfix] Fix w8a8_int8 import error on NPU (#8147)
|
2025-07-18 11:34:55 -07:00 |
|
Enrique Shockwave
|
fd63b62eaa
|
fix compressed tensors WNA16 imports (#8142)
|
2025-07-18 11:34:14 -07:00 |
|
Sai Enduri
|
d0510f08fe
|
Revert "Fix different device type adjustment in PP" (#8141)
|
2025-07-18 01:12:11 -07:00 |
|
Zhiqiang Xie
|
9d33fcfb8e
|
Hicache Storage Layer Prototype (#7704)
|
2025-07-18 15:20:19 +08:00 |
|
jianan-gu
|
7891bac16b
|
[Quantization][w8a8_int8] Fix weight loading issue for w8a8_int8 path with "ignore" layer list in quantization config (#7820)
|
2025-07-17 22:03:56 -07:00 |
|
jianan-gu
|
48c1fa7bb6
|
[CPU][Llama4] Fix Llama4 MoE inputs with "apply_router_weight_on_input" (#7889)
|
2025-07-17 21:43:25 -07:00 |
|