| Author | Commit | Message | Date |
|---|---|---|---|
| Ravi Theja | c6a0cacc35 | Update CI tests for Llama4 models (#6421) | 2025-06-01 11:52:15 +08:00 |
| Lianmin Zheng | 2d72fc47cf | Improve profiler and integrate profiler in bench_one_batch_server (#6787) | 2025-05-31 15:53:55 -07:00 |
| Chang Su | e39bca0756 | ci: relax test_function_call_required (#6786) | 2025-05-30 19:18:42 -07:00 |
| Jianan Ji | a2bb856543 | Temporarily lower mmlu threshold for triton sliding window backend (#6785) | 2025-05-30 18:40:50 -07:00 |
| Chang Su | f18b068f15 | feat(tool call): Enhance Llama32Detector for improved JSON parsing in non-stream (#6784) | 2025-05-30 17:05:17 -07:00 |
| Chao Yang | 4fac524b14 | update llama4 chat template and pythonic parser (#6679); Co-authored-by: Chang Su <chang.s.su@oracle.com> | 2025-05-30 17:01:22 -07:00 |
| Jianan Ji | 22630ca242 | Support sliding window in triton backend (#6509) | 2025-05-30 01:11:53 -07:00 |
| Chang Su | c673727e0e | refactor(tool call): Fix BaseFormatDetector tool_index issue and refactor parse_streaming_increment (#6715) | 2025-05-29 00:08:45 -07:00 |
| iLeGend | e06b076105 | Fix PP for Qwen3 MoE (#6709) | 2025-05-28 23:06:18 -07:00 |
| shangmingc | d63e76f735 | [CI] Fix setup of disaggregation with different tp (#6706); Signed-off-by: Shangming Cai <caishangming@linux.alibaba.com> | 2025-05-28 11:17:27 -07:00 |
| shangmingc | c25231c679 | [CI] Fix flaky pp single node test (#6689); Signed-off-by: Shangming Cai <caishangming@linux.alibaba.com> | 2025-05-28 00:40:26 -07:00 |
| Sai Enduri | f4a8987f69 | Update amd docker and nightly models. (#6687) | 2025-05-28 00:08:08 -07:00 |
| Chang Su | 41ba767f0c | feat: Add warnings for invalid tool_choice and UTs (#6582) | 2025-05-27 16:53:19 -07:00 |
| Ke Bao | f127355a30 | Add batch test for draft extend (#6672) | 2025-05-27 16:32:05 -07:00 |
| fzyzcjy | 87068b5cc7 | Support gathering expert distribution details (#6665) | 2025-05-27 15:32:59 -07:00 |
| Junrong Lin | 2103b80607 | [CI] update verlengine ci to 4-gpu test (#6007) | 2025-05-27 14:32:23 -07:00 |
| Sai Enduri | eb8f02dd87 | Update nightly thresholds and dependencies. (#6635) | 2025-05-26 11:44:13 -07:00 |
| Lifu Huang | 0d503090aa | Supported precomputed feature for Kimi VL (#6599) | 2025-05-26 01:24:13 -07:00 |
| Yi Zhang | f9bab3d591 | qwen3moe support two batch overlap (#6598) | 2025-05-25 23:08:16 -07:00 |
| Chang Su | 16f69b1f65 | feat: Improve Mistral and Qwen25 function call parsing (#6597) | 2025-05-25 23:07:23 -07:00 |
| Yi Zhang | 65f091310c | refactor qwen moe code, use communicator to support tp+dp (#6581) | 2025-05-25 23:01:10 -07:00 |
| Yineng Zhang | 7eb9d8e594 | chore: upgrade transformers 4.52.3 (#6575); Co-authored-by: Mick <mickjagger19@icloud.com> | 2025-05-25 22:49:58 -07:00 |
| fzyzcjy | a191a0e47c | Improve performance of two batch overlap in some imbalanced cases (#6593) | 2025-05-25 22:36:18 -07:00 |
| Shenggui Li | 3f23d8cdf1 | added support for tied weights in qwen pipeline parallelism (#6546) | 2025-05-25 00:00:56 -07:00 |
| Lifu Huang | 022012aae8 | Support Phi-4 Multi-Modal (text + vision only) (#6494) | 2025-05-24 21:43:38 -07:00 |
| Xinyuan Tong | 681fdc264b | Refactor vlm embedding routine to use precomputed feature (#6543); Signed-off-by: Xinyuan Tong <justinning0323@outlook.com> | 2025-05-24 18:39:21 -07:00 |
| fzyzcjy | 0d47788025 | Support overlapping two batches (#4068) | 2025-05-24 17:39:07 -07:00 |
| kk | 7a5e6ce1cb | Fix GPU OOM (#6564); Co-authored-by: michael <michael.zhang@amd.com> | 2025-05-24 16:38:39 -07:00 |
| Byron Hsu | 2d831c6ef9 | [PD] Support structured output (#6560) | 2025-05-23 21:49:00 -07:00 |
| Chang Su | ed0c3035cd | feat(Tool Calling): Support required and specific function mode (#6550) | 2025-05-23 21:00:37 -07:00 |
| Shi Shuai | 9c574585b3 | fix: remove content=none test when tool called (#6347) | 2025-05-23 15:12:55 -07:00 |
| Byron Hsu | 8233cc10fd | [PD] Support logprob & Add failure test (#6558) | 2025-05-23 14:29:20 -07:00 |
| Byron Hsu | d2e0881a34 | [PD] support spec decode (#6507); Co-authored-by: SangBin Cho <rkooo567@gmail.com> | 2025-05-23 12:03:05 -07:00 |
| YanbingJiang | d8189660a9 | Update sgl-kernel UTs for activation/topk/norm/rope kernels (#6452) | 2025-05-23 02:03:15 -07:00 |
| Chunyuan WU | 3ded6235c9 | Add fp8 fused_experts kernel for CPU in sgl-kernel and add UT (#6404) | 2025-05-23 02:01:55 -07:00 |
| blzheng | 4ba1eea83f | Add fp8 qkv_proj_with_rope kernel for CPU in sgl-kernel and add UT (#6493) | 2025-05-23 00:14:46 -07:00 |
| Chang Su | 4685fbb888 | [VLM] Support chunk prefill for VLM (#6355); Co-authored-by: yizhang2077 <1109276519@qq.com> | 2025-05-22 20:32:41 -07:00 |
| ryang | a6ae3af15e | Support XiaomiMiMo inference with mtp (#6059) | 2025-05-22 14:14:49 -07:00 |
| Yineng Zhang | 0b07c4a99f | chore: upgrade sgl-kernel v0.1.4 (#6532) | 2025-05-22 13:28:16 -07:00 |
| fzyzcjy | 7a80f56513 | Support dynamically rebalancing experts using EPLB (#6469) | 2025-05-21 23:13:21 -07:00 |
| fzyzcjy | fc992a09f9 | Support updating expert locations dynamically (#6388) | 2025-05-21 21:59:33 -07:00 |
| Ke Bao | 6ce0ed073b | Apply constraint grammar to EAGLE (#6499); Co-authored-by: merrymercy <lianminzheng@gmail.com> | 2025-05-21 17:18:41 -07:00 |
| blzheng | cfe48c5902 | [CPU] Fix build issue (#6419) | 2025-05-21 11:17:10 -07:00 |
| Jiajun Li | 4024e1d2a8 | Implement Siglip Vision model, and support BNB quantization for gemma3-mm (#5339) | 2025-05-20 23:53:46 -07:00 |
| HAI | 5c0b38f369 | aiter attention-backend (default enabled on AMD/ROCm) (#6381) | 2025-05-20 22:52:41 -07:00 |
| YanbingJiang | 32cc66efa5 | Update extend/decode attention kernel for CPU in sgl-kernel and add UTs (#6405); Co-authored-by: mingfeima <mingfei.ma@intel.com> | 2025-05-19 21:23:17 -07:00 |
| fzyzcjy | f0653886a5 | Expert distribution recording without overhead for EPLB (#4957) | 2025-05-19 20:07:43 -07:00 |
| Yineng Zhang | b146555749 | Revert "Implement return_hidden_states for the OpenAI API (#6137)" (#6440) | 2025-05-19 18:21:29 -07:00 |
| Trevor Morris | 7adf245ba2 | [Metrics] Add KV events publishing (#6098) | 2025-05-19 14:19:54 -07:00 |
| Baizhou Zhang | 299fd22f9e | Fix throughput threshold for amd ci test (#6414) | 2025-05-19 14:17:41 -07:00 |