Yineng Zhang
|
7eb9d8e594
|
chore: upgrade transformers 4.52.3 (#6575)
Co-authored-by: Mick <mickjagger19@icloud.com>
|
2025-05-25 22:49:58 -07:00 |
|
fzyzcjy
|
6bebef60a7
|
Support accurate length control for bench serving (#6594)
|
2025-05-25 22:46:23 -07:00 |
|
fzyzcjy
|
25be63d0b2
|
Auto handle PD disaggregation in bench_serving (#6587)
Co-authored-by: yizhang2077 <1109276519@qq.com>
|
2025-05-25 22:41:27 -07:00 |
|
fzyzcjy
|
93e53f6e0b
|
Logging and minor fixes to two batch overlap and EPLB (#6595)
|
2025-05-25 22:36:40 -07:00 |
|
fzyzcjy
|
a191a0e47c
|
Improve performance of two batch overlap in some imbalanced cases (#6593)
|
2025-05-25 22:36:18 -07:00 |
|
fzyzcjy
|
8c7279c24e
|
Fix profiling will crash the server when using num_steps (#6586)
|
2025-05-25 22:36:02 -07:00 |
|
fzyzcjy
|
0ca1811715
|
Support fake perfectly balanced EP dispatch algorithm (#6571)
|
2025-05-25 22:35:51 -07:00 |
|
fzyzcjy
|
2c3a6fe1de
|
Fix bench_serving does not support changing warmup requests (#6439)
|
2025-05-25 22:35:36 -07:00 |
|
wangxiyu191
|
8b33d8df90
|
[PD] Fix prefill_servers in mini_lb (#6527)
|
2025-05-26 10:38:41 +08:00 |
|
fzyzcjy
|
5ccf8fe1a0
|
Hint users when weight update timeouts (#6570)
|
2025-05-25 09:13:17 -07:00 |
|
Shenggui Li
|
3f23d8cdf1
|
added support for tied weights in qwen pipeline parallelism (#6546)
|
2025-05-25 00:00:56 -07:00 |
|
Lifu Huang
|
022012aae8
|
Support Phi-4 Multi-Modal (text + vision only) (#6494)
|
2025-05-24 21:43:38 -07:00 |
|
Chang Su
|
681e7af32b
|
[OAI] Support non-normalized logprobs in OpenAI server (#5961)
|
2025-05-24 21:35:55 -07:00 |
|
Xinyuan Tong
|
681fdc264b
|
Refactor vlm embedding routine to use precomputed feature (#6543)
Signed-off-by: Xinyuan Tong <justinning0323@outlook.com>
|
2025-05-24 18:39:21 -07:00 |
|
fzyzcjy
|
0d47788025
|
Support overlapping two batches (#4068)
|
2025-05-24 17:39:07 -07:00 |
|
fzyzcjy
|
f456037396
|
Utilize static dispatching for communicator (#6577)
|
2025-05-24 17:34:35 -07:00 |
|
fzyzcjy
|
b2388433be
|
Add back DeepSeek non-TBO branches (#6578)
|
2025-05-24 17:34:00 -07:00 |
|
fzyzcjy
|
a38376fa99
|
Refactor attention into multiple stages (#6477)
|
2025-05-24 17:33:25 -07:00 |
|
kk
|
7a5e6ce1cb
|
Fix GPU OOM (#6564)
Co-authored-by: michael <michael.zhang@amd.com>
|
2025-05-24 16:38:39 -07:00 |
|
Yineng Zhang
|
7e257cd666
|
chore: bump v0.4.6.post5 (#6566)
|
2025-05-24 00:48:05 -07:00 |
|
fzyzcjy
|
c4831e2fcf
|
Fix accuracy is zero when enabling moe-dense-tp-size as in large scale EP (#6567)
|
2025-05-24 00:27:10 -07:00 |
|
Neo
|
2e37fa07ba
|
[FIX]remove ServerArgs duplicate code (#6485)
|
2025-05-23 22:54:41 -07:00 |
|
Byron Hsu
|
2d831c6ef9
|
[PD] Support structured output (#6560)
|
2025-05-23 21:49:00 -07:00 |
|
Chang Su
|
ed0c3035cd
|
feat(Tool Calling): Support required and specific function mode (#6550)
|
2025-05-23 21:00:37 -07:00 |
|
Yi Zhang
|
e6f113569e
|
support eplb for qwen3 (#6533)
|
2025-05-23 18:31:30 -07:00 |
|
Chang Su
|
7b02c32679
|
[Bugfix](gemma3_mm): handle flatten_batch constraint for multiple images (#6562)
|
2025-05-23 18:11:54 -07:00 |
|
miter
|
fefa19fec0
|
Update cmdline --enable-dp-attention help string for Qwen 2/3 Moe models. (#6524)
Signed-off-by: miter <miterv@outlook.com>
|
2025-05-23 15:20:21 -07:00 |
|
Byron Hsu
|
8233cc10fd
|
[PD] Support logprob & Add failure test (#6558)
|
2025-05-23 14:29:20 -07:00 |
|
HandH1998
|
1b2e8f76d9
|
[2/2] Support Qserve (#6521)
|
2025-05-23 12:39:18 -07:00 |
|
Byron Hsu
|
d2e0881a34
|
[PD] support spec decode (#6507)
Co-authored-by: SangBin Cho <rkooo567@gmail.com>
|
2025-05-23 12:03:05 -07:00 |
|
Li Hui
|
2f42749184
|
Fix topk inference performance reduce (#6474)
|
2025-05-23 02:58:31 -07:00 |
|
Chang Su
|
4685fbb888
|
[VLM] Support chunk prefill for VLM (#6355)
Co-authored-by: yizhang2077 <1109276519@qq.com>
|
2025-05-22 20:32:41 -07:00 |
|
Byron Hsu
|
0a4fc73b48
|
[PD] Fix failure abort (#6535)
|
2025-05-22 20:32:03 -07:00 |
|
Yineng Zhang
|
a6970a17f3
|
misc: fix accept_length (#6536)
|
2025-05-22 14:27:10 -07:00 |
|
ryang
|
a6ae3af15e
|
Support XiaomiMiMo inference with mtp (#6059)
|
2025-05-22 14:14:49 -07:00 |
|
Yineng Zhang
|
0b07c4a99f
|
chore: upgrade sgl-kernel v0.1.4 (#6532)
|
2025-05-22 13:28:16 -07:00 |
|
lukec
|
fc0e3b9174
|
Support qwen3 deepep (#6120)
|
2025-05-22 11:04:45 -07:00 |
|
shangmingc
|
58f10679e1
|
Fix missing http status import for PD failure handler (#6520)
Signed-off-by: Shangming Cai <caishangming@linux.alibaba.com>
|
2025-05-22 15:23:54 +08:00 |
|
fzyzcjy
|
7a80f56513
|
Support dynamically rebalancing experts using EPLB (#6469)
|
2025-05-21 23:13:21 -07:00 |
|
fzyzcjy
|
9484eba4ad
|
Support logging expert balancedness metrics (#6482)
|
2025-05-21 23:05:33 -07:00 |
|
Zilin Zhu
|
e9feb48838
|
[RL] Remove the w13 weight_scale and input_scale for UnquantizedEPMoE… (#6308)
|
2025-05-21 22:03:15 -07:00 |
|
fzyzcjy
|
fc992a09f9
|
Support updating expert locations dynamically (#6388)
|
2025-05-21 21:59:33 -07:00 |
|
Byron Hsu
|
3bde101099
|
[PD] Abort request if transfer fails (#6504)
|
2025-05-21 21:44:25 -07:00 |
|
Byron Hsu
|
7513558074
|
[PD] Add doc and simplify sender.send (#6019)
|
2025-05-21 21:22:21 -07:00 |
|
Ke Bao
|
6ce0ed073b
|
Apply constraint grammar to EAGLE (#6499)
Co-authored-by: merrymercy <lianminzheng@gmail.com>
|
2025-05-21 17:18:41 -07:00 |
|
fzyzcjy
|
969660c762
|
Recover from corrupted cache file in bench serving (#6510)
|
2025-05-21 17:13:54 -07:00 |
|
Kyungmin Lee
|
ada268fd05
|
fix: EXAONE when using tie_word_embeddings (#5759)
|
2025-05-21 11:30:04 -07:00 |
|
Baizhou Zhang
|
d4c038daed
|
[Fix]Fix capture fail bug for DeepSeek (#6275)
|
2025-05-21 11:11:20 -07:00 |
|
fzyzcjy
|
55f6005f53
|
Fix bench_one_batch_server (#6503)
|
2025-05-21 11:08:17 -07:00 |
|
fzyzcjy
|
7222e1dacc
|
Let bench_one_batch_server use sharegpt data to make expert distribution more natural (#5573)
|
2025-05-21 02:08:43 -07:00 |
|