889 Commits

Author SHA1 Message Date
Xinyuan Tong
cf9815ba69 [Refactor] Multimodal data processing for VLM (#6659)
Signed-off-by: Xinyuan Tong <justinning0323@outlook.com>
2025-06-04 11:22:33 -07:00
Cheng Wan
8a5480528d [Refactor] Rename n_share_experts_fusion as num_fused_shared_experts (#6735) 2025-06-03 17:48:24 -07:00
pansicheng
27e327b415 fix new_page_count_next_decode (#6671) 2025-06-02 22:48:52 -07:00
fzyzcjy
df7f61ee7d Speed up rebalancing when using non-static dispatch algorithms (#6812) 2025-06-02 11:18:17 -07:00
fzyzcjy
ef21729c1d Fix profiles do not have consistent names (#6811) 2025-06-02 11:17:22 -07:00
fzyzcjy
6d7b6696d4 Tiny fix EPLB assertion about rebalancing period and recorder window size (#6813) 2025-06-02 11:13:33 -07:00
fzyzcjy
6376b632eb Tiny log prefill time (#6780) 2025-06-02 10:28:27 -07:00
Lianmin Zheng
20fd53b8f6 Correctly abort the failed grammar requests & Improve the handling of abort (#6803) 2025-06-01 19:00:07 -07:00
Lianmin Zheng
2d72fc47cf Improve profiler and integrate profiler in bench_one_batch_server (#6787) 2025-05-31 15:53:55 -07:00
YanbingJiang
888cb175a6 Add intel_amx backend for Radix Attention for CPU (#6408)
Co-authored-by: Chunyuan WU <chunyuan.wu@intel.com>
Co-authored-by: Thien Tran <gau.nernst@yahoo.com.sg>
2025-05-30 21:37:42 -07:00
fzyzcjy
2c3b71d678 Improve EPLB logical to physical dispatch map (#6727) 2025-05-29 19:23:54 -07:00
fzyzcjy
3ab7d9b55e Support picking variants of EPLB algorithms (#6728) 2025-05-29 08:12:01 -07:00
fzyzcjy
7e5071c92a Super tiny enable sole usage of expert distribution metrics and update doc (#6680) 2025-05-29 08:11:38 -07:00
Liangsheng Yin
78689d3393 PD Rust LB (PO2) (#6437)
Co-authored-by: fzyzcjy <5236035+fzyzcjy@users.noreply.github.com>
2025-05-29 20:50:10 +08:00
JieXin Liang
2163586e63 [feat] triton kernel for get_last_loc (#6676) 2025-05-28 23:10:28 -07:00
fzyzcjy
87068b5cc7 Support gathering expert distribution details (#6665) 2025-05-27 15:32:59 -07:00
Lifu Huang
79a39ac0cc follow-up: move Idefics2 to a shared location to eliminate unexpected dependency. (#6603) 2025-05-26 19:23:59 -07:00
fzyzcjy
5c7aa00976 Fix EPLB algorithm fail to run when using 3 nodes for prefill (#6629) 2025-05-26 08:43:24 -07:00
Yi Zhang
14d1075f2c fix qwen3moe eplb prefill bug (#6617) 2025-05-26 02:15:21 -07:00
Lifu Huang
0d503090aa Supported precomputed feature for Kimi VL (#6599) 2025-05-26 01:24:13 -07:00
fzyzcjy
93e53f6e0b Logging and minor fixes to two batch overlap and EPLB (#6595) 2025-05-25 22:36:40 -07:00
fzyzcjy
8c7279c24e Fix profiling will crash the server when using num_steps (#6586) 2025-05-25 22:36:02 -07:00
fzyzcjy
0ca1811715 Support fake perfectly balanced EP dispatch algorithm (#6571) 2025-05-25 22:35:51 -07:00
Lifu Huang
022012aae8 Support Phi-4 Multi-Modal (text + vision only) (#6494) 2025-05-24 21:43:38 -07:00
Xinyuan Tong
681fdc264b Refactor vlm embedding routine to use precomputed feature (#6543)
Signed-off-by: Xinyuan Tong <justinning0323@outlook.com>
2025-05-24 18:39:21 -07:00
fzyzcjy
0d47788025 Support overlapping two batches (#4068) 2025-05-24 17:39:07 -07:00
Byron Hsu
2d831c6ef9 [PD] Support structured output (#6560) 2025-05-23 21:49:00 -07:00
Yi Zhang
e6f113569e support eplb for qwen3 (#6533) 2025-05-23 18:31:30 -07:00
Byron Hsu
8233cc10fd [PD] Support logprob & Add failure test (#6558) 2025-05-23 14:29:20 -07:00
Byron Hsu
d2e0881a34 [PD] support spec decode (#6507)
Co-authored-by: SangBin Cho <rkooo567@gmail.com>
2025-05-23 12:03:05 -07:00
Chang Su
4685fbb888 [VLM] Support chunk prefill for VLM (#6355)
Co-authored-by: yizhang2077 <1109276519@qq.com>
2025-05-22 20:32:41 -07:00
Byron Hsu
0a4fc73b48 [PD] Fix failure abort (#6535) 2025-05-22 20:32:03 -07:00
fzyzcjy
7a80f56513 Support dynamically rebalancing experts using EPLB (#6469) 2025-05-21 23:13:21 -07:00
fzyzcjy
9484eba4ad Support logging expert balancedness metrics (#6482) 2025-05-21 23:05:33 -07:00
fzyzcjy
fc992a09f9 Support updating expert locations dynamically (#6388) 2025-05-21 21:59:33 -07:00
Byron Hsu
3bde101099 [PD] Abort request if transfer fails (#6504) 2025-05-21 21:44:25 -07:00
Byron Hsu
7513558074 [PD] Add doc and simplify sender.send (#6019) 2025-05-21 21:22:21 -07:00
fzyzcjy
ccfe5c009d Support redundant experts in expert parallel (#6461) 2025-05-21 02:05:53 -07:00
Zilin Zhu
7c347259ff [RL] allow weight updation with dp attention enabled (#6311) 2025-05-21 01:58:55 -07:00
fzyzcjy
e98afbe042 Support dispatching logical to physical experts (#6385) 2025-05-19 22:13:55 -07:00
fzyzcjy
cba1cdbc46 Support DeepSeek EPLB algorithm with static distributions (#6387) 2025-05-19 21:06:21 -07:00
fzyzcjy
c471d39eb9 Support loading weights when physical experts are different from logical experts (#6386) 2025-05-19 21:05:53 -07:00
fzyzcjy
f0653886a5 Expert distribution recording without overhead for EPLB (#4957) 2025-05-19 20:07:43 -07:00
Yi Zhang
b06215daed [BUG] fix stop_profile crash (#6431) 2025-05-19 17:30:33 -07:00
Trevor Morris
7adf245ba2 [Metrics] Add KV events publishing (#6098) 2025-05-19 14:19:54 -07:00
Mick
626ccb7d3f vlm: tensor hash kernel (#5974) 2025-05-18 15:38:16 -07:00
Mick
01dd39bac1 refactor: minor refactors regarding multimodal processing (#6187) 2025-05-17 22:53:20 -07:00
fzyzcjy
4086566516 Fix expert distribution recorder and profiler command stuck forever (#6284) 2025-05-17 17:10:44 -07:00
fzyzcjy
fd08c04821 Support custom DeepEP tuning config (#6257) 2025-05-17 17:09:42 -07:00
fzyzcjy
01d2838c0f Fix stop_profile does not wait for finishing (#4741) 2025-05-17 17:06:15 -07:00