| Chanh Nguyen | 3f1e433903 | Decoder-only Scoring API (#6460); Co-authored-by: Chanh Nguyen <cnguyen@linkedin.com> | 2025-06-04 14:14:54 -07:00 |
| Xinyuan Tong | cf9815ba69 | [Refactor] Multimodal data processing for VLM (#6659); Signed-off-by: Xinyuan Tong <justinning0323@outlook.com> | 2025-06-04 11:22:33 -07:00 |
| Xiaoyu Zhang | bd75690f4e | fix ep_moe_reorder kernel bugs (#6858); Co-authored-by: JieXin Liang <Alcanderian@users.noreply.github.com> | 2025-06-04 19:13:59 +08:00 |
| JieXin Liang | 180ff5eecc | [fix] recover auto-dispatch for rmsnorm and rope (#6745) | 2025-06-03 21:44:20 -07:00 |
| Marc Sun | 37f1547587 | [FEAT] Add transformers backend support (#5929) | 2025-06-03 21:05:29 -07:00 |
| Cheng Wan | 8a5480528d | [Refactor] Rename n_share_experts_fusion as num_fused_shared_experts (#6735) | 2025-06-03 17:48:24 -07:00 |
| fzyzcjy | b6d0ce9f78 | Minor add metrics to expert location updater (#6816) | 2025-06-02 23:59:11 -07:00 |
| fzyzcjy | 0ea330ca34 | Fix wrong weight reference in dynamic EPLB (#6818) | 2025-06-02 23:26:04 -07:00 |
| pansicheng | 27e327b415 | fix new_page_count_next_decode (#6671) | 2025-06-02 22:48:52 -07:00 |
| jianan-gu | ff00895c46 | Add CPU optimized kernels for topk and rope fusions (#6456) | 2025-06-02 17:37:34 -07:00 |
| Arthur Cheng | ff91474825 | [Router] Fix k8s Service Discovery (#6766); Co-authored-by: Simo Lin <linsimo.mark@gmail.com> | 2025-06-02 16:57:23 -07:00 |
| Pavani Majety | eb38c7d1ca | [1/2] Add Kernel support for Cutlass based Fused FP4 MoE (#6093); Signed-off-by: Pavani Majety <pmajety@nvidia.com> | 2025-06-02 13:48:03 -07:00 |
| fzyzcjy | df7f61ee7d | Speed up rebalancing when using non-static dispatch algorithms (#6812) | 2025-06-02 11:18:17 -07:00 |
| fzyzcjy | ef21729c1d | Fix profiles do not have consistent names (#6811) | 2025-06-02 11:17:22 -07:00 |
| fzyzcjy | f5159315b2 | Add simple utility to dump tensors for debugging (#6815) | 2025-06-02 11:15:31 -07:00 |
| fzyzcjy | 6d7b6696d4 | Tiny fix EPLB assertion about rebalancing period and recorder window size (#6813) | 2025-06-02 11:13:33 -07:00 |
| fzyzcjy | 6376b632eb | Tiny log prefill time (#6780) | 2025-06-02 10:28:27 -07:00 |
| fzyzcjy | e05e29d178 | Refactor CustomOp to avoid confusing bugs (#5382) | 2025-06-02 10:27:36 -07:00 |
| Ke Bao | a2cb5913a0 | Add draft extend CUDA graph for flashinfer backend (#6805) | 2025-06-02 01:51:26 -07:00 |
| Yuan Luo | 55444ed667 | [EP] Add cuda kernel for moe_ep_pre_reorder (#6699); Co-authored-by: luoyuan.luo <luoyuan.luo@antgroup.com> | 2025-06-01 20:49:01 -07:00 |
| Lianmin Zheng | 20fd53b8f6 | Correctly abort the failed grammar requests & Improve the handling of abort (#6803) | 2025-06-01 19:00:07 -07:00 |
| Baizhou Zhang | 6a47b73024 | Remove contiguous before Flashinfer groupwise fp8 gemm (#6804) | 2025-06-01 18:30:54 -07:00 |
| Wenxuan Tan | c429919def | misc: cache is_hopper_arch (#6799) | 2025-06-01 15:28:31 -07:00 |
| Yineng Zhang | 1da8d23051 | chore: update blackwell docker (#6800) | 2025-06-01 13:37:40 -07:00 |
| Huapeng Zhou | 2f7420bc84 | [Feat] Enable PDL automatically on Hopper architecture (#5981) | 2025-06-01 12:30:17 -07:00 |
| Ravi Theja | c6a0cacc35 | Update CI tests for Llama4 models (#6421) | 2025-06-01 11:52:15 +08:00 |
| Lifu Huang | 0a9bfc20ab | [Minor] Always append newline after image token when parsing chat message (#6797) | 2025-05-31 20:50:33 -07:00 |
| Yineng Zhang | 34c63731fc | chore: upgrade sgl-kernel v0.1.5 (#6795) | 2025-05-31 18:32:00 -07:00 |
| Lianmin Zheng | 2d72fc47cf | Improve profiler and integrate profiler in bench_one_batch_server (#6787) | 2025-05-31 15:53:55 -07:00 |
| Yineng Zhang | b520d02888 | chore: bump sgl-kernel v0.1.5 (#6794) | 2025-05-31 14:54:00 -07:00 |
| Qiaolin Yu | 7dc0e39442 | Bump torch to 2.7.0 (#6788) | 2025-05-31 14:43:12 -07:00 |
| Yikai Zhang | fb507b7b10 | [FIX] mmmu bench serving result display error (#6525) (#6791) | 2025-05-31 13:48:06 -07:00 |
| storyicon | f90945c45a | fix(PD-disaggregation): Can not get local ip (#6792); Signed-off-by: storyicon <storyicon@foxmail.com> | 2025-05-31 13:47:14 -07:00 |
| Lifu Huang | 094fbdacd5 | Fix incorrect LoRA weight loading for fused gate_up_proj (#6734) | 2025-05-31 13:41:44 -07:00 |
| YanbingJiang | 888cb175a6 | Add intel_amx backend for Radix Attention for CPU (#6408); Co-authored-by: Chunyuan WU <chunyuan.wu@intel.com>; Co-authored-by: Thien Tran <gau.nernst@yahoo.com.sg> | 2025-05-30 21:37:42 -07:00 |
| Chang Su | e39bca0756 | ci: relax test_function_call_required (#6786) | 2025-05-30 19:18:42 -07:00 |
| Jianan Ji | a2bb856543 | Temporarily lower mmlu threshold for triton sliding window backend (#6785) | 2025-05-30 18:40:50 -07:00 |
| Cheng Wan | ced3c07afe | Support token-level quantization for EP MoE (#6782) | 2025-05-30 17:26:30 -07:00 |
| Chang Su | f18b068f15 | feat(tool call): Enhance Llama32Detector for improved JSON parsing in non-stream (#6784) | 2025-05-30 17:05:17 -07:00 |
| Chao Yang | 4fac524b14 | update llama4 chat template and pythonic parser (#6679); Co-authored-by: Chang Su <chang.s.su@oracle.com> | 2025-05-30 17:01:22 -07:00 |
| Cheng Wan | b581b22504 | Fix one bug in the grouped-gemm triton kernel (#6772) | 2025-05-30 01:42:08 -07:00 |
| Li Hui | 69dd878b51 | Fix shared experts fusion error (#6289) | 2025-05-30 01:16:11 -07:00 |
| Jianan Ji | 22630ca242 | Support sliding window in triton backend (#6509) | 2025-05-30 01:11:53 -07:00 |
| Yuhong Guo | d279d4990c | Fix aiohttp 'Chunk too big' in bench_serving (#6737) | 2025-05-30 00:50:36 -07:00 |
| shangmingc | 6cb00c6398 | [PD] Optimize time out logic and add env var doc for mooncake (#6761); Signed-off-by: Shangming Cai <caishangming@linux.alibaba.com> | 2025-05-30 00:45:02 -07:00 |
| Xu Wenqing | 62cac2c43a | Update DeepSeek-R1-0528 function call chat template (#6765); Signed-off-by: 许文卿 <xwq391974@alibaba-inc.com> | 2025-05-30 00:42:57 -07:00 |
| fzyzcjy | 2c3b71d678 | Improve EPLB logical to physical dispatch map (#6727) | 2025-05-29 19:23:54 -07:00 |
| Zilin Zhu | 51cdd81f97 | [fix][RL] Fix DeepSeekV3ForCausalLM.post_load_weights for multiple update weight (#6265) | 2025-05-29 16:28:10 -07:00 |
| Baizhou Zhang | 73def253b5 | Fix mem_fraction_static for AMD CI (#6748) | 2025-05-29 12:37:30 -07:00 |
| JieXin Liang | d9d35def3d | [test] add ut and bm for get_last_loc (#6746) | 2025-05-29 11:47:21 -07:00 |
|