b8zhong
|
d0a64c7e2c
|
vlm: enforce pybase64 for image and str encode/decode (#10700)
|
2025-10-21 19:05:32 +08:00 |
|
Zhengke Zhou
|
260fe755b6
|
Simplify multi-tokenizer (#11295)
Signed-off-by: zhengkezhou1 <madzhou1@gmail.com>
Co-authored-by: Liangsheng Yin <lsyincs@gmail.com>
|
2025-10-21 16:33:29 +08:00 |
|
ybyang
|
dbb16bedd5
|
Support Thinking Budget (via custom_logit_processor for OpenAI API) [Fix #6572] (#11416)
Signed-off-by: ybyang <ybyang7@iflytek.com>
Co-authored-by: YorkSu <york_su@qq.com>
|
2025-10-21 16:27:56 +08:00 |
|
Neelabh Sinha
|
852c0578fd
|
[FEATURE] Add OpenAI-Compatible LoRA Adapter Selection (#11570)
|
2025-10-21 15:44:33 +08:00 |
|
Atream
|
7e6191c098
|
init support for KTransformers Heterogeneous Computing (#11487)
Co-authored-by: Jianwei Dong <1913953267@qq.com>
|
2025-10-21 00:17:02 -07:00 |
|
Gaurav Verma
|
6f9b66bdda
|
[AMD] Update wave-lang to 3.8.0 (#11878)
Signed-off-by: xintin <gaurav.verma@amd.com>
|
2025-10-20 23:11:09 -07:00 |
|
Qiaolin Yu
|
d9a20fd28a
|
Use trtllm_mla decode kernel for draft extend in speculative decoding (#11664)
|
2025-10-21 11:42:09 +08:00 |
|
Meng, Hengyu
|
b113c72e7a
|
Init attention backend for Intel XPU (#10656)
Co-authored-by: guangyey <guangye.yu@intel.com>
Co-authored-by: DiweiSun <105627594+DiweiSun@users.noreply.github.com>
|
2025-10-21 11:41:28 +08:00 |
|
zhangdonghao-zdh
|
fb6cc7b000
|
Fix RotaryEmbedding for fp32 input (#11843)
|
2025-10-21 10:56:48 +08:00 |
|
Xiaoyu Zhang
|
8374a96e49
|
piecewise cuda graph support qwen3-moe (#11845)
|
2025-10-21 10:55:49 +08:00 |
|
Yuan Luo
|
74de76c685
|
Revise MRotaryEmbedding's forward (#11859)
Co-authored-by: luoyuan.luo <luoyuan.luo@antgroup.com>
Co-authored-by: 羽癫 <yudian.zy@antgroup.com>
Co-authored-by: b8zhong <b8zhong@uwaterloo.ca>
|
2025-10-21 10:38:29 +08:00 |
|
Chang Su
|
9c0b1eb5ad
|
[router][grpc] Fix warm-up random token ids for small models (#11887)
|
2025-10-20 19:22:17 -07:00 |
|
Lianmin Zheng
|
01f14a7ad2
|
[code move] move pp into a separate mixin (#11838)
|
2025-10-20 18:46:56 -07:00 |
|
Lianmin Zheng
|
43ad05907c
|
[Auto Sync] Update scheduler.py, server_args.py (20251020) (#11875)
Co-authored-by: github-actions[bot] <github-actions[bot]@users.noreply.github.com>
Co-authored-by: Kan Wu <wukanustc@gmail.com>
|
2025-10-20 17:41:19 -07:00 |
|
fzyzcjy
|
0917c5da8c
|
Support mixing cutedsl and deepgemm backend (#11807)
|
2025-10-21 07:38:35 +08:00 |
|
penguin_wwy
|
184a4df697
|
Replace function call with set literal (#11867)
|
2025-10-21 01:39:16 +08:00 |
|
Qiaolin Yu
|
f7b1d8c5ab
|
Fix acc len and gen throughput metrics when enabling overlap-spec (#11823)
Co-authored-by: Liangsheng Yin <lsyincs@gmail.com>
|
2025-10-21 01:34:38 +08:00 |
|
Cheng Wan
|
bfc3b3f786
|
[9/N] MoE Refactor: cleanup dispatcher interfaces (#11847)
|
2025-10-20 10:11:46 -07:00 |
|
Liangsheng Yin
|
da5bde4d16
|
Tiny fix main lint (#11862)
|
2025-10-20 19:57:24 +08:00 |
|
DarkSharpness
|
276e7b3e4e
|
[Feature] New structural tag support (#10691)
|
2025-10-20 18:25:58 +08:00 |
|
ishandhanani
|
296f689242
|
fix(server_args): handle tokenizer init conflicts (#11776)
|
2025-10-20 00:27:19 -07:00 |
|
Shane A
|
d383e6616e
|
[Model] Add Olmo 3 model support (#11396)
|
2025-10-19 23:59:16 -07:00 |
|
Shangming Cai
|
a2ba0bc3df
|
Tiny clean up for PD module and doc (#11747)
Signed-off-by: Shangming Cai <csmthu@gmail.com>
|
2025-10-20 11:52:42 +08:00 |
|
Ziming Huang
|
6d2d0ce285
|
[PD] Improve eagle acceptance rate by transferring draft model hidden states (#10801)
Co-authored-by: Shangming Cai <csmthu@gmail.com>
|
2025-10-20 11:52:18 +08:00 |
|
Yuan Luo
|
271d3d0d50
|
Support mrope triton kernel and add unit test (#11722)
Co-authored-by: luoyuan.luo <luoyuan.luo@antgroup.com>
Co-authored-by: b8zhong <b8zhong@uwaterloo.ca>
|
2025-10-20 11:51:07 +08:00 |
|
ykcombat
|
c4e81e64fb
|
[Feature] Use current greenctx stream to communicate in PD-Multiplexing. (#11594)
|
2025-10-20 10:58:20 +08:00 |
|
harrisonlimh
|
c726d44cc7
|
Recapture cuda graph after model weight update to resolve IMA error (#11780)
|
2025-10-20 10:50:03 +08:00 |
|
huangtingwei
|
cae3956585
|
check master server for mooncake store (#10510)
|
2025-10-20 09:37:09 +08:00 |
|
Liu-congo
|
be0058bc05
|
[BugFix] replace the input_to_float8 used in dsv2 (#11612)
Signed-off-by: Liu-congo <1502632128@qq.com>
|
2025-10-19 19:34:13 -05:00 |
|
fzyzcjy
|
a8ba32798e
|
Fix triton_kernels import error on some hardwares (#11831)
|
2025-10-20 08:14:47 +08:00 |
|
Johnny
|
252dc4e112
|
[NVIDIA] FA3/FA4 Fix (#11606)
Co-authored-by: Baizhou Zhang <sobereddiezhang@gmail.com>
|
2025-10-19 17:10:10 -07:00 |
|
Baizhou Zhang
|
cbb5fc2edc
|
[CI] Add CI test for DeepSeek V3.2 MTP (#11835)
|
2025-10-19 17:00:25 -07:00 |
|
Stefan He
|
4fff1ec1d9
|
Deterministic Mode: Add 1-stage triton kernel for prefill (#11147)
Co-authored-by: Minglei Zhu <mingleizhu1122@gmail.com>
Co-authored-by: Binyao Jiang <bijiang@linkedin.com>
|
2025-10-20 01:47:36 +08:00 |
|
Liangsheng Yin
|
7a020e0f3b
|
[Test] Add basic matched stop for beta eagle (#11833)
|
2025-10-20 01:17:00 +08:00 |
|
Liangsheng Yin
|
48738af7f9
|
[CI] always print back trace in retry() (#11834)
|
2025-10-20 01:12:49 +08:00 |
|
Paiiii
|
efa473348b
|
[Spec Decoding] Support MTP for dsv3.2 (#11652)
Co-authored-by: Paiiiiiiiiiiiiii <zengpai@baidu.com>
|
2025-10-19 23:44:22 +08:00 |
|
Liangsheng Yin
|
d658f0497e
|
[overlap-spec] fix stop condition and trimming (#11819)
|
2025-10-19 22:00:20 +08:00 |
|
Liangsheng Yin
|
57e25de756
|
Revert "Fix: Dynamic RoPE Cache Expansion to Prevent Position-ID Out-of-Bounds in EAGLE + Long-Sequence Workloads" (#11827)
|
2025-10-19 19:44:06 +08:00 |
|
fzyzcjy
|
12eb02e982
|
Change bf16 to fp8 for some gemms in attention for DeepSeek ckpt v2 (#11805)
|
2025-10-19 16:15:13 +08:00 |
|
fzyzcjy
|
002d037359
|
Avoid generation gets hanging when user specifies multiple event loops (#5162)
Co-authored-by: Yineng Zhang <me@zhyncs.com>
|
2025-10-19 16:12:49 +08:00 |
|
fzyzcjy
|
ce399e154c
|
Make single-batch overlap compatible with NextN (#11804)
|
2025-10-19 16:10:44 +08:00 |
|
fzyzcjy
|
ea6275dfbc
|
Tiny add hints when users send requests to wrong place (#11808)
|
2025-10-19 16:10:20 +08:00 |
|
narutolhy
|
eb7318f1c2
|
support tokenized batch request (#11091)
|
2025-10-19 07:05:02 +00:00 |
|
YAMY
|
80407b0493
|
Fix: Dynamic RoPE Cache Expansion to Prevent Position-ID Out-of-Bounds in EAGLE + Long-Sequence Workloads (#10788)
|
2025-10-19 11:37:43 +08:00 |
|
Liangsheng Yin
|
b288f4f440
|
Improve send_one script (#11817)
|
2025-10-19 11:28:16 +08:00 |
|
tazjin
|
6d6ea5af0c
|
fix: do not wrap invalid grammar objects during constrained generation (#11328)
|
2025-10-19 10:54:33 +08:00 |
|
Marin
|
1dacedd2db
|
make sure logit bias is applied during eagle spec decoding verification (#11555)
|
2025-10-19 10:53:33 +08:00 |
|
ybyang
|
b5e14b2b78
|
[1/2][feature] support openai like classification api (#11618)
|
2025-10-18 19:32:48 -07:00 |
|
Qiaolin Yu
|
ebda73dc72
|
Use cutlass fp4 gemm by default (#11813)
|
2025-10-18 14:10:15 -07:00 |
|
b8zhong
|
f9a7d9b3dc
|
support server arg override KV cache to bf16 to avoid slow cases (#11749)
|
2025-10-19 02:49:48 +08:00 |
|