6127 Commits

Author SHA1 Message Date
Fan Yin
23afdfd1c2 [sgl-kernel] support flashmla libtorch (#11717) 2025-10-21 21:17:50 -07:00
Liangsheng Yin
9d61205dac [lint] improve ruff check (#11922)
Co-authored-by: Xiaoyu Zhang <35585791+BBuf@users.noreply.github.com>
2025-10-22 11:32:50 +08:00
Chang Su
590bc4b7a7 [router][grpc] Fix background tasks stored with wrong id (#11945) 2025-10-21 18:38:51 -07:00
Keyang Ru
63cfe1b032 [router] Add gRPC E2E test suite (#11790) 2025-10-21 17:51:21 -07:00
Chang Su
70f6309cd4 [router][grpc] Support v1/responses API (#11926) 2025-10-21 17:41:48 -07:00
Yineng Zhang
704160017d fix: resolve flashinfer 0.4.1 import (#11940) 2025-10-21 17:19:57 -07:00
Keyang Ru
87a92e459a Fix openai input_text type compatibility (#11935) 2025-10-21 16:10:35 -07:00
Yineng Zhang
c461e7714d [Auto Sync] Update forward_batch_info.py (20251021) (#11934)
Co-authored-by: github-actions[bot] <github-actions[bot]@users.noreply.github.com>
Co-authored-by: yinghui <32845984+cicirori@users.noreply.github.com>
2025-10-21 15:52:15 -07:00
Zheng Wengang
fde2decf8b [BugFix][Qwen3-VL]: add metadata for video in qwen3-vl (#11377) 2025-10-21 15:36:01 -07:00
Yineng Zhang
9792b9d7e3 chore: upgrade flashinfer 0.4.1 (#11933) 2025-10-21 14:46:31 -07:00
Baizhou Zhang
ef4a8097b8 Rename flashmla kernel options of nsa backend for better readability (#11876) 2025-10-21 13:14:16 -07:00
Baizhou Zhang
ebff4ee648 Update sgl-kernel and remove fast hadamard dependency (#11844) 2025-10-21 13:13:54 -07:00
Serge Panev
2b1da821b5 [NVIDIA] Add new SMs support for Spark & Thor (#11287)
Signed-off-by: Serge Panev <spanev@nvidia.com>
2025-10-22 02:02:24 +08:00
Liangsheng Yin
97710ccd1a Fix flush cache API for spec v2 (#11918) 2025-10-21 23:01:16 +08:00
Shangming Cai
f3cd5d2510 [CI] Fix b200 flashinfer installation (#11915)
Signed-off-by: Shangming Cai <csmthu@gmail.com>
2025-10-21 22:28:50 +08:00
Kai-Hsun Chen
c61b0b294c [quantization][MoE] fix the check for tp_size / moe_ep_size / moe_intermediate_size / weight_block_size_n (#11702)
Signed-off-by: Kai-Hsun Chen <khchen@x.ai>
2025-10-21 21:25:28 +08:00
Vincent Zhong
e8640ee9be [smol] [perf] Inverse perm improvement (#11482)
Signed-off-by: vincentzed <207368749+vincentzed@users.noreply.github.com>
2025-10-21 19:18:10 +08:00
b8zhong
d0a64c7e2c vlm: enforce pybase64 for image and str encode/decode (#10700) 2025-10-21 19:05:32 +08:00
Shangming Cai
05d3667ab9 [CI] disable glm4.1v and fix the flashinfer installation (#11902)
Signed-off-by: Shangming Cai <csmthu@gmail.com>
2025-10-21 18:38:35 +08:00
Zhengke Zhou
260fe755b6 Simplify multi-tokenizer (#11295)
Signed-off-by: zhengkezhou1 <madzhou1@gmail.com>
Co-authored-by: Liangsheng Yin <lsyincs@gmail.com>
2025-10-21 16:33:29 +08:00
ybyang
dbb16bedd5 Support Thinking Budget (via custom_logit_processor for OpenAI API) [Fix #6572] (#11416)
Signed-off-by: ybyang <ybyang7@iflytek.com>
Co-authored-by: YorkSu <york_su@qq.com>
2025-10-21 16:27:56 +08:00
Hank Han
c1e1600373 [fix] fix ci uv install dependency (#11895) 2025-10-21 16:23:34 +08:00
Neelabh Sinha
852c0578fd [FEATURE] Add OpenAI-Compatible LoRA Adapter Selection (#11570) 2025-10-21 15:44:33 +08:00
Atream
7e6191c098 init support for KTransformers Heterogeneous Computing (#11487)
Co-authored-by: Jianwei Dong <1913953267@qq.com>
2025-10-21 00:17:02 -07:00
Gaurav Verma
6f9b66bdda [AMD] Update wave-lang to 3.8.0 (#11878)
Signed-off-by: xintin <gaurav.verma@amd.com>
2025-10-20 23:11:09 -07:00
Simo Lin
8a801ee38d [router] release router 0.2.1 (#11885) 2025-10-20 21:08:45 -07:00
Qiaolin Yu
d9a20fd28a Use trtllm_mla decode kernel for draft extend in speculative decoding (#11664) 2025-10-21 11:42:09 +08:00
Meng, Hengyu
b113c72e7a Init attention backend for Intel XPU (#10656)
Co-authored-by: guangyey <guangye.yu@intel.com>
Co-authored-by: DiweiSun <105627594+DiweiSun@users.noreply.github.com>
2025-10-21 11:41:28 +08:00
zhangdonghao-zdh
fb6cc7b000 Fix RotaryEmbedding for fp32 input (#11843) 2025-10-21 10:56:48 +08:00
Xiaoyu Zhang
8374a96e49 piecewise cuda graph support qwen3-moe (#11845) 2025-10-21 10:55:49 +08:00
Yuan Luo
74de76c685 Revise MRotaryEmbedding's forward (#11859)
Co-authored-by: luoyuan.luo <luoyuan.luo@antgroup.com>
Co-authored-by: 羽癫 <yudian.zy@antgroup.com>
Co-authored-by: b8zhong <b8zhong@uwaterloo.ca>
2025-10-21 10:38:29 +08:00
Chang Su
9c0b1eb5ad [router][grpc] Fix warm-up random token ids for small models (#11887) 2025-10-20 19:22:17 -07:00
Lianmin Zheng
01f14a7ad2 [code move] move pp into a separate mixin (#11838) 2025-10-20 18:46:56 -07:00
Simo Lin
1111030395 [router] clean up workflow logs to debug for implementation details logs (#11886) 2025-10-20 18:24:55 -07:00
Tien Nguyen
28ddfb37d7 fix(sql-router): fix conflict port in test (#11826)
Co-authored-by: Simo Lin <linsimo.mark@gmail.com>
2025-10-20 18:06:34 -07:00
Chang Su
e69094df64 [router][grpc] Remove continue_final_message in ChatTemplateParams and add minijinja-contrib (#11882) 2025-10-20 18:03:09 -07:00
Lianmin Zheng
43ad05907c [Auto Sync] Update scheduler.py, server_args.py (20251020) (#11875)
Co-authored-by: github-actions[bot] <github-actions[bot]@users.noreply.github.com>
Co-authored-by: Kan Wu <wukanustc@gmail.com>
2025-10-20 17:41:19 -07:00
Simo Lin
b4948512b8 [router] remove encoding header for oai router (#11881) 2025-10-20 17:39:00 -07:00
Simo Lin
ddcba74b4d [router] Worker Management Workflow Engine (#11868) 2025-10-20 17:00:22 -07:00
fzyzcjy
0917c5da8c Support mixing cutedsl and deepgemm backend (#11807) 2025-10-21 07:38:35 +08:00
penguin_wwy
184a4df697 Replace function call with set literal (#11867) 2025-10-21 01:39:16 +08:00
Qiaolin Yu
f7b1d8c5ab Fix acc len and gen throughput metrics when enabling overlap-spec (#11823)
Co-authored-by: Liangsheng Yin <lsyincs@gmail.com>
2025-10-21 01:34:38 +08:00
Cheng Wan
bfc3b3f786 [9/N] MoE Refactor: cleanup dispatcher interfaces (#11847) 2025-10-20 10:11:46 -07:00
Liangsheng Yin
da5bde4d16 Tiny fix main lint (#11862) 2025-10-20 19:57:24 +08:00
DarkSharpness
276e7b3e4e [Feature] New structural tag support (#10691) 2025-10-20 18:25:58 +08:00
ishandhanani
296f689242 fix(server_args): handle tokenizer init conflicts (#11776) 2025-10-20 00:27:19 -07:00
Sai Enduri
9edb7b5123 [AMD CI] Populate image cache in nightly docker release. (#11822) 2025-10-20 00:04:04 -07:00
Sai Enduri
e53bf44243 Update amd gpu install docs. (#11849) 2025-10-20 00:03:26 -07:00
Shane A
d383e6616e [Model] Add Olmo 3 model support (#11396) 2025-10-19 23:59:16 -07:00
Xiaoyu Zhang
984fbeb16b Revert "[CI Monitor] Ci monitor only deal with main branch in default" (#11846) 2025-10-19 22:06:40 -07:00