Yuanhang Sun | 19ba16aa3d | [Fix]: add missing device attribute to ChunkCache (#11493) | 2025-10-12 20:49:59 -07:00
Qiaolin Yu | a2b3d9b90b | Update DeepSeek-R1-FP4 default config on blackwell (#11512) | 2025-10-12 20:32:11 -07:00
Yongtong Wu | a20e7df8d0 | Improve dp attention port assignment scheme (#5889); Co-authored-by: Cheng Wan <cwan@x.ai> | 2025-10-12 17:55:59 -07:00
Cheng Wan | 1bdd010291 | Revert "Deprecate global_server_args_dict" (#11520) | 2025-10-12 17:40:40 -07:00
Lianmin Zheng | 2ac46e94ef | Sync changes on io_struct.py and deterministic ops (#11498) | 2025-10-12 16:03:10 -07:00
Binyao Jiang | 0aa65f94f1 | [Fix] Improve longbench prompt and other logics (#11474) | 2025-10-12 15:04:28 -07:00
Liangsheng Yin | 1083e7e3df | Deprecate global_server_args_dict (#11331) | 2025-10-13 01:20:47 +08:00
hzh0425 | f5b34a510c | Bugfix: Fix Type consistency for KV indices in SWARadixCache (#11452); Co-authored-by: yizhang2077 <1109276519@qq.com> | 2025-10-12 23:19:44 +08:00
Lianmin Zheng | 548a57b1f3 | Fix port conflicts in CI (#11497) | 2025-10-12 06:46:36 -07:00
Yi Zhang | 4b15fa00f0 | move fla env check position (#11500) | 2025-10-12 06:40:45 -07:00
Liangsheng Yin | f49419061d | Move args from global_config to environ (#11332) | 2025-10-12 21:29:31 +08:00
Liangsheng Yin | 01e59e8247 | Fix CI break by express-laned PRs. (#11499) | 2025-10-12 21:06:06 +08:00
Mike Qiu | 99a0704a36 | bailingMoE: Fix Key error of deepep_mode (#11465); Signed-off-by: Michael Qiu <qiudayu.qdy@antgroup.com>; Co-authored-by: Mike_Qiu <qiudayu.qdy@antgroup.com> | 2025-10-12 20:42:59 +08:00
Antoine Roux | ec1cd90ac9 | Fix the GPT function calling regex to allow dash in the name (#10577) | 2025-10-12 20:34:58 +08:00
Kai-Hsun Chen | 1103dc6204 | [chore][2/N] Avoid using default mutable parameters (#11479); Signed-off-by: Kai-Hsun Chen <khchen@x.ai> | 2025-10-12 20:34:04 +08:00
Vincent Zhong | a220536f40 | [ perf ] Replace json-> orjson in hot path (#11221); Signed-off-by: vincentzed <207368749+vincentzed@users.noreply.github.com> | 2025-10-12 20:30:58 +08:00
Mahmoud Ashraf | 7b064f04f8 | [bugfix]: use correct causality condition for flashattention, flashinfer, and triton backends (#10172) | 2025-10-12 20:28:16 +08:00
Kai-Hsun Chen | 43190becfa | [chore][1/N] Avoid using default mutable parameters (#11478); Signed-off-by: Kai-Hsun Chen <khchen@x.ai> | 2025-10-12 20:26:39 +08:00
Vincent Zhong | be740acdb0 | [smol] [perf] Qwen3-VL in place op. (#11481); Signed-off-by: vincentzed <207368749+vincentzed@users.noreply.github.com> | 2025-10-12 20:25:30 +08:00
Yuwei An | 4ac8e09df0 | Piecewise CUDA Graph Support & Torch Compile Backend (#10062); Signed-off-by: Oasis-Git <ayw.sirius19@gmail.com> | 2025-10-12 11:55:57 +08:00
Liangsheng Yin | 20a6c0a63d | Beta spec-overlap for EAGLE (#11398); Co-authored-by: Lianmin Zheng <15100009+merrymercy@users.noreply.github.com>; Co-authored-by: Hanming Lu <69857889+hanming-lu@users.noreply.github.com> | 2025-10-12 11:02:22 +08:00
Glen Liu | 47c606d3dc | [Feature] support regex strings as a stopping condition (#10635) | 2025-10-12 10:53:15 +08:00
Lorenzo Lu | b5dcfd4154 | Add option to disable any_whitespace for xgrammar and llguidance backends. (#8919); Co-authored-by: Chang Su <chang.s.su@oracle.com> | 2025-10-11 22:24:58 +08:00
ybyang | 5061b8fd3e | fix stop when stream (#11462); Signed-off-by: ybyang <ybyang7@iflytek.com>; Co-authored-by: Liangsheng Yin <lsyincs@gmail.com>; Co-authored-by: Liangsheng Yin <hnyls2002@gmail.com> | 2025-10-11 22:06:31 +08:00
ykcombat | c8452551ce | [Fix] Fix split prefill with fa3. (#11428) | 2025-10-11 22:03:28 +08:00
fzyzcjy | bf3e7149be | Fix enable_v2 in int8 quant (#11470) | 2025-10-11 21:56:30 +08:00
ykcombat | f5754d1256 | [Documentation][Configuration] Server args and documentation of PD-Multiplexing. (#11427) | 2025-10-11 21:36:07 +08:00
Liangsheng Yin | 739daa63e4 | Adjust logits metada init for target verify (#11467) | 2025-10-11 21:17:04 +08:00
fzyzcjy | 21337b22b9 | Reland [1/2] Optimizations and refactors about quant kernel (#10312); Co-authored-by: Yineng Zhang <me@zhyncs.com> | 2025-10-11 15:59:03 +08:00
Zhiyu | 129d299278 | Enable native ModelOpt quantization support (2/3) (#9991); Signed-off-by: Zhiyu Cheng <zhiyuc@nvidia.com> | 2025-10-11 07:48:14 +00:00
Binyao Jiang | 451d15c44b | [DPSKv3.2] Rewrite nsa tilelang act_quant kernel to triton (#11450) | 2025-10-10 23:13:46 -07:00
Liu-congo | c80a96dae9 | [BugFix] test_mla_fp8.py fails on Cublas 12.9 (#11360); Signed-off-by: Liu-congo <1502632128@qq.com> | 2025-10-10 21:14:24 -07:00
Stefan He | eae9a9fb9d | Fix batch invariant ops (#11368) | 2025-10-10 20:49:08 -07:00
wxsm | 2674c1d280 | fix: Change dsv32 hack temporary path to use system temp directory (#11445) | 2025-10-10 19:59:41 -07:00
Lianmin Zheng | 61055cb309 | Reorder PD disagg CI tests (#11438) | 2025-10-10 17:56:49 -07:00
Simo Lin | c495833186 | [router] leverage RAII to actively cancel request during client disconnect (#11399) | 2025-10-10 20:43:38 -04:00
cctry | b36afed4a7 | Separate allocation logic from scheduler (#11313) | 2025-10-10 17:38:54 -07:00
JinYan Su | 9aa4502d11 | feat(mooncake): support GB suffix for global_segment_size (#10745); Signed-off-by: Jinyang Su <751080330@qq.com>; Co-authored-by: huangtingwei <141888744+huangtingwei9988@users.noreply.github.com> | 2025-10-10 17:38:25 -07:00
Scott Lee | 55b14656e6 | Revert "Add metrics for speculative decoding (acceptance rate, average acceptance length)" (#11433) | 2025-10-10 12:54:57 -07:00
Lianmin Zheng | b4408e6098 | Revert "fix: fix video input for qwen3-vl" (#11437) | 2025-10-10 12:44:40 -07:00
Cheng Wan | 52fcbbb8bd | Revert "perf: optimize qwen-vl with symm mem allreduce" (#11436) | 2025-10-10 12:30:05 -07:00
Teng Ma | 9082a7d323 | [HiCache] feat: add multi tenant with prefix tag (#9256) | 2025-10-11 00:23:28 +08:00
Yuan Luo | 3b9d97f335 | perf: optimize qwen-vl with symm mem allreduce (#11381); Co-authored-by: luoyuan.luo <luoyuan.luo@antgroup.com> | 2025-10-10 22:24:45 +08:00
Mick | a1a20b4c7c | fix: fix video input for qwen3-vl (#11361) | 2025-10-10 04:35:35 -07:00
Yineng Zhang | 4299aebdbb | chore: update pyproject (#11420) | 2025-10-10 00:56:30 -07:00
Scott Lee | 0babd48736 | Add metrics for speculative decoding (acceptance rate, average acceptance length) (#11144) | 2025-10-10 00:46:44 -07:00
Zaili Wang | f19613e6c3 | Dedicated toml files for CPU/XPU (#10734) | 2025-10-10 00:44:55 -07:00
ziruiliu | 8df4945559 | fix file and object naming scheme in HiCacheNixl to avoid data corruption (#10969); Signed-off-by: Zirui Liu <ziliu@ddn.com> | 2025-10-10 00:23:10 -07:00
hzh0425 | ee3bd8a1c8 | feat(hicache): Support passing prefix keys for l3 store. (#9045); Co-authored-by: pansicheng <sicheng.pan.chn@gmail.com>; Co-authored-by: Zhiqiang Xie <xiezhq@stanford.edu> | 2025-10-10 00:22:05 -07:00
Yuan Luo | b5044fbf12 | Replace pad with cat for better performance (#11388); Co-authored-by: luoyuan.luo <luoyuan.luo@antgroup.com> | 2025-10-10 12:03:17 +08:00