Commit Graph

3980 Commits

Author SHA1 Message Date
Lifu Huang
780fbf2f38 [Fix] Fix accuracy bug in CSGMV kernel caching key. (#11579) 2025-10-14 20:25:56 -07:00
Jinwu
825432fce6 [1/N]Support DeepSeek-R1 w4a8 normal deepep (#8247)
Co-authored-by: Hank Han <hanhan7630@outlook.com>
2025-10-14 20:10:53 -07:00
Xun Sun
a40229f6f8 [1/N] Introduce Mooncake Backend and Mooncake EP to Support Elastic EP (#10423)
Co-authored-by: Hank Han <hanhan7630@outlook.com>
Co-authored-by: Shangming Cai <csmthu@gmail.com>
2025-10-14 19:40:54 -07:00
Sahithi Chigurupati
e9e120ac7a fix: upgrade transformers to 4.57.1 (#11628)
Signed-off-by: Sahithi Chigurupati <chigurupati.sahithi@gmail.com>
Co-authored-by: zhyncs <me@zhyncs.com>
2025-10-14 18:35:05 -07:00
cctry
1d7f783501 Refactor kv cache free (#11351) 2025-10-14 17:45:19 -07:00
Simo Lin
325951460f [router][grpc] add warm up to grpc server (#11627)
Co-authored-by: Chang Su <chang.s.su@oracle.com>
2025-10-14 16:11:16 -07:00
DarkSharpness
e28c9e526f [Minor] Update xgrammar dependency (#11622) 2025-10-14 13:46:50 -07:00
Lianmin Zheng
b98cf39866 [Auto Sync] Update collector.py (20251014) (#11625)
Co-authored-by: github-actions[bot] <github-actions[bot]@users.noreply.github.com>
Co-authored-by: Byron Hsu <byronhsu1230@gmail.com>
2025-10-14 13:34:33 -07:00
Lianmin Zheng
27d710457c [Auto Sync] Update scheduler.py, server_args.py (20251014) (#11623)
Co-authored-by: github-actions[bot] <github-actions[bot]@users.noreply.github.com>
Co-authored-by: Stefan He <hebiaobuaa@gmail.com>
2025-10-14 13:20:03 -07:00
Baizhou Zhang
c224a4c6cc Fix log for chunked prefix cache (#11624) 2025-10-14 11:49:33 -07:00
strgrb
94d26d850d use non_blocking h2d in ForwardBatch.prepare_mlp_sync_batch. (#11605) 2025-10-14 11:30:59 -07:00
Liangsheng Yin
5ea96ac7cc Reduce one step decode for draft model. (#11561) 2025-10-14 23:52:04 +08:00
yinghui
56222658ec move eagle draft post process to cuda graph (#11434)
Co-authored-by: Lianmin Zheng <lianminzheng@gmail.com>
2025-10-14 22:50:53 +08:00
Alex Chi Z
dc965db0e0 make radix cache deterministic (#10721)
Signed-off-by: Alex Chi Z <iskyzh@gmail.com>
2025-10-14 21:01:52 +08:00
Scott Lee
817e46f412 Refactor spec decoding metrics calculation into separate TokenizerManager utility function (#11586) 2025-10-14 20:45:49 +08:00
Liangsheng Yin
5a33c3aae7 Optimize Triton Draft Backend (#11556) 2025-10-14 20:08:32 +08:00
Qiaolin Yu
e4358a4585 Add fused_moe_triton config: triton_3_4_0/E=256,N=256,device_name=NVIDIA_B200.json (#11587) 2025-10-14 13:24:43 +08:00
Lianmin Zheng
ba2ce28fe9 [Auto Sync] Update model_config.py (20251014) (#11580)
Co-authored-by: github-actions[bot] <github-actions[bot]@users.noreply.github.com>
Co-authored-by: Hanming Lu <69857889+hanming-lu@users.noreply.github.com>
2025-10-13 22:16:34 -07:00
Chenxi Li
28f80b1244 Implement LRU eviction policy for LoRA adapters (#11041) 2025-10-13 20:18:25 -07:00
Xiaoyu Zhang
88a6f9dab5 bench_serving support PD Disaggregation (#11542) 2025-10-13 19:43:26 -07:00
fzyzcjy
cb8ed2c09a Make DeepEP combine recv do not overlap (#11535) 2025-10-13 18:40:42 -07:00
Trevor Morris
384733639a [DSv32] Use torch.compile for _get_logits_head_gate (#11565) 2025-10-13 18:38:39 -07:00
Neelabh Sinha
aaf7af1b17 [FEATURE] Add Profile Trace Merger for Distributed Traces (#11413) 2025-10-14 09:20:17 +08:00
Yuwei An
932e263725 Compilation Folder Reset (#11539)
Signed-off-by: Oasis-Git <ayw.sirius19@gmail.com>
2025-10-14 09:19:12 +08:00
Qiaolin Yu
43f80884c5 Fix accept rate in speculative decoding metrics (#11572) 2025-10-13 16:35:50 -07:00
Chang Su
887c2b4575 [router][grpc] Add serve_grpc to launch_server and log id for HealthCheck (#11564) 2025-10-13 16:07:19 -07:00
fzyzcjy
065ce81574 Tiny cleanup fp4 gemm calls (#11537) 2025-10-13 14:48:22 -07:00
Johnny
cb8f3d90d3 [NVIDIA] update pyproject.toml to support cu130 option (#11521) 2025-10-13 13:03:31 -07:00
Trevor Morris
c9cff2b984 Fix DeepSeek-v3.2 default config (ValueError: not enough values to unpack (expected 4, got 3)) (#11557) 2025-10-13 11:27:40 -07:00
Scott Lee
b6fb5d7666 Add metrics for speculative decoding (acceptance rate, average acceptance length) (#11441) 2025-10-13 11:24:27 -07:00
Lianmin Zheng
5e3f7e7fa9 Minor: improve sampler & remove unused fields from model_config.py (#11531) 2025-10-13 11:04:44 -07:00
Liangsheng Yin
acc2327bbd Move deep gemm related arguments to sglang.srt.environ (#11547) 2025-10-14 00:34:35 +08:00
Liangsheng Yin
bfadb5ea5f Adjust overlap event loop (#11507) 2025-10-14 00:33:19 +08:00
ai-jz
9cc1e065f1 [router][Fix] Include grpc reflection runtime dependency (#11419)
Co-authored-by: Chang Su <chang.s.su@oracle.com>
2025-10-13 09:32:42 -07:00
Mick
f35f120d70 fix: fix video input for qwen3-vl (#11442) 2025-10-13 09:30:43 -07:00
Liangsheng Yin
54a46a264d Remove tp_worker.worker (#11548) 2025-10-13 22:38:48 +08:00
Mohammad Miadh Angkad
c7867b6702 [Fix] Add per_channel_quant parameter to MoE config functions (#11201) 2025-10-13 21:26:06 +08:00
Liangsheng Yin
516738b096 Deprecate global_server_args_dict (#11528) 2025-10-13 19:34:43 +08:00
Yuan Luo
0b6f535f66 [Reland] perf: optimize qwen-vl with symm mem allreduce (#11457)
Co-authored-by: luoyuan.luo <luoyuan.luo@antgroup.com>
2025-10-13 17:51:25 +08:00
hzh0425
318424e2c8 [HICache]: Support 3FS-Store with page_first_direct layout (#11460) 2025-10-13 15:47:22 +08:00
Mick
0c0779d667 ci: improve nightly-ci (#11385) 2025-10-12 21:19:34 -07:00
Yi Zhang
a55cf5304a [Feature] Support mamba radix cache v0 (#11214)
Co-authored-by: hanming-lu <hanming@x.ai>
Co-authored-by: hzh0425 <hzh0425@apache.org>
Co-authored-by: thalahors <ericalcaide1@gmail.com>
2025-10-12 20:57:15 -07:00
Yuanhang Sun
19ba16aa3d [Fix]: add missing device attribute to ChunkCache (#11493) 2025-10-12 20:49:59 -07:00
Qiaolin Yu
a2b3d9b90b Update DeepSeek-R1-FP4 default config on blackwell (#11512) 2025-10-12 20:32:11 -07:00
Yongtong Wu
a20e7df8d0 Improve dp attention port assignment scheme (#5889)
Co-authored-by: Cheng Wan <cwan@x.ai>
2025-10-12 17:55:59 -07:00
Cheng Wan
1bdd010291 Revert "Deprecate global_server_args_dict" (#11520) 2025-10-12 17:40:40 -07:00
Lianmin Zheng
2ac46e94ef Sync changes on io_struct.py and deterministic ops (#11498) 2025-10-12 16:03:10 -07:00
Binyao Jiang
0aa65f94f1 [Fix] Improve longbench prompt and other logics (#11474) 2025-10-12 15:04:28 -07:00
Liangsheng Yin
1083e7e3df Deprecate global_server_args_dict (#11331) 2025-10-13 01:20:47 +08:00
hzh0425
f5b34a510c Bugfix: Fix Type consistency for KV indices in SWARadixCache (#11452)
Co-authored-by: yizhang2077 <1109276519@qq.com>
2025-10-12 23:19:44 +08:00