Commit Graph

439 Commits

Author SHA1 Message Date
cctry
b0b4f71679 [Fix] memory leak by overlap + retract (#11981)
Co-authored-by: Liangsheng Yin <lsyincs@gmail.com>
2025-10-23 22:59:23 +08:00
Liangsheng Yin
32852fe9e9 Move memory runtime checker to mixin class (#12014) 2025-10-23 20:53:26 +08:00
Zhengke Zhou
260fe755b6 Simplify multi-tokenizer (#11295)
Signed-off-by: zhengkezhou1 <madzhou1@gmail.com>
Co-authored-by: Liangsheng Yin <lsyincs@gmail.com>
2025-10-21 16:33:29 +08:00
Lianmin Zheng
01f14a7ad2 [code move] move pp into a separate mixin (#11838) 2025-10-20 18:46:56 -07:00
Lianmin Zheng
43ad05907c [Auto Sync] Update scheduler.py, server_args.py (20251020) (#11875)
Co-authored-by: github-actions[bot] <github-actions[bot]@users.noreply.github.com>
Co-authored-by: Kan Wu <wukanustc@gmail.com>
2025-10-20 17:41:19 -07:00
Zilin Zhu
e68a2b5b2f [RL] use cpu group to prepare_mlp_sync_batch_raw when the server is offloaded (#10152) 2025-10-18 14:29:35 +08:00
Yineng Zhang
b79f75fd53 [Auto Sync] Update scheduler.py (20251017) (#11738) 2025-10-17 12:36:07 -07:00
Liangsheng Yin
cde5a6e30f Abstraction for spec worker and code cleanup (#11643) 2025-10-17 23:31:36 +08:00
Baizhou Zhang
b0d1d717e1 Revert "make radix cache deterministic" (#11728) 2025-10-16 14:36:15 -07:00
Shangming Cai
868403f642 [PD] Add PD support for hybrid model (Qwen3-Next, DeepSeek V3.2 Exp) (#10912)
Signed-off-by: Shangming Cai <csmthu@gmail.com>
Co-authored-by: hzh0425 <hzh0425@apache.org>
Co-authored-by: ZeldaHuang <hzm414167@alibaba-inc.com>
2025-10-15 18:59:14 -07:00
Lianmin Zheng
27d710457c [Auto Sync] Update scheduler.py, server_args.py (20251014) (#11623)
Co-authored-by: github-actions[bot] <github-actions[bot]@users.noreply.github.com>
Co-authored-by: Stefan He <hebiaobuaa@gmail.com>
2025-10-14 13:20:03 -07:00
Alex Chi Z
dc965db0e0 make radix cache deterministic (#10721)
Signed-off-by: Alex Chi Z <iskyzh@gmail.com>
2025-10-14 21:01:52 +08:00
Liangsheng Yin
bfadb5ea5f Adjust overlap event loop (#11507) 2025-10-14 00:33:19 +08:00
Liangsheng Yin
54a46a264d Remove tp_worker.worker (#11548) 2025-10-13 22:38:48 +08:00
Liangsheng Yin
516738b096 Depreate global_server_args_dict (#11528) 2025-10-13 19:34:43 +08:00
Yi Zhang
a55cf5304a [Feature] Support mamba radix cache v0 (#11214)
Co-authored-by: hanming-lu <hanming@x.ai>
Co-authored-by: hzh0425 <hzh0425@apache.org>
Co-authored-by: thalahors <ericalcaide1@gmail.com>
2025-10-12 20:57:15 -07:00
Cheng Wan
1bdd010291 Revert "Deprecate global_server_args_dict" (#11520) 2025-10-12 17:40:40 -07:00
Liangsheng Yin
1083e7e3df Deprecate global_server_args_dict (#11331) 2025-10-13 01:20:47 +08:00
Liangsheng Yin
f49419061d Move args from global_config to environ (#11332) 2025-10-12 21:29:31 +08:00
Liangsheng Yin
20a6c0a63d Beta spec-overlap for EAGLE (#11398)
Co-authored-by: Lianmin Zheng <15100009+merrymercy@users.noreply.github.com>
Co-authored-by: Hanming Lu <69857889+hanming-lu@users.noreply.github.com>
2025-10-12 11:02:22 +08:00
hzh0425
ee3bd8a1c8 feat(hicache): Support passing prefix keys for l3 store. (#9045)
Co-authored-by: pansicheng <sicheng.pan.chn@gmail.com>
Co-authored-by: Zhiqiang Xie <xiezhq@stanford.edu>
2025-10-10 00:22:05 -07:00
Yingchun Lai
0fe87213bb fix: fix gpu-proc affinity set incorrectly when pp_size > 1 (#11389) 2025-10-09 18:40:05 -07:00
Xinyuan Tong
1f106ee365 [grammar] Avoid server crash when grammar backend is None (#11401) 2025-10-09 18:38:10 -07:00
Lianmin Zheng
9b8ebb2798 move more files under srt/utils (#11285) 2025-10-09 16:46:15 -07:00
Yineng Zhang
e22b13c569 [Auto Sync] Update scheduler.py (20251009) (#11350)
Co-authored-by: github-actions[bot] <github-actions[bot]@users.noreply.github.com>
Co-authored-by: Junxiong Wang <junxiong@together.ai>
2025-10-08 17:39:04 -07:00
Netanel Haber
d6837aea4d model: Support Hybrid Mamba2 NemotronHForCausalLM (nvidia/NVIDIA-Nemotron-Nano-9B-v2) (#10909)
Signed-off-by: Netanel Haber <nhaber@nvidia.com>
2025-10-09 00:37:38 +08:00
Liangsheng Yin
3ddd7dc9f8 Introduce future indices (#11301) 2025-10-07 22:24:02 +08:00
Liangsheng Yin
501dfa6b42 Remove sampling info events and overlap thread file (#11300) 2025-10-07 21:34:25 +08:00
Liangsheng Yin
1519a89cfd Remove overlap thread (#11210)
Co-authored-by: Lianmin Zheng <15100009+merrymercy@users.noreply.github.com>
Co-authored-by: Hanming Lu <69857889+hanming-lu@users.noreply.github.com>
2025-10-07 20:12:12 +08:00
Ke Bao
24bc3fb0f9 EAGLE cache fix for SWARadixCache (#11231)
Co-authored-by: Hanming Lu <69857889+hanming-lu@users.noreply.github.com>
2025-10-07 18:21:37 +08:00
Lianmin Zheng
708f4ff490 Rename max_micro_batch_size -> pp_max_micro_batch_size (#11279) 2025-10-06 15:50:56 -07:00
Ke Bao
31b49c0b51 EAGLE cache fix for HiCache (#11215) 2025-10-04 16:53:53 -07:00
narutolhy
c61b9a1d01 fix self.enable_kv_cache_events (#11178) 2025-10-03 14:09:41 -07:00
Shangming Cai
2c7f4ca2f2 Optimize debug log position of PD abort request (#11090)
Signed-off-by: Shangming Cai <csmthu@gmail.com>
2025-10-03 23:07:02 +08:00
fzyzcjy
fdc4e1e570 Tiny move files to utils folder (#11166) 2025-10-03 22:40:06 +08:00
Liangsheng Yin
3c699772c9 Introduce naming convention in io_struct and base sglang io classes. (#10133) 2025-10-03 10:55:13 +08:00
Liangsheng Yin
7ff740a6ce Remove dp balance metadata and minimul token balance. (#11170) 2025-10-03 01:48:15 +08:00
Liangsheng Yin
458611de77 Unify forward output datastructure (#11124) 2025-10-03 00:28:57 +08:00
Lianmin Zheng
2d62af6be5 Fix metrics and request tracing (TimeStats) (#11123) 2025-10-01 13:03:07 -07:00
Ke Bao
91847e382a Fix eagle radix cache (#10846) 2025-09-30 22:59:20 +08:00
Zhihao Zhang
24f7cb1ece [speculative decoding] rename lookahead to ngram (#11010)
Co-authored-by: a4zhangfei <a4zhangfei@qq.com>
2025-09-28 21:06:59 -07:00
Shangming Cai
e23e280e16 Add support for topk metadata transferring for PD (#10616)
Signed-off-by: Shangming Cai <csmthu@gmail.com>
2025-09-28 00:09:38 +08:00
hzh0425
7ec5b4e89c [PD-HiCache]: Support Async Offloading KVCache In Decode Side (#10192)
Signed-off-by: Shangming Cai <csmthu@gmail.com>
Co-authored-by: Shangming Cai <csmthu@gmail.com>
2025-09-25 23:20:49 -07:00
Lianmin Zheng
32d893730f Revert "[fix][pd-disag]no need set next batch sampling info done in prefill" (#10828) 2025-09-23 17:02:01 -07:00
Jimmy
4b5ef3002c [fix][pd-disag]no need set next batch sampling info done in prefill (#10259) 2025-09-24 01:24:36 +08:00
ishandhanani
662393f27d fix: kv events with tp > 1 (#10541) 2025-09-22 15:55:44 -07:00
Ethan (Yusheng) Su
134b4f7ec2 Support deterministic inference with triton backend (#10694) 2025-09-22 09:20:40 +08:00
Xinyuan Tong
12d6cf18f0 Refactors radix cache for extra key support (#10317)
Signed-off-by: Xinyuan Tong <xinyuantong.cs@gmail.com>
2025-09-22 02:16:16 +08:00
Baizhou Zhang
8ecef73f12 [1/2] Support deterministic inference with flashinfer attention backend (#10645)
Co-authored-by: hebiao064 <hebiaobuaa@gmail.com>
Co-authored-by: Qiaolin-Yu <liin1211@outlook.com>
2025-09-19 23:34:29 -07:00
Zhihao Zhang
e7bc600304 [Feature] Speculative decoding support lookahead (#9873)
Co-authored-by: a4zhangfei <a4zhangfei@qq.com>
Co-authored-by: Qiaolin-Yu <liin1211@outlook.com>
2025-09-18 16:42:41 -07:00