cctry
|
b0b4f71679
|
[Fix] memory leak by overlap + retract (#11981)
Co-authored-by: Liangsheng Yin <lsyincs@gmail.com>
|
2025-10-23 22:59:23 +08:00 |
|
Liangsheng Yin
|
32852fe9e9
|
Move memory runtime checker to mixin class (#12014)
|
2025-10-23 20:53:26 +08:00 |
|
Zhengke Zhou
|
260fe755b6
|
Simplify multi-tokenizer (#11295)
Signed-off-by: zhengkezhou1 <madzhou1@gmail.com>
Co-authored-by: Liangsheng Yin <lsyincs@gmail.com>
|
2025-10-21 16:33:29 +08:00 |
|
Lianmin Zheng
|
01f14a7ad2
|
[code move] move pp into a separate mixin (#11838)
|
2025-10-20 18:46:56 -07:00 |
|
Lianmin Zheng
|
43ad05907c
|
[Auto Sync] Update scheduler.py, server_args.py (20251020) (#11875)
Co-authored-by: github-actions[bot] <github-actions[bot]@users.noreply.github.com>
Co-authored-by: Kan Wu <wukanustc@gmail.com>
|
2025-10-20 17:41:19 -07:00 |
|
Zilin Zhu
|
e68a2b5b2f
|
[RL] use cpu group to prepare_mlp_sync_batch_raw when the server is offloaded (#10152)
|
2025-10-18 14:29:35 +08:00 |
|
Yineng Zhang
|
b79f75fd53
|
[Auto Sync] Update scheduler.py (20251017) (#11738)
|
2025-10-17 12:36:07 -07:00 |
|
Liangsheng Yin
|
cde5a6e30f
|
Abstraction for spec worker and code cleanup (#11643)
|
2025-10-17 23:31:36 +08:00 |
|
Baizhou Zhang
|
b0d1d717e1
|
Revert "make radix cache deterministic" (#11728)
|
2025-10-16 14:36:15 -07:00 |
|
Shangming Cai
|
868403f642
|
[PD] Add PD support for hybrid model (Qwen3-Next, DeepSeek V3.2 Exp) (#10912)
Signed-off-by: Shangming Cai <csmthu@gmail.com>
Co-authored-by: hzh0425 <hzh0425@apache.org>
Co-authored-by: ZeldaHuang <hzm414167@alibaba-inc.com>
|
2025-10-15 18:59:14 -07:00 |
|
Lianmin Zheng
|
27d710457c
|
[Auto Sync] Update scheduler.py, server_args.py (20251014) (#11623)
Co-authored-by: github-actions[bot] <github-actions[bot]@users.noreply.github.com>
Co-authored-by: Stefan He <hebiaobuaa@gmail.com>
|
2025-10-14 13:20:03 -07:00 |
|
Alex Chi Z
|
dc965db0e0
|
make radix cache deterministic (#10721)
Signed-off-by: Alex Chi Z <iskyzh@gmail.com>
|
2025-10-14 21:01:52 +08:00 |
|
Liangsheng Yin
|
bfadb5ea5f
|
Adjust overlap event loop (#11507)
|
2025-10-14 00:33:19 +08:00 |
|
Liangsheng Yin
|
54a46a264d
|
Remove tp_worker.worker (#11548)
|
2025-10-13 22:38:48 +08:00 |
|
Liangsheng Yin
|
516738b096
|
Deprecate global_server_args_dict (#11528)
|
2025-10-13 19:34:43 +08:00 |
|
Yi Zhang
|
a55cf5304a
|
[Feature] Support mamba radix cache v0 (#11214)
Co-authored-by: hanming-lu <hanming@x.ai>
Co-authored-by: hzh0425 <hzh0425@apache.org>
Co-authored-by: thalahors <ericalcaide1@gmail.com>
|
2025-10-12 20:57:15 -07:00 |
|
Cheng Wan
|
1bdd010291
|
Revert "Deprecate global_server_args_dict" (#11520)
|
2025-10-12 17:40:40 -07:00 |
|
Liangsheng Yin
|
1083e7e3df
|
Deprecate global_server_args_dict (#11331)
|
2025-10-13 01:20:47 +08:00 |
|
Liangsheng Yin
|
f49419061d
|
Move args from global_config to environ (#11332)
|
2025-10-12 21:29:31 +08:00 |
|
Liangsheng Yin
|
20a6c0a63d
|
Beta spec-overlap for EAGLE (#11398)
Co-authored-by: Lianmin Zheng <15100009+merrymercy@users.noreply.github.com>
Co-authored-by: Hanming Lu <69857889+hanming-lu@users.noreply.github.com>
|
2025-10-12 11:02:22 +08:00 |
|
hzh0425
|
ee3bd8a1c8
|
feat(hicache): Support passing prefix keys for l3 store. (#9045)
Co-authored-by: pansicheng <sicheng.pan.chn@gmail.com>
Co-authored-by: Zhiqiang Xie <xiezhq@stanford.edu>
|
2025-10-10 00:22:05 -07:00 |
|
Yingchun Lai
|
0fe87213bb
|
fix: fix gpu-proc affinity set incorrectly when pp_size > 1 (#11389)
|
2025-10-09 18:40:05 -07:00 |
|
Xinyuan Tong
|
1f106ee365
|
[grammar] Avoid server crash when grammar backend is None (#11401)
|
2025-10-09 18:38:10 -07:00 |
|
Lianmin Zheng
|
9b8ebb2798
|
move more files under srt/utils (#11285)
|
2025-10-09 16:46:15 -07:00 |
|
Yineng Zhang
|
e22b13c569
|
[Auto Sync] Update scheduler.py (20251009) (#11350)
Co-authored-by: github-actions[bot] <github-actions[bot]@users.noreply.github.com>
Co-authored-by: Junxiong Wang <junxiong@together.ai>
|
2025-10-08 17:39:04 -07:00 |
|
Netanel Haber
|
d6837aea4d
|
model: Support Hybrid Mamba2 NemotronHForCausalLM (nvidia/NVIDIA-Nemotron-Nano-9B-v2) (#10909)
Signed-off-by: Netanel Haber <nhaber@nvidia.com>
|
2025-10-09 00:37:38 +08:00 |
|
Liangsheng Yin
|
3ddd7dc9f8
|
Introduce future indices (#11301)
|
2025-10-07 22:24:02 +08:00 |
|
Liangsheng Yin
|
501dfa6b42
|
Remove sampling info events and overlap thread file (#11300)
|
2025-10-07 21:34:25 +08:00 |
|
Liangsheng Yin
|
1519a89cfd
|
Remove overlap thread (#11210)
Co-authored-by: Lianmin Zheng <15100009+merrymercy@users.noreply.github.com>
Co-authored-by: Hanming Lu <69857889+hanming-lu@users.noreply.github.com>
|
2025-10-07 20:12:12 +08:00 |
|
Ke Bao
|
24bc3fb0f9
|
EAGLE cache fix for SWARadixCache (#11231)
Co-authored-by: Hanming Lu <69857889+hanming-lu@users.noreply.github.com>
|
2025-10-07 18:21:37 +08:00 |
|
Lianmin Zheng
|
708f4ff490
|
Rename max_micro_batch_size -> pp_max_micro_batch_size (#11279)
|
2025-10-06 15:50:56 -07:00 |
|
Ke Bao
|
31b49c0b51
|
EAGLE cache fix for HiCache (#11215)
|
2025-10-04 16:53:53 -07:00 |
|
narutolhy
|
c61b9a1d01
|
fix self.enable_kv_cache_events (#11178)
|
2025-10-03 14:09:41 -07:00 |
|
Shangming Cai
|
2c7f4ca2f2
|
Optimize debug log position of PD abort request (#11090)
Signed-off-by: Shangming Cai <csmthu@gmail.com>
|
2025-10-03 23:07:02 +08:00 |
|
fzyzcjy
|
fdc4e1e570
|
Tiny move files to utils folder (#11166)
|
2025-10-03 22:40:06 +08:00 |
|
Liangsheng Yin
|
3c699772c9
|
Introduce naming convention in io_struct and base sglang io classes. (#10133)
|
2025-10-03 10:55:13 +08:00 |
|
Liangsheng Yin
|
7ff740a6ce
|
Remove dp balance metadata and minimum token balance. (#11170)
|
2025-10-03 01:48:15 +08:00 |
|
Liangsheng Yin
|
458611de77
|
Unify forward output datastructure (#11124)
|
2025-10-03 00:28:57 +08:00 |
|
Lianmin Zheng
|
2d62af6be5
|
Fix metrics and request tracing (TimeStats) (#11123)
|
2025-10-01 13:03:07 -07:00 |
|
Ke Bao
|
91847e382a
|
Fix eagle radix cache (#10846)
|
2025-09-30 22:59:20 +08:00 |
|
Zhihao Zhang
|
24f7cb1ece
|
[speculative decoding] rename lookahead to ngram (#11010)
Co-authored-by: a4zhangfei <a4zhangfei@qq.com>
|
2025-09-28 21:06:59 -07:00 |
|
Shangming Cai
|
e23e280e16
|
Add support for topk metadata transferring for PD (#10616)
Signed-off-by: Shangming Cai <csmthu@gmail.com>
|
2025-09-28 00:09:38 +08:00 |
|
hzh0425
|
7ec5b4e89c
|
[PD-HiCache]: Support Async Offloading KVCache In Decode Side (#10192)
Signed-off-by: Shangming Cai <csmthu@gmail.com>
Co-authored-by: Shangming Cai <csmthu@gmail.com>
|
2025-09-25 23:20:49 -07:00 |
|
Lianmin Zheng
|
32d893730f
|
Revert "[fix][pd-disag]no need set next batch sampling info done in prefill" (#10828)
|
2025-09-23 17:02:01 -07:00 |
|
Jimmy
|
4b5ef3002c
|
[fix][pd-disag]no need set next batch sampling info done in prefill (#10259)
|
2025-09-24 01:24:36 +08:00 |
|
ishandhanani
|
662393f27d
|
fix: kv events with tp > 1 (#10541)
|
2025-09-22 15:55:44 -07:00 |
|
Ethan (Yusheng) Su
|
134b4f7ec2
|
Support deterministic inference with triton backend (#10694)
|
2025-09-22 09:20:40 +08:00 |
|
Xinyuan Tong
|
12d6cf18f0
|
Refactors radix cache for extra key support (#10317)
Signed-off-by: Xinyuan Tong <xinyuantong.cs@gmail.com>
|
2025-09-22 02:16:16 +08:00 |
|
Baizhou Zhang
|
8ecef73f12
|
[1/2] Support deterministic inference with flashinfer attention backend (#10645)
Co-authored-by: hebiao064 <hebiaobuaa@gmail.com>
Co-authored-by: Qiaolin-Yu <liin1211@outlook.com>
|
2025-09-19 23:34:29 -07:00 |
|
Zhihao Zhang
|
e7bc600304
|
[Feature] Speculative decoding support lookahead (#9873)
Co-authored-by: a4zhangfei <a4zhangfei@qq.com>
Co-authored-by: Qiaolin-Yu <liin1211@outlook.com>
|
2025-09-18 16:42:41 -07:00 |
|