Shangming Cai
|
868403f642
|
[PD] Add PD support for hybrid model (Qwen3-Next, DeepSeek V3.2 Exp) (#10912)
Signed-off-by: Shangming Cai <csmthu@gmail.com>
Co-authored-by: hzh0425 <hzh0425@apache.org>
Co-authored-by: ZeldaHuang <hzm414167@alibaba-inc.com>
|
2025-10-15 18:59:14 -07:00 |
|
Hanming Lu
|
97d857c096
|
[Mamba] Increase default mamba_full_memory_ratio to 0.9 (#11679)
|
2025-10-16 09:56:43 +08:00 |
|
Lianmin Zheng
|
cd7e1bd591
|
Sync code and test CI; rename some env vars (#11686)
|
2025-10-15 18:37:03 -07:00 |
|
Huaiyu, Zheng
|
729b7edf72
|
enable rmsnorm on XPU (#10248)
|
2025-10-15 17:54:18 -07:00 |
|
DiweiSun
|
4c03dbaaef
|
[CI][XPU]enable sglang CI on Intel XPU (#9493)
Co-authored-by: huaiyuzh <huaiyu.zheng@intel.com>
Co-authored-by: Ma Mingfei <mingfei.ma@intel.com>
Co-authored-by: coderabbitai[bot] <136622811+coderabbitai[bot]@users.noreply.github.com>
|
2025-10-15 17:13:19 -07:00 |
|
sglang-bot
|
baf277a9bf
|
chore: bump SGLang version to 0.5.3.post2 (#11680)
Co-authored-by: sglang-bot <sglang-bot@users.noreply.github.com>
|
2025-10-15 16:49:14 -07:00 |
|
Chang Su
|
f226d3da2a
|
Fix missing json imports in serving_responses.py (#11681)
|
2025-10-15 13:01:55 -07:00 |
|
Chang Su
|
30ea4c462b
|
[tool call] Fix prev_tool_call_arr management in base_format_detector.py (#11367)
Co-authored-by: Xinyuan Tong <115166877+JustinTong0323@users.noreply.github.com>
|
2025-10-15 09:51:51 -07:00 |
|
Shangming Cai
|
6d0364681c
|
Fix 1-step draft model forward (#11653)
Signed-off-by: Shangming Cai <csmthu@gmail.com>
Co-authored-by: Liangsheng Yin <lsyincs@gmail.com>
|
2025-10-15 19:11:33 +08:00 |
|
Liangsheng Yin
|
8221f9ae8b
|
Tiny cleanup some eagle unused codes (#11660)
|
2025-10-15 17:24:08 +08:00 |
|
Stefan He
|
6b143d62a2
|
Clean up some Qwen3-Next and deterministic code (#11585)
|
2025-10-15 15:19:37 +08:00 |
|
Zheng Wengang
|
b2c8566920
|
[BugFix][Qwen3-VL]: fix cu_seqlens in qwen3-vl (#11458)
|
2025-10-14 22:16:49 -07:00 |
|
Yineng Zhang
|
91fc5bb5a9
|
feat: add add_chunked_prefix_cache_attention_backend (#11636)
|
2025-10-14 21:48:13 -07:00 |
|
Lifu Huang
|
780fbf2f38
|
[Fix] Fix accuracy bug in CSGMV kernel caching key. (#11579)
|
2025-10-14 20:25:56 -07:00 |
|
Jinwu
|
825432fce6
|
[1/N]Support DeepSeek-R1 w4a8 normal deepep (#8247)
Co-authored-by: Hank Han <hanhan7630@outlook.com>
|
2025-10-14 20:10:53 -07:00 |
|
Xun Sun
|
a40229f6f8
|
[1/N] Introduce Mooncake Backend and Mooncake EP to Support Elastic EP (#10423)
Co-authored-by: Hank Han <hanhan7630@outlook.com>
Co-authored-by: Shangming Cai <csmthu@gmail.com>
|
2025-10-14 19:40:54 -07:00 |
|
Sahithi Chigurupati
|
e9e120ac7a
|
fix: upgrade transformers to 4.57.1 (#11628)
Signed-off-by: Sahithi Chigurupati <chigurupati.sahithi@gmail.com>
Co-authored-by: zhyncs <me@zhyncs.com>
|
2025-10-14 18:35:05 -07:00 |
|
cctry
|
1d7f783501
|
Refactor kv cache free (#11351)
|
2025-10-14 17:45:19 -07:00 |
|
Simo Lin
|
325951460f
|
[router][grpc] add warm up to grpc server (#11627)
Co-authored-by: Chang Su <chang.s.su@oracle.com>
|
2025-10-14 16:11:16 -07:00 |
|
DarkSharpness
|
e28c9e526f
|
[Minor] Update xgrammar dependency (#11622)
|
2025-10-14 13:46:50 -07:00 |
|
Lianmin Zheng
|
b98cf39866
|
[Auto Sync] Update collector.py (20251014) (#11625)
Co-authored-by: github-actions[bot] <github-actions[bot]@users.noreply.github.com>
Co-authored-by: Byron Hsu <byronhsu1230@gmail.com>
|
2025-10-14 13:34:33 -07:00 |
|
Lianmin Zheng
|
27d710457c
|
[Auto Sync] Update scheduler.py, server_args.py (20251014) (#11623)
Co-authored-by: github-actions[bot] <github-actions[bot]@users.noreply.github.com>
Co-authored-by: Stefan He <hebiaobuaa@gmail.com>
|
2025-10-14 13:20:03 -07:00 |
|
Baizhou Zhang
|
c224a4c6cc
|
Fix log for chunked prefix cache (#11624)
|
2025-10-14 11:49:33 -07:00 |
|
strgrb
|
94d26d850d
|
use non_blocking h2d in ForwardBatch.prepare_mlp_sync_batch. (#11605)
|
2025-10-14 11:30:59 -07:00 |
|
Liangsheng Yin
|
5ea96ac7cc
|
Reduce one step decode for draft model. (#11561)
|
2025-10-14 23:52:04 +08:00 |
|
yinghui
|
56222658ec
|
move eagle draft post process to cuda graph (#11434)
Co-authored-by: Lianmin Zheng <lianminzheng@gmail.com>
|
2025-10-14 22:50:53 +08:00 |
|
Alex Chi Z
|
dc965db0e0
|
make radix cache deterministic (#10721)
Signed-off-by: Alex Chi Z <iskyzh@gmail.com>
|
2025-10-14 21:01:52 +08:00 |
|
Scott Lee
|
817e46f412
|
Refactor spec decoding metrics calculation into separate TokenizerManager utility function (#11586)
|
2025-10-14 20:45:49 +08:00 |
|
Liangsheng Yin
|
5a33c3aae7
|
Optimize Triton Draft Backend (#11556)
|
2025-10-14 20:08:32 +08:00 |
|
Qiaolin Yu
|
e4358a4585
|
Add fused_moe_triton config: triton_3_4_0/E=256,N=256,device_name=NVIDIA_B200.json (#11587)
|
2025-10-14 13:24:43 +08:00 |
|
Lianmin Zheng
|
ba2ce28fe9
|
[Auto Sync] Update model_config.py (20251014) (#11580)
Co-authored-by: github-actions[bot] <github-actions[bot]@users.noreply.github.com>
Co-authored-by: Hanming Lu <69857889+hanming-lu@users.noreply.github.com>
|
2025-10-13 22:16:34 -07:00 |
|
Chenxi Li
|
28f80b1244
|
Implement LRU eviction policy for LoRA adapters (#11041)
|
2025-10-13 20:18:25 -07:00 |
|
Xiaoyu Zhang
|
88a6f9dab5
|
bench_serving support PD Disaggregation (#11542)
|
2025-10-13 19:43:26 -07:00 |
|
fzyzcjy
|
cb8ed2c09a
|
Make DeepEP combine recv do not overlap (#11535)
|
2025-10-13 18:40:42 -07:00 |
|
Trevor Morris
|
384733639a
|
[DSv32] Use torch.compile for _get_logits_head_gate (#11565)
|
2025-10-13 18:38:39 -07:00 |
|
Neelabh Sinha
|
aaf7af1b17
|
[FEATURE] Add Profile Trace Merger for Distributed Traces (#11413)
|
2025-10-14 09:20:17 +08:00 |
|
Yuwei An
|
932e263725
|
Compilation Folder Reset (#11539)
Signed-off-by: Oasis-Git <ayw.sirius19@gmail.com>
|
2025-10-14 09:19:12 +08:00 |
|
Qiaolin Yu
|
43f80884c5
|
Fix accept rate in speculative decoding metrics (#11572)
|
2025-10-13 16:35:50 -07:00 |
|
Chang Su
|
887c2b4575
|
[router][grpc] Add serve_grpc to launch_server and log id for HealthCheck (#11564)
|
2025-10-13 16:07:19 -07:00 |
|
fzyzcjy
|
065ce81574
|
Tiny cleanup fp4 gemm calls (#11537)
|
2025-10-13 14:48:22 -07:00 |
|
Johnny
|
cb8f3d90d3
|
[NVIDIA] update pyproject.toml to support cu130 option (#11521)
|
2025-10-13 13:03:31 -07:00 |
|
Trevor Morris
|
c9cff2b984
|
Fix DeepSeek-v3.2 default config (ValueError: not enough values to unpack (expected 4, got 3)) (#11557)
|
2025-10-13 11:27:40 -07:00 |
|
Scott Lee
|
b6fb5d7666
|
Add metrics for speculative decoding (acceptance rate, average acceptance length) (#11441)
|
2025-10-13 11:24:27 -07:00 |
|
Lianmin Zheng
|
5e3f7e7fa9
|
Minor: improve sampler & remove unused fields from model_config.py (#11531)
|
2025-10-13 11:04:44 -07:00 |
|
Liangsheng Yin
|
acc2327bbd
|
Move deep gemm related arguments to sglang.srt.environ (#11547)
|
2025-10-14 00:34:35 +08:00 |
|
Liangsheng Yin
|
bfadb5ea5f
|
Adjust overlap event loop (#11507)
|
2025-10-14 00:33:19 +08:00 |
|
ai-jz
|
9cc1e065f1
|
[router][Fix] Include grpc reflection runtime dependency (#11419)
Co-authored-by: Chang Su <chang.s.su@oracle.com>
|
2025-10-13 09:32:42 -07:00 |
|
Mick
|
f35f120d70
|
fix: fix video input for qwen3-vl (#11442)
|
2025-10-13 09:30:43 -07:00 |
|
Liangsheng Yin
|
54a46a264d
|
Remove tp_worker.worker (#11548)
|
2025-10-13 22:38:48 +08:00 |
|
Mohammad Miadh Angkad
|
c7867b6702
|
[Fix] Add per_channel_quant parameter to MoE config functions (#11201)
|
2025-10-13 21:26:06 +08:00 |
|