ykcombat
|
c4e81e64fb
|
[Feature] Use current greenctx stream to communicate in PD-Multiplexing. (#11594)
|
2025-10-20 10:58:20 +08:00 |
|
harrisonlimh
|
c726d44cc7
|
Recapture cuda graph after model weight update to resolve IMA error (#11780)
|
2025-10-20 10:50:03 +08:00 |
|
sglang-bot
|
283c8ba031
|
chore: bump sgl-kernel version to 0.3.16.post3 (#11733)
|
2025-10-19 21:44:15 -05:00 |
|
huangtingwei
|
cae3956585
|
check master server for mooncake store (#10510)
|
2025-10-20 09:37:09 +08:00 |
|
Kangyan-Zhou
|
27a223aba4
|
Improve Kernel Build Time (#11508)
|
2025-10-19 18:11:48 -07:00 |
|
Kangyan-Zhou
|
53529f46cc
|
Fix version bump script to handle TOML files with outdated versions (#11787)
Co-authored-by: Claude <noreply@anthropic.com>
|
2025-10-19 18:10:26 -07:00 |
|
Xiaoyu Zhang
|
24ed3f32c0
|
fix(ci): Fix CI Monitor limit parameter and add CI Analysis to summary (#11832)
|
2025-10-19 18:08:34 -07:00 |
|
Baizhou Zhang
|
44f0ece9fc
|
[Doc] Update documents for FA4 (#11778)
|
2025-10-19 17:40:38 -07:00 |
|
Liu-congo
|
be0058bc05
|
[BugFix] replace the input_to_float8 used in dsv2 (#11612)
Signed-off-by: Liu-congo <1502632128@qq.com>
|
2025-10-19 19:34:13 -05:00 |
|
fzyzcjy
|
9e3be1fa2a
|
Tiny bump DeepEP version in ARM blackwell (#11810)
|
2025-10-20 08:15:14 +08:00 |
|
fzyzcjy
|
a8ba32798e
|
Fix triton_kernels import error on some hardwares (#11831)
|
2025-10-20 08:14:47 +08:00 |
|
hlu1
|
3b80232d06
|
[DeepseekV32] Add fast_topk_transform_ragged_fused kernel (#11815)
Signed-off-by: Hao Lu <14827759+hlu1@users.noreply.github.com>
|
2025-10-19 17:13:39 -07:00 |
|
Johnny
|
252dc4e112
|
[NVIDIA] FA3/FA4 Fix (#11606)
Co-authored-by: Baizhou Zhang <sobereddiezhang@gmail.com>
|
2025-10-19 17:10:10 -07:00 |
|
Baizhou Zhang
|
cbb5fc2edc
|
[CI] Add CI test for DeepSeek V3.2 MTP (#11835)
|
2025-10-19 17:00:25 -07:00 |
|
Night
|
53fb229f53
|
[logprobs] Enable local deterministic logprobs testing with strict threshold (#10994)
Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>
|
2025-10-19 13:30:39 -07:00 |
|
Stefan He
|
4fff1ec1d9
|
Deterministic Mode: Add 1-stage triton kernel for prefill (#11147)
Co-authored-by: Minglei Zhu <mingleizhu1122@gmail.com>
Co-authored-by: Binyao Jiang <bijiang@linkedin.com>
|
2025-10-20 01:47:36 +08:00 |
|
Liangsheng Yin
|
7a020e0f3b
|
[Test] Add basic matched stop for beta eagle (#11833)
|
2025-10-20 01:17:00 +08:00 |
|
Liangsheng Yin
|
48738af7f9
|
[CI] always print back trace in retry() (#11834)
|
2025-10-20 01:12:49 +08:00 |
|
Paiiii
|
efa473348b
|
[Spec Decoding] Support MTP for dsv3.2 (#11652)
Co-authored-by: Paiiiiiiiiiiiiii <zengpai@baidu.com>
|
2025-10-19 23:44:22 +08:00 |
|
Liangsheng Yin
|
d658f0497e
|
[overlap-spec] fix stop condition and trimming (#11819)
|
2025-10-19 22:00:20 +08:00 |
|
Liangsheng Yin
|
57e25de756
|
Revert "Fix: Dynamic RoPE Cache Expansion to Prevent Position-ID Out-of-Bounds in EAGLE + Long-Sequence Workloads" (#11827)
|
2025-10-19 19:44:06 +08:00 |
|
fzyzcjy
|
12eb02e982
|
Change bf16 to fp8 for some gemms in attention for DeepSeek ckpt v2 (#11805)
|
2025-10-19 16:15:13 +08:00 |
|
fzyzcjy
|
002d037359
|
Avoid generation gets hanging when user specifies multiple event loops (#5162)
Co-authored-by: Yineng Zhang <me@zhyncs.com>
|
2025-10-19 16:12:49 +08:00 |
|
fzyzcjy
|
a27825ae01
|
Support not officially supported high sgl-kernel version with low srt version (#11786)
|
2025-10-19 16:11:59 +08:00 |
|
fzyzcjy
|
ce399e154c
|
Make single-batch overlap compatible with NextN (#11804)
|
2025-10-19 16:10:44 +08:00 |
|
fzyzcjy
|
ea6275dfbc
|
Tiny add hints when users send requests to wrong place (#11808)
|
2025-10-19 16:10:20 +08:00 |
|
narutolhy
|
eb7318f1c2
|
support tokenized batch request (#11091)
|
2025-10-19 07:05:02 +00:00 |
|
Lianmin Zheng
|
6058fb520c
|
Update CODEOWNERS for layer quantization path (#11818)
|
2025-10-18 21:17:17 -07:00 |
|
YAMY
|
80407b0493
|
Fix: Dynamic RoPE Cache Expansion to Prevent Position-ID Out-of-Bounds in EAGLE + Long-Sequence Workloads (#10788)
|
2025-10-19 11:37:43 +08:00 |
|
Liangsheng Yin
|
b288f4f440
|
Improve send_one script (#11817)
|
2025-10-19 11:28:16 +08:00 |
|
tazjin
|
6d6ea5af0c
|
fix: do not wrap invalid grammar objects during constrained generation (#11328)
|
2025-10-19 10:54:33 +08:00 |
|
Marin
|
1dacedd2db
|
make sure logit bias is applied during eagle spec decoding verification (#11555)
|
2025-10-19 10:53:33 +08:00 |
|
ybyang
|
b5e14b2b78
|
[1/2][feature] support openai like classification api (#11618)
|
2025-10-18 19:32:48 -07:00 |
|
ybyang
|
d513ee93ef
|
[2/2] [feature] support openai like classification api in router (#11670)
|
2025-10-18 19:31:08 -07:00 |
|
Simo Lin
|
a7ae61ed77
|
[router] Add Configurable L0 and L1 Tokenizer Caching (#11688)
|
2025-10-18 18:33:53 -07:00 |
|
kyleliang-nv
|
fda0cb2a30
|
Fix Dockerfile not installing correct version of DeepEP for arm build (#11773)
|
2025-10-18 15:06:05 -07:00 |
|
Qiaolin Yu
|
ebda73dc72
|
Use cutlass fp4 gemm by default (#11813)
|
2025-10-18 14:10:15 -07:00 |
|
b8zhong
|
f4f8a1b4d8
|
ci: update lmms-eval to speed up multimodal CI (#11000)
|
2025-10-19 02:51:19 +08:00 |
|
Kindyaa
|
c44e985dc2
|
feat(example/fastapi): support --startup-timeout using Qwen3-Next-80B-A3B-Instruct as example (#11710)
Co-authored-by: chenan01 <chenan01@cheche-MacBook-Pro.local>
Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>
|
2025-10-19 02:50:34 +08:00 |
|
b8zhong
|
f9a7d9b3dc
|
support server arg override KV cache to bf16 to avoid slow cases (#11749)
|
2025-10-19 02:49:48 +08:00 |
|
Liangsheng Yin
|
a93f10a722
|
[overlap-spec] support page size > 1 (#11772)
|
2025-10-19 02:09:13 +08:00 |
|
Teng Ma
|
585e1223f0
|
[HiCache] feat: add more eviction policy (#11506)
|
2025-10-18 15:49:45 +00:00 |
|
fzyzcjy
|
a7043c6f0d
|
Bump torch_memory_saver to avoid installing pre-release versions (#11797)
|
2025-10-18 01:20:42 -07:00 |
|
Lianmin Zheng
|
67e34c56d7
|
Fix install instructions and pyproject.tomls (#11781)
|
2025-10-18 01:08:01 -07:00 |
|
Yuwei An
|
1d726528f7
|
Eager Compiler for Torch Compile (#11803)
Signed-off-by: Oasis-Git <ayw.sirius19@gmail.com>
|
2025-10-18 15:18:52 +08:00 |
|
Minglei Zhu
|
f4488e9dd9
|
set default attention backend for deterministic inference (#11801)
|
2025-10-18 00:01:24 -07:00 |
|
Zilin Zhu
|
e68a2b5b2f
|
[RL] use cpu group to prepare_mlp_sync_batch_raw when the server is offloaded (#10152)
|
2025-10-18 14:29:35 +08:00 |
|
Zilin Zhu
|
31b9f19e54
|
[RL] support weight update with DP attention (#11669)
|
2025-10-18 14:26:19 +08:00 |
|
Qiaolin Yu
|
547003bdd0
|
fix command line usage of profiling (#11793)
|
2025-10-18 12:54:36 +08:00 |
|
Jimmy
|
f7ab955455
|
fix(glm45): disable reduce scatter (#11665)
Co-authored-by: Shangming Cai <csmthu@gmail.com>
|
2025-10-18 12:19:20 +08:00 |
|