Commit Graph

  • c1e1600373 [fix] fix ci uv install dependency (#11895) Hank Han 2025-10-21 16:23:34 +08:00
  • 852c0578fd [FEATURE] Add OpenAI-Compatible LoRA Adapter Selection (#11570) Neelabh Sinha 2025-10-21 00:44:33 -07:00
  • 7e6191c098 init support for KTransformers Heterogeneous Computing (#11487) Atream 2025-10-21 15:17:02 +08:00
  • 6f9b66bdda [AMD] Update wave-lang to 3.8.0 (#11878) Gaurav Verma 2025-10-20 23:11:09 -07:00
  • 68277eac30 adaptation part w4A8 quantization lizhigong 2025-10-21 14:06:43 +08:00
  • 8a801ee38d [router] release router 0.2.1 (#11885) Simo Lin 2025-10-20 21:08:45 -07:00
  • d9a20fd28a Use trtllm_mla decode kernel for draft extend in speculative decoding (#11664) Qiaolin Yu 2025-10-20 20:42:09 -07:00
  • b113c72e7a Init attention backend for Intel XPU (#10656) Meng, Hengyu 2025-10-21 11:41:28 +08:00
  • fb6cc7b000 Fix RotaryEmbedding for fp32 input (#11843) zhangdonghao-zdh 2025-10-21 10:56:48 +08:00
  • 8374a96e49 piecewise cuda graph support qwen3-moe (#11845) Xiaoyu Zhang 2025-10-21 10:55:49 +08:00
  • 74de76c685 Revise MRotaryEmbedding's forward (#11859) Yuan Luo 2025-10-21 10:38:29 +08:00
  • 9c0b1eb5ad [router][grpc] Fix warm-up random token ids for small models (#11887) Chang Su 2025-10-20 19:22:17 -07:00
  • 01f14a7ad2 [code move] move pp into a separate mixin (#11838) Lianmin Zheng 2025-10-20 18:46:56 -07:00
  • 1111030395 [router] clean up workflow logs to debug for implementation details logs (#11886) Simo Lin 2025-10-20 18:24:55 -07:00
  • 28ddfb37d7 fix(sql-router): fix conflict port in test (#11826) Tien Nguyen 2025-10-21 08:06:34 +07:00
  • e69094df64 [router][grpc] Remove continue_final_message in ChatTemplateParams and add minijinja-contrib (#11882) Chang Su 2025-10-20 18:03:09 -07:00
  • 43ad05907c [Auto Sync] Update scheduler.py, server_args.py (20251020) (#11875) Lianmin Zheng 2025-10-20 17:41:19 -07:00
  • b4948512b8 [router] remove encoding header for oai router (#11881) Simo Lin 2025-10-20 17:39:00 -07:00
  • ddcba74b4d [router] Worker Management Workflow Engine (#11868) Simo Lin 2025-10-20 17:00:22 -07:00
  • 0917c5da8c Support mixing cutedsl and deepgemm backend (#11807) fzyzcjy 2025-10-21 07:38:35 +08:00
  • 184a4df697 Replace function call with set literal (#11867) penguin_wwy 2025-10-21 01:39:16 +08:00
  • f7b1d8c5ab Fix acc len and gen throughput metrics when enabling overlap-spec (#11823) Qiaolin Yu 2025-10-20 10:34:38 -07:00
  • bfc3b3f786 [9/N] MoE Refactor: cleanup dispatcher interfaces (#11847) Cheng Wan 2025-10-20 10:11:46 -07:00
  • da5bde4d16 Tiny fix main lint (#11862) Liangsheng Yin 2025-10-20 19:57:24 +08:00
  • 276e7b3e4e [Feature] New structural tag support (#10691) DarkSharpness 2025-10-20 18:25:58 +08:00
  • 296f689242 fix(server_args): handle tokenizer init conflicts (#11776) ishandhanani 2025-10-20 00:27:19 -07:00
  • 9edb7b5123 [AMD CI] Populate image cache in nightly docker release. (#11822) Sai Enduri 2025-10-20 00:04:04 -07:00
  • e53bf44243 Update amd gpu install docs. (#11849) Sai Enduri 2025-10-20 00:03:26 -07:00
  • d383e6616e [Model] Add Olmo 3 model support (#11396) Shane A 2025-10-19 23:59:16 -07:00
  • 984fbeb16b Revert "[CI Monitor] Ci monitor only deal with main branch in default" (#11846) Xiaoyu Zhang 2025-10-20 13:06:40 +08:00
  • a2ba0bc3df Tiny clean up for PD module and doc (#11747) Shangming Cai 2025-10-20 11:52:42 +08:00
  • 6d2d0ce285 [PD] Improve eagle acceptance rate by transferring draft model hidden states (#10801) Ziming Huang 2025-10-20 11:52:18 +08:00
  • 271d3d0d50 Support mrope triton kernel and add unit test (#11722) Yuan Luo 2025-10-20 11:51:07 +08:00
  • c4e81e64fb [Feature] Use current greenctx stream to communicate in PD-Multiplexing. (#11594) ykcombat 2025-10-20 10:58:20 +08:00
  • c726d44cc7 Recapture cuda graph after model weight update to resolve IMA error (#11780) harrisonlimh 2025-10-19 19:50:03 -07:00
  • 283c8ba031 chore: bump sgl-kernel version to 0.3.16.post3 (#11733) sglang-bot 2025-10-19 19:44:15 -07:00
  • cae3956585 check master server for mooncake store (#10510) huangtingwei 2025-10-20 09:37:09 +08:00
  • 27a223aba4 Improve Kernel Build Time (#11508) Kangyan-Zhou 2025-10-19 18:11:48 -07:00
  • 53529f46cc Fix version bump script to handle TOML files with outdated versions (#11787) Kangyan-Zhou 2025-10-19 18:10:26 -07:00
  • 24ed3f32c0 fix(ci): Fix CI Monitor limit parameter and add CI Analysis to summary (#11832) Xiaoyu Zhang 2025-10-20 09:08:34 +08:00
  • 44f0ece9fc [Doc] Update documents for FA4 (#11778) Baizhou Zhang 2025-10-19 19:40:38 -05:00
  • be0058bc05 [BugFix] replace the input_to_float8 used in dsv2 (#11612) Liu-congo 2025-10-20 08:34:13 +08:00
  • 9e3be1fa2a Tiny bump DeepEP version in ARM blackwell (#11810) fzyzcjy 2025-10-20 08:15:14 +08:00
  • a8ba32798e Fix triton_kernels import error on some hardwares (#11831) fzyzcjy 2025-10-20 08:14:47 +08:00
  • 3b80232d06 [DeepseekV32] Add fast_topk_transform_ragged_fused kernel (#11815) hlu1 2025-10-19 17:13:39 -07:00
  • 252dc4e112 [NVIDIA] FA3/FA4 Fix (#11606) Johnny 2025-10-20 02:10:10 +02:00
  • cbb5fc2edc [CI] Add CI test for DeepSeek V3.2 MTP (#11835) Baizhou Zhang 2025-10-19 19:00:25 -05:00
  • 53fb229f53 [logprobs] Enable local deterministic logprobs testing with strict threshold (#10994) Night 2025-10-19 13:30:39 -07:00
  • 4fff1ec1d9 Deterministic Mode: Add 1-stage triton kernel for prefill (#11147) Stefan He 2025-10-19 10:47:36 -07:00
  • 7a020e0f3b [Test] Add basic matched stop for beta eagle (#11833) Liangsheng Yin 2025-10-20 01:17:00 +08:00
  • 48738af7f9 [CI] always print back trace in retry() (#11834) Liangsheng Yin 2025-10-20 01:12:49 +08:00
  • efa473348b [Spec Decoding] Support MTP for dsv3.2 (#11652) Paiiii 2025-10-19 23:44:22 +08:00
  • d658f0497e [overlap-spec] fix stop condition and trimming (#11819) Liangsheng Yin 2025-10-19 22:00:20 +08:00
  • 57e25de756 Revert "Fix: Dynamic RoPE Cache Expansion to Prevent Position-ID Out-of-Bounds in EAGLE + Long-Sequence Workloads" (#11827) Liangsheng Yin 2025-10-19 19:44:06 +08:00
  • 12eb02e982 Change bf16 to fp8 for some gemms in attention for DeepSeek ckpt v2 (#11805) fzyzcjy 2025-10-19 16:15:13 +08:00
  • 002d037359 Avoid generation gets hanging when user specifies multiple event loops (#5162) fzyzcjy 2025-10-19 16:12:49 +08:00
  • a27825ae01 Support not officially supported high sgl-kernel version with low srt version (#11786) fzyzcjy 2025-10-19 16:11:59 +08:00
  • ce399e154c Make single-batch overlap compatible with NextN (#11804) fzyzcjy 2025-10-19 16:10:44 +08:00
  • ea6275dfbc Tiny add hints when users send requests to wrong place (#11808) fzyzcjy 2025-10-19 16:10:20 +08:00
  • eb7318f1c2 support tokenized batch request (#11091) narutolhy 2025-10-19 00:05:02 -07:00
  • 6058fb520c Update CODEOWNERS for layer quantization path (#11818) Lianmin Zheng 2025-10-18 21:17:17 -07:00
  • 80407b0493 Fix: Dynamic RoPE Cache Expansion to Prevent Position-ID Out-of-Bounds in EAGLE + Long-Sequence Workloads (#10788) YAMY 2025-10-18 20:37:43 -07:00
  • b288f4f440 Improve send_one script (#11817) Liangsheng Yin 2025-10-19 11:28:16 +08:00
  • 6d6ea5af0c fix: do not wrap invalid grammar objects during constrained generation (#11328) tazjin 2025-10-19 05:54:33 +03:00
  • 1dacedd2db make sure logit bias is applied during eagle spec decoding verification (#11555) Marin 2025-10-19 04:53:33 +02:00
  • b5e14b2b78 [1/2][feature] support openai like classification api (#11618) ybyang 2025-10-19 10:32:48 +08:00
  • d513ee93ef [2/2] [feature] support openai like classification api in router (#11670) ybyang 2025-10-19 10:31:08 +08:00
  • a7ae61ed77 [router] Add Configurable L0 and L1 Tokenizer Caching (#11688) Simo Lin 2025-10-18 18:33:53 -07:00
  • fda0cb2a30 Fix Dockerfile not installing correct version of DeepEP for arm build (#11773) kyleliang-nv 2025-10-18 15:06:05 -07:00
  • ebda73dc72 Use cutlass fp4 gemm by default (#11813) Qiaolin Yu 2025-10-18 14:10:15 -07:00
  • f4f8a1b4d8 ci: update lmms-eval to speed up multimodal CI (#11000) b8zhong 2025-10-18 11:51:19 -07:00
  • c44e985dc2 feat(example/fastapi): support --startup-timeout using Qwen3-Next-80B-A3B-Instruct as example (#11710) Kindyaa 2025-10-19 02:50:34 +08:00
  • f9a7d9b3dc support server arg override KV cache to bf16 to avoid slow cases (#11749) b8zhong 2025-10-18 11:49:48 -07:00
  • a93f10a722 [overlap-spec] support page size > 1 (#11772) Liangsheng Yin 2025-10-19 02:09:13 +08:00
  • 585e1223f0 [HiCache] feat: add more eviction policy (#11506) Teng Ma 2025-10-18 23:49:45 +08:00
  • a7043c6f0d Bump torch_memory_saver to avoid installing pre-release versions (#11797) fzyzcjy 2025-10-18 16:20:42 +08:00
  • 67e34c56d7 Fix install instructions and pyproject.tomls (#11781) Lianmin Zheng 2025-10-18 01:08:01 -07:00
  • 1d726528f7 Eager Compiler for Torch Compile (#11803) Yuwei An 2025-10-18 00:18:52 -07:00
  • f4488e9dd9 set default attention backend for deterministic inference (#11801) Minglei Zhu 2025-10-18 00:01:24 -07:00
  • e68a2b5b2f [RL] use cpu group to prepare_mlp_sync_batch_raw when the server is offloaded (#10152) Zilin Zhu 2025-10-18 14:29:35 +08:00
  • 31b9f19e54 [RL] support weight update with DP attention (#11669) Zilin Zhu 2025-10-18 14:26:19 +08:00
  • 547003bdd0 fix command line usage of profiling (#11793) Qiaolin Yu 2025-10-17 21:54:36 -07:00
  • f7ab955455 fix(glm45): disable reduce scatter (#11665) Jimmy 2025-10-18 12:19:20 +08:00
  • dbbd4e1891 Try add back no-commit-to-branch (#11799) fzyzcjy 2025-10-18 12:05:12 +08:00
  • ca240eefb4 [router][grpc] Support parallel queue puts in grpc_request_manager and remove mutex for grpc_client (#11798) Chang Su 2025-10-17 20:49:43 -07:00
  • 6c7c92eb02 Enable lint on main (#11794) fzyzcjy 2025-10-18 10:08:50 +08:00
  • 5b214b50b6 [Refactor] move deep_gemm_wrapper out of quantization (#11784) Cheng Wan 2025-10-17 18:57:54 -07:00
  • 13219e1e48 completely remove mixed mode deterministic test as prefix mode could cover it (#11783) Minglei Zhu 2025-10-17 17:46:03 -07:00
  • 33e9bbec35 Make single-batch overlap compatible with offloading (#11614) fzyzcjy 2025-10-18 08:45:54 +08:00
  • dcb8f090ad Super tiny fix CI (#11788) fzyzcjy 2025-10-18 08:41:58 +08:00
  • 9eefe2c0b7 Set CUDA_VISIBLE_DEVICES to achieve one GPU per process (#9170) Lianmin Zheng 2025-10-17 17:30:06 -07:00
  • 69fe3c9726 Manually flip deepep_mode for cuda_graph (#11666) Zilin Zhu 2025-10-18 08:05:48 +08:00
  • 8af8491298 Support casting bf16 NextN moe to fp8 (#11613) fzyzcjy 2025-10-18 08:02:15 +08:00
  • 505329cab0 Support shared experts overlap in cutlass moe (#11611) fzyzcjy 2025-10-18 07:59:40 +08:00
  • 8a382fd399 Super tiny fix missing input throughput (#11607) fzyzcjy 2025-10-18 07:58:48 +08:00
  • 627974405d [Lint] Add python/sglang to ruff F401 checks and remove unused imports in files (#11685) Chang Su 2025-10-17 16:49:46 -07:00
  • 2614adf9ca [Fix] Skip visual layers when applying LoRA to Qwen2VL modules (#11519) Antonin Vidon 2025-10-17 18:39:57 -04:00
  • fdd7c69d65 [Auto Sync] Update common.py (20251017) (#11782) Lianmin Zheng 2025-10-17 15:03:42 -07:00
  • b9a54e0968 [minor] sync code on python/sglang/test/test_deterministic.py and improve ci tests (#11777) Lianmin Zheng 2025-10-17 14:25:22 -07:00
  • 20b8d2306c Cleaning indexer for DeepSeek V3.2 (#11682) Baizhou Zhang 2025-10-17 15:47:21 -05:00