Commit Graph

  • 43f80884c5 Fix accept rate in speculative decoding metrics (#11572) Qiaolin Yu 2025-10-13 16:35:50 -07:00
  • 60b0503227 chore: bump sgl-kernel version to 0.3.16.post1 (#11573) sglang-bot 2025-10-13 16:26:18 -07:00
  • dc48c4c0e3 [sgl-kernel][2/N]Support Expert Specialization Grouped GEMM (#11534) Qi Yuhang 2025-10-14 07:24:48 +08:00
  • 6dc9ca8c85 [router] Add BRANCH_TYPE=local support to Dockerfile.router for local builds (#11571) Arthur Cheng 2025-10-13 16:10:51 -07:00
  • 887c2b4575 [router][grpc] Add serve_grpc to launch_server and log id for HealthCheck (#11564) Chang Su 2025-10-13 16:07:19 -07:00
  • 065ce81574 Tiny cleanup fp4 gemm calls (#11537) fzyzcjy 2025-10-14 05:48:22 +08:00
  • 8e51049f56 [CI Monitor] Ci monitor only deal with main branch in default (#11538) Xiaoyu Zhang 2025-10-14 04:50:04 +08:00
  • cb8f3d90d3 [NVIDIA] update pyproject.toml to support cu130 option (#11521) Johnny 2025-10-13 22:03:31 +02:00
  • 4b694e7d5a [router][grpc] Add error handling to generate_tool_constraints (#11562) Chang Su 2025-10-13 12:26:09 -07:00
  • 9f1f699a7a [CI] Add Basic Test for DeepSeek V3.2 (#11308) Baizhou Zhang 2025-10-13 11:41:02 -07:00
  • c9cff2b984 Fix DeepSeek-v3.2 default config (ValueError: not enough values to unpack (expected 4, got 3)) (#11557) Trevor Morris 2025-10-13 11:27:40 -07:00
  • b6fb5d7666 Add metrics for speculative decoding (acceptance rate, average acceptance length) (#11441) Scott Lee 2025-10-13 11:24:27 -07:00
  • f4aa78801e [router] Add Rust CLI flags for queue size, timeout, and rate limit for token bucket rate limiter (#11483) Jonah Bernard 2025-10-13 14:08:48 -04:00
  • 5e3f7e7fa9 Minor: improve sampler & remove unused fields from model_config.py (#11531) Lianmin Zheng 2025-10-13 11:04:44 -07:00
  • 728af88781 [router] allow user to specify chat template path (#11549) Simo Lin 2025-10-13 13:47:57 -04:00
  • 7b59b0b8b0 [router][grpc] Further delegate non-stream processing to processing.rs (#11553) Chang Su 2025-10-13 10:36:27 -07:00
  • acc2327bbd Move deep gemm related arguments to sglang.srt.environ (#11547) Liangsheng Yin 2025-10-14 00:34:35 +08:00
  • bfadb5ea5f Adjust overlap event loop (#11507) Liangsheng Yin 2025-10-14 00:33:19 +08:00
  • 9cc1e065f1 [router][Fix] Include grpc reflection runtime dependency (#11419) ai-jz 2025-10-13 09:32:42 -07:00
  • b8c430f1ce [NVIDIA] BUMP FA3 (#11444) Johnny 2025-10-13 18:30:57 +02:00
  • f35f120d70 fix: fix video input for qwen3-vl (#11442) Mick 2025-10-14 00:30:43 +08:00
  • 54a46a264d Remove tp_worker.worker (#11548) Liangsheng Yin 2025-10-13 22:38:48 +08:00
  • 7c94eaeeb0 [router] allow tokenizer path to be dir (#11530) Simo Lin 2025-10-13 09:30:09 -04:00
  • 13d596c93e [router][ci] Add Nightly Release Workflow for SGLang Router (#11527) Simo Lin 2025-10-13 09:28:55 -04:00
  • c7867b6702 [Fix] Add per_channel_quant parameter to MoE config functions (#11201) Mohammad Miadh Angkad 2025-10-13 21:26:06 +08:00
  • 516738b096 Depreate global_server_args_dict (#11528) Liangsheng Yin 2025-10-13 19:34:43 +08:00
  • 0b6f535f66 [Reland] perf: optimize qwen-vl with symm mem allreduce (#11457) Yuan Luo 2025-10-13 17:51:25 +08:00
  • c5fe3c0b75 Tiny fix test run estimated time (#11544) Shangming Cai 2025-10-13 17:23:13 +08:00
  • 318424e2c8 [HICache]: Support 3FS-Store with page_first_direct layout (#11460) hzh0425 2025-10-13 15:47:22 +08:00
  • 6806c4e63e [CI monitor] Improve CI analyzer: fix job failure tracking and add CUDA-focused filtering (#11505) Xiaoyu Zhang 2025-10-13 13:31:09 +08:00
  • 0c0779d667 ci: improve nightly-ci (#11385) Mick 2025-10-13 12:19:34 +08:00
  • a55cf5304a [Feature] Support mamba radix cache v0 (#11214) Yi Zhang 2025-10-13 11:57:15 +08:00
  • 19ba16aa3d [Fix]: add missing device attribute to ChunkCache (#11493) Yuanhang Sun 2025-10-13 11:49:59 +08:00
  • a2b3d9b90b Update DeepSeek-R1-FP4 default config on blackwell (#11512) Qiaolin Yu 2025-10-12 20:32:11 -07:00
  • 9a30914e94 [sgl-kernel][1/N]Support Expert Specialization Grouped GEMM (#11432) Qi Yuhang 2025-10-13 11:19:21 +08:00
  • 8e776c78a1 docs(router): add token-bucket rate limiting to the docs (#11485) Jonah Bernard 2025-10-12 23:03:27 -04:00
  • 63e84352b7 [router] openai router: support grok model (#11511) Keyang Ru 2025-10-12 19:44:43 -07:00
  • a20e7df8d0 Improve dp attention port assignment scheme (#5889) Yongtong Wu 2025-10-13 08:55:59 +08:00
  • 1bdd010291 Revert "Deprecate global_server_args_dict" (#11520) Cheng Wan 2025-10-12 17:40:40 -07:00
  • 6cd296940a [lint] Fix the lint issue (#11516) Cheng Wan 2025-10-12 16:22:46 -07:00
  • 2ac46e94ef Sync changes on io_struct.py and deterministic ops (#11498) Lianmin Zheng 2025-10-12 16:03:10 -07:00
  • 0aa65f94f1 [Fix] Improve longbench prompt and other logics (#11474) Binyao Jiang 2025-10-12 15:04:28 -07:00
  • 0ecb42613d fix: revert temporarily remove b200 tests (#11515) Yineng Zhang 2025-10-12 15:02:37 -07:00
  • 05f015f65f chore: remove flashinfer cleanup cache (#11514) Yineng Zhang 2025-10-12 14:56:33 -07:00
  • 1083e7e3df Deprecate global_server_args_dict (#11331) Liangsheng Yin 2025-10-13 01:20:47 +08:00
  • 2157d12ae8 [CI] fix lint (#11509) Liangsheng Yin 2025-10-13 01:07:21 +08:00
  • 9f2b457cbe doc: add doc for adding new models into nightly-ci (#11443) Mick 2025-10-12 23:35:10 +08:00
  • f5b34a510c Bugfix: Fix Type consistency for KV indices in SWARadixCache (#11452) hzh0425 2025-10-12 23:19:44 +08:00
  • 5a6ec8f999 Fix unit tests (#11503) Lianmin Zheng 2025-10-12 07:45:57 -07:00
  • 6a653bb11b temporarily remove b200 tests (#11502) Lianmin Zheng 2025-10-12 06:48:49 -07:00
  • 548a57b1f3 Fix port conflicts in CI (#11497) Lianmin Zheng 2025-10-12 06:46:36 -07:00
  • 88e73ed048 Temporarily remove b200 tests (#11501) Lianmin Zheng 2025-10-12 06:41:37 -07:00
  • 4b15fa00f0 move fla env check position (#11500) Yi Zhang 2025-10-12 21:40:45 +08:00
  • f49419061d Move args from global_config to environ (#11332) Liangsheng Yin 2025-10-12 21:29:31 +08:00
  • 01e59e8247 Fix CI break by express-laned PRs. (#11499) Liangsheng Yin 2025-10-12 21:06:06 +08:00
  • 99a0704a36 bailingMoE: Fix Key error of deepep_mode (#11465) Mike Qiu 2025-10-12 20:42:59 +08:00
  • ec1cd90ac9 Fix the GPT function calling regex to allow dash in the name (#10577) Antoine Roux 2025-10-12 14:34:58 +02:00
  • 1103dc6204 [chore][2/N] Avoid using default mutable parameters (#11479) Kai-Hsun Chen 2025-10-12 05:34:04 -07:00
  • a220536f40 [ perf ] Replace json-> orjson in hot path (#11221) Vincent Zhong 2025-10-12 08:30:58 -04:00
  • 7b064f04f8 [bugfix]: use correct causality condition for flashattention, flashinfer, and triton backends (#10172) Mahmoud Ashraf 2025-10-12 15:28:16 +03:00
  • 43190becfa [chore][1/N] Avoid using default mutable parameters (#11478) Kai-Hsun Chen 2025-10-12 05:26:39 -07:00
  • be740acdb0 [smol] [perf] Qwen3-VL in place op. (#11481) Vincent Zhong 2025-10-12 08:25:30 -04:00
  • 2db2cddd12 chore: bump sgl-kernel version to 0.3.16 (#11476) sglang-bot 2025-10-11 22:04:49 -07:00
  • 9b5efe3464 [Router]: Small Typo in a comment within tree.rs (#11489) Wenyi Xu 2025-10-12 12:59:48 +08:00
  • 4ac8e09df0 Piecewise CUDA Graph Support & Torch Compile Backend (#10062) Yuwei An 2025-10-11 20:55:57 -07:00
  • 20a6c0a63d Beta spec-overlap for EAGLE (#11398) Liangsheng Yin 2025-10-12 11:02:22 +08:00
  • 47c606d3dc [Feature] support regex strings as a stopping condition (#10635) Glen Liu 2025-10-11 22:53:15 -04:00
  • 9fcf73069f [CI] Add nightly builds to dockerhub (#9804) Sahithi Chigurupati 2025-10-11 18:27:46 -07:00
  • 0a304870e8 fix Xeon CI (#11454) Zaili Wang 2025-10-12 05:08:28 +08:00
  • 8fdcd98efe [7/n] decouple quantization impl from vllm dependency - gguf kernel (#11019) PGFLMG 2025-10-12 05:04:57 +08:00
  • b5dcfd4154 Add option to disable any_whitespace for xgrammar and llguidance backends. (#8919) Lorenzo Lu 2025-10-11 16:24:58 +02:00
  • 5061b8fd3e fix stop when stream (#11462) ybyang 2025-10-11 22:06:31 +08:00
  • c8452551ce [Fix] Fix split prefill with fa3. (#11428) ykcombat 2025-10-11 22:03:28 +08:00
  • bf3e7149be Fix enable_v2 in int8 quant (#11470) fzyzcjy 2025-10-11 21:56:30 +08:00
  • f5754d1256 [Documentation][Configuration] Server args and documentation of PD-Multiplexing. (#11427) ykcombat 2025-10-11 21:36:07 +08:00
  • 739daa63e4 Adjust logits metada init for target verify (#11467) Liangsheng Yin 2025-10-11 21:17:04 +08:00
  • d957177a22 Super tiny delete unused openai router in sgl-router (#11448) fzyzcjy 2025-10-11 15:59:30 +08:00
  • 21337b22b9 Reland [1/2] Optimizations and refactors about quant kernel (#10312) fzyzcjy 2025-10-11 15:59:03 +08:00
  • 129d299278 Enable native ModelOpt quantization support (2/3) (#9991) Zhiyu 2025-10-11 00:48:14 -07:00
  • 8b85926a6e Remove tilelang dependency in Dockerfile (#11455) Baizhou Zhang 2025-10-10 23:17:53 -07:00
  • 451d15c44b [DPSKv3.2] Rewrite nsa tilelang act_quant kernel to triton (#11450) Binyao Jiang 2025-10-10 23:13:46 -07:00
  • c80a96dae9 [BugFix] test_mla_fp8.py fails on Cublas 12.9 (#11360) Liu-congo 2025-10-11 12:14:24 +08:00
  • eae9a9fb9d Fix batch invariant ops (#11368) Stefan He 2025-10-10 20:49:08 -07:00
  • 2674c1d280 fix: Change dsv32 hack temporary path to use system temp directory (#11445) wxsm 2025-10-11 10:59:41 +08:00
  • 61055cb309 Reorder PD disagg CI tests (#11438) Lianmin Zheng 2025-10-10 17:56:49 -07:00
  • 92777135a0 [router][grpc] Consolidate parser checks for chat completions (#11439) Chang Su 2025-10-10 17:44:29 -07:00
  • c495833186 [router] leverage RAII to actively cancel request during client disconnect (#11399) Simo Lin 2025-10-10 20:43:38 -04:00
  • 2eeb27515a [router] disable rate limiter by default (#11435) Simo Lin 2025-10-10 20:43:07 -04:00
  • b36afed4a7 Separate allocation logic from scheduler (#11313) cctry 2025-10-10 17:38:54 -07:00
  • 9aa4502d11 feat(mooncake): support GB suffix for global_segment_size (#10745) JinYan Su 2025-10-11 08:38:25 +08:00
  • a0835c3a62 [router] Fix ci nvcc not found error (#11411) Keyang Ru 2025-10-10 15:43:16 -07:00
  • 55b14656e6 Revert "Add metrics for speculative decoding (acceptance rate, average acceptance length)" (#11433) Scott Lee 2025-10-10 12:54:57 -07:00
  • b4408e6098 Revert "fix: fix video input for qwen3-vl" (#11437) Lianmin Zheng 2025-10-10 12:44:40 -07:00
  • 52fcbbb8bd Revert "perf: optimize qwen-vl with symm mem allreduce" (#11436) Cheng Wan 2025-10-10 12:30:05 -07:00
  • af96ca1136 [CI] Merge build-dev into workflow matrix (#11345) Sahithi Chigurupati 2025-10-10 11:13:42 -07:00
  • 9082a7d323 [HiCache] feat: add multi tenant with prefix tag (#9256) Teng Ma 2025-10-11 00:23:28 +08:00
  • 3b9d97f335 perf: optimize qwen-vl with symm mem allreduce (#11381) Yuan Luo 2025-10-10 22:24:45 +08:00
  • a1a20b4c7c fix: fix video input for qwen3-vl (#11361) Mick 2025-10-10 19:35:35 +08:00
  • 4299aebdbb chore: update pyproject (#11420) Yineng Zhang 2025-10-10 00:56:30 -07:00
  • 0babd48736 Add metrics for speculative decoding (acceptance rate, average acceptance length) (#11144) Scott Lee 2025-10-10 00:46:44 -07:00