Commit Graph

  • 97d966a7f8 ci: make find_local_hf_snapshot_dir more robust (#11248) Mick 2025-10-06 10:50:11 +08:00
  • 8e66d87f0a Fix spec_utils.py (#11247) sglang-bot 2025-10-05 19:01:11 -07:00
  • a20fc7b7dc Create two new GH workflows to automatically bump SGLang and Kernel version (#10996) Kangyan-Zhou 2025-10-05 18:14:05 -07:00
  • 6b30e097ab [Auto Sync] Update io_struct.py (20251004) (#11206) Lianmin Zheng 2025-10-05 18:06:07 -07:00
  • d645ae90a3 Rename runner labels (#11228) Lianmin Zheng 2025-10-05 18:05:41 -07:00
  • 41763ba079 Remove gdrcopy check in ci_install_deepep.sh (#11237) Cheng Wan 2025-10-05 17:35:22 -07:00
  • 652c24a653 Update transformers package version to 4.57.0 (#11222) Xinyuan Tong 2025-10-05 16:45:14 -07:00
  • 5e142484e2 [Fix AMD CI] VRAM cleanup (#11174) sunxxuns 2025-10-05 19:03:53 -04:00
  • c560410da7 Refactor and optimize mooncake CI (#11162) Shangming Cai 2025-10-06 05:08:52 +08:00
  • 590f2da052 [Feat] Support Torch Symm Mem AllReduce (#10571) Yuan Luo 2025-10-06 04:55:19 +08:00
  • 148d8d485d Update DeepGEMM repository tag to specific commit (#11229) Lianmin Zheng 2025-10-05 13:47:36 -07:00
  • 1a599509cc chore: bump sgl-kernel v0.3.14.post1 (#11137) PGFLMG 2025-10-06 04:46:43 +08:00
  • 36a6b8dbfc Update v1/responses to be more OpenAI-compatible. (#9624) Vincent Zhong 2025-10-05 14:47:46 -04:00
  • e0b2d3eebe [Feature] Add a fast-topk to sgl-kernel for DeepSeek v3.2 (#11194) DarkSharpness 2025-10-06 01:19:03 +08:00
  • 4cb5a5235e Tiny skip_sample adjust (#11225) Liangsheng Yin 2025-10-05 23:41:04 +08:00
  • 85c1f79377 Add DeepSeek-V3.2 Tool Call Template (#11063) Xu Wenqing 2025-10-05 09:53:49 +08:00
  • 48e9e71930 Add --max-new-tokens CLI flag for MMMU evaluation (#11217) yhyang201 2025-10-05 08:35:53 +08:00
  • 31b49c0b51 EAGLE cache fix for HiCache (#11215) Ke Bao 2025-10-05 07:53:53 +08:00
  • d736e0b65e [router] add grpc router pd mode for chat and generate (#11140) Simo Lin 2025-10-04 09:58:28 -04:00
  • ffd03a9bd3 [router] fix get load response parsing (#11213) Simo Lin 2025-10-04 09:58:02 -04:00
  • 666da3d59f [fix]enable flashmla when using draft model P/D attention select (#11012) Hank Han 2025-10-04 20:59:34 +08:00
  • d01b921482 fix sampling_seed handling when deterministic is enabled (#11096) Alex Chi Z 2025-10-03 23:41:46 -04:00
  • c70e58e837 [HICache]: Refactor HiCache CI (#11011) hzh0425 2025-10-04 08:51:56 +08:00
  • c61b9a1d01 fix self.enable_kv_cache_events (#11178) narutolhy 2025-10-03 14:09:41 -07:00
  • 3c3d6255d9 [fix]missing prefix_lens_cpu init when p/d disaggregation (#11196) Hank Han 2025-10-04 04:39:59 +08:00
  • 546914fa2d [Fix] Fix the bug of the calculation of base_gpu_id (dp offset) in data_parallel_controller.py (#10741) XSongQ 2025-10-03 13:25:57 -07:00
  • 4726c9197f [minor] fix the lint (#11198) Liangsheng Yin 2025-10-04 01:04:58 +08:00
  • a0010bf4e8 fix qwen2 eagle3 runtime error (#10517) jiapingW 2025-10-04 00:19:52 +08:00
  • 307fc060e8 fix xeon ci check (#10838) DiweiSun 2025-10-04 00:17:36 +08:00
  • 586e81a28a [Test] Initialize mem_fraction_static in setUpClass to fix pytest VLM test crashes. (#10859) vikram singh shekhawat 2025-10-03 21:44:48 +05:30
  • fad7ca73f8 model: support starcoder2 (#10609) Praneth Paruchuri 2025-10-03 21:41:19 +05:30
  • 08af8ffb5c fix 3fs indices (#10855) pansicheng 2025-10-04 00:06:38 +08:00
  • 2c7f4ca2f2 Optimize debug log position of PD abort request (#11090) Shangming Cai 2025-10-03 23:07:02 +08:00
  • 03def5e3b1 Fix [test]: Env:SGLANG_TORCH_PROFILER_DIR for pytest. (#10780) shubham singhal 2025-10-03 20:29:32 +05:30
  • 6ae3f05b33 Fix CUDA illegal memory access issues in speculative decoding (#10892) ur4t 2025-10-03 22:44:07 +08:00
  • fdc4e1e570 Tiny move files to utils folder (#11166) fzyzcjy 2025-10-03 22:40:06 +08:00
  • 7993ed8ddd Adapt to deepseekv3.2 0.5.3rc0 maxiao1 2025-10-03 20:01:17 +08:00
  • 04b86b3c5c [hot-fix] Fix CI break which caused by adding thinking_mode in eval (#11192) Liangsheng Yin 2025-10-03 18:29:27 +08:00
  • d6777a706d Add --thinking-mode to run_eval (#11189) hlu1 2025-10-03 01:49:39 -07:00
  • 8c57490210 [Feature] Option to save model weights to CPU when memory saver mode is enabled (#10873) Matt Nappo 2025-10-03 04:48:19 -04:00
  • 34151f173b [router] Steaming support for MCP Tool Calls in OpenAI Router (#11173) Keyang Ru 2025-10-03 00:19:43 -07:00
  • 6794d21051 Tiny add PD disaggregation + DP attention test (#11167) fzyzcjy 2025-10-03 14:15:46 +08:00
  • 1a31229cd4 fix: radix cache memory accounting (#10637) Alex Chi Z 2025-10-03 01:47:33 -04:00
  • de89ef49da [CI]] Tee server logs to both file and stdout/stderr using PIPE (#11185) Liangsheng Yin 2025-10-03 12:31:13 +08:00
  • b00a0c786f [Fix] Update to v0.1.5.post4 and refine HIP attention backend selection (#11161) jacky.cheng 2025-10-03 12:19:30 +08:00
  • a2faf8940c [1/n] Enable DCA CUDA graph capture (#9537) b8zhong 2025-10-02 20:30:00 -07:00
  • 7e61737d3f [Generative Scores API] add performance tests to CICD (#10830) Vedant V Jhaveri 2025-10-02 19:57:55 -07:00
  • 3c699772c9 Introduce naming convention in io_struct and base sglang io classes. (#10133) Liangsheng Yin 2025-10-03 10:55:13 +08:00
  • e810077488 Allow use of TRTLLM_MHA backend for hybrid attention on Blackwell (#11138) Dom Brown 2025-10-03 00:04:58 +01:00
  • 963175d5c0 [router][grpc] Support streaming for v1/chat/completions (#11179) Chang Su 2025-10-02 14:35:16 -07:00
  • 0618ad6dd5 fix: shoudn't include CUDA_ARCH 100 and 120 for cuda12.6.1 (#11176) gongwei-130 2025-10-02 13:24:23 -07:00
  • 6a261aaca5 Minor fixes for server_args, parallel_state, and test_deterministic.py (#11159) Lianmin Zheng 2025-10-02 12:12:49 -07:00
  • 7ff740a6ce Remove dp balance metadata and minimul token balance. (#11170) Liangsheng Yin 2025-10-03 01:48:15 +08:00
  • bfcd9b2433 [grpc] style fix for grpc compilation. (#11175) Liangsheng Yin 2025-10-03 01:44:29 +08:00
  • 458611de77 Unify forward output datastructure (#11124) Liangsheng Yin 2025-10-03 00:28:57 +08:00
  • 3511b37099 [proto] Add script to compile python protos (#11171) Chang Su 2025-10-02 08:45:51 -07:00
  • afcd3e1089 Tiny remove duplicated code (#11164) fzyzcjy 2025-10-02 21:56:31 +08:00
  • 12d6818380 Tiny fix ep_gather behavior different in CI (#11130) fzyzcjy 2025-10-02 21:55:53 +08:00
  • b65db0287b Tiny cleanup deepseek_v2.py (#11163) fzyzcjy 2025-10-02 21:54:52 +08:00
  • 948278f173 fix cpp JIT compilation issue of ngram speculative decoding (#10837) b8zhong 2025-10-02 06:05:01 -07:00
  • 7d00479950 Clean up ascend allocator (#11152) Liangsheng Yin 2025-10-02 20:34:26 +08:00
  • 083629c235 [model] Add mamba2 and Falcon-H1 support. (#10988) ilyasch2 2025-10-02 15:15:36 +04:00
  • b658be6f6a [router][grpc] Support tool call parser in streaming (#11160) Chang Su 2025-10-02 03:18:50 -07:00
  • 5e786cca3a Support single batch overlap (#10422) fzyzcjy 2025-10-02 18:04:36 +08:00
  • 0b9dfba787 Support dispatch low latency (#10263) fzyzcjy 2025-10-02 18:02:19 +08:00
  • 6a29003410 Remove unused pack .item() in paged allocator. (#11156) Liangsheng Yin 2025-10-02 18:01:21 +08:00
  • 2ac453b07f Tiny detect slow ranks (#10508) fzyzcjy 2025-10-02 18:00:33 +08:00
  • f35def8652 Fuse quantize and rope in trtllm_mla MTP (#10779) fzyzcjy 2025-10-02 17:59:37 +08:00
  • d61615fe93 Tiny fix missing alt stream in nextn layer (#10768) fzyzcjy 2025-10-02 17:58:23 +08:00
  • b1ccaf01cd Tiny improve dumper (#11132) fzyzcjy 2025-10-02 17:55:01 +08:00
  • 097725bb66 Clean up parallel_state.py (#11148) Lianmin Zheng 2025-10-02 01:09:13 -07:00
  • 44b1fbe258 Fix DeepSeek chunked prefill memory issue (#11149) fzyzcjy 2025-10-02 14:56:59 +08:00
  • c0dbbdd12b [ROCm] To reduce the compiling time when using torch compile. (#10559) sogalin 2025-10-02 14:53:14 +08:00
  • 25e7dbe8af Fix ngram spec with page size > 1 (#11135) Liangsheng Yin 2025-10-02 12:34:23 +08:00
  • 0b2aa8a70c Intoduce cpu tensor as metadata to avoid blocking gpu kernel launch (#10720) Zhang Junda 2025-10-02 10:51:25 +08:00
  • 609f65ba23 Remove debug print statement from scheduler output (#11145) Lianmin Zheng 2025-10-01 13:37:05 -07:00
  • 2d62af6be5 Fix metrics and request tracing (TimeStats) (#11123) Lianmin Zheng 2025-10-01 13:03:07 -07:00
  • a28b394fba [router] Add multi-turn tool calling loop support for MCP integration (#11143) Keyang Ru 2025-10-01 12:50:21 -07:00
  • 96fe2d0f15 [router] add pd service in grpc router for pd (#11120) Simo Lin 2025-10-01 11:09:21 -04:00
  • bfa274380b [HiCache] Configurable and Dynamic Prefetch Timeout (#10512) ykwd 2025-10-01 21:44:10 +08:00
  • 86cb4db058 [Feature] Add EIC as sglang HiCache Storage backend (#10271) Shisong Ma 2025-10-01 21:43:34 +08:00
  • 2e130b7618 [HiCache]bug fix: fixed blank item in host_mem_release_queue (#11005) zhangzuo21 2025-10-01 21:42:37 +08:00
  • ac1f2928ae feat: add fast_decode_plan from flashinfer, flashinfer to 0.4.0rc3 (#10760) eigen 2025-10-01 05:56:13 -04:00
  • 195a59fe23 Refactor AMD CI. (#11128) Sai Enduri 2025-10-01 01:12:28 -07:00
  • 47488cc353 docker: x86 dev builds for hopper and blackwell (#11075) ishandhanani 2025-10-01 00:06:38 -07:00
  • 6130529143 Quick Fix: fix Qwen3-VL launch failure caused by MRotaryEmbedding arg (#10985) yhyang201 2025-10-01 13:17:05 +08:00
  • a9ce2bcb3c [Doc] Update multimodal language models documentation (#11111) Xinyuan Tong 2025-09-30 22:10:31 -07:00
  • 5dddb331c4 [Auto Sync] Update base_grammar_backend.py, xgrammar_backen... (20250930) (#11115) Lianmin Zheng 2025-09-30 21:50:43 -07:00
  • 01a26544a3 [AMD] Add Tilelang and Fast Hadamard Transform builds to Dockerfile.rocm (#11114) Hubert Lu 2025-09-30 20:00:37 -07:00
  • 73d4a5f879 Organize spec-related data structures (#10735) Liangsheng Yin 2025-10-01 09:45:30 +08:00
  • 7fb551a75d [router] add mcp list and mcp call in output array (#11112) Keyang Ru 2025-09-30 18:41:54 -07:00
  • 1193f13181 fix: KimiK2Detector Improve tool call ID parsing with regex (#10972) Xinyuan Tong 2025-09-30 17:44:44 -07:00
  • 84a9f5d660 Feature/make PEFT adapter module format compatibile (#11080) Chenxi Li 2025-09-30 16:32:02 -07:00
  • 8ce830a8b0 [router][bugfix] Fix input_logprobs handling with None value and logprob_start_len = -1 (#11113) Chang Su 2025-09-30 16:09:40 -07:00
  • fb367acfcb Support Dots.ocr model (#11071) qrskannbara 2025-10-01 03:18:39 +08:00
  • a6cc86df9d Fix DSR1 accuracy for flashinfer_trtllm MoE with FP8 quantization (#11081) Trevor Morris 2025-09-30 10:33:12 -07:00
  • 229d2b95f1 [CPU] Adding Memory Capacity Acquisition Functionality (#11102) Zaili Wang 2025-09-30 23:41:20 +08:00
  • 9710f718fb [Eval] Add --repeat in run_eval (#11101) Liangsheng Yin 2025-09-30 23:35:54 +08:00
  • 91847e382a Fix eagle radix cache (#10846) Ke Bao 2025-09-30 22:59:20 +08:00
  • 5a290a5644 [router][grpc-server] Fix gRPC server shutdown (#11094) Simo Lin 2025-09-30 07:12:12 -04:00