Commit Graph

  • b962a296ed chore: upgrade sgl-kernel 0.3.7 (#9708) Yineng Zhang 2025-08-27 14:00:31 -07:00
  • aa3eba8eb4 [sgl-kernel] misc: update deepgemm version for sgl-kernel (#9340) PGFLMG 2025-08-28 03:01:30 +08:00
  • 07ee0ab750 [router] add gpt-oss and glm4 tool parser (#9703) Simo Lin 2025-08-27 11:26:00 -07:00
  • 5c06dcb75a [router] add kimi-k2 tool parser (#9702) Simo Lin 2025-08-27 11:04:55 -07:00
  • 6f6beca49d [router] add step3 tool parser (#9695) Simo Lin 2025-08-27 10:44:52 -07:00
  • 68a54e063e Sets default model name in request classes (#9683) Xinyuan Tong 2025-08-27 17:43:03 +00:00
  • fd18995cf3 Fix get_ip when no external network (#9700) ybyang 2025-08-28 01:28:52 +08:00
  • db0831e019 Quick fix for loading processor for supporting internvl3_5 series (#9676) yilian49 2025-08-27 12:05:27 -04:00
  • 6e4e1c8cdc [router] add deepseek tool parser (#9694) Simo Lin 2025-08-27 06:18:24 -07:00
  • 9768c50d90 [router] restructure tool parser module folder (#9693) Simo Lin 2025-08-27 06:05:53 -07:00
  • fd71b11b1d move is_sm90_supported/is_sm100_supported to python/sglang/srt/utils.py (#9679) Lianmin Zheng 2025-08-27 03:34:29 -07:00
  • ae7428a8a7 fix mooncake store mla zero copy meta (#9678) huangtingwei 2025-08-27 15:43:16 +08:00
  • a3aee7c377 fix: HiRadixCache: fix prefetch completion race (#9397) Pablo Iyu Guerrero 2025-08-27 09:43:01 +02:00
  • 79e6a8a6ac support cuda 13.0 and trtllm kernel by Aug 25 2025 (#9495) Rain Jiang 2025-08-26 23:13:27 -07:00
  • 8f7b1c31e8 Add A100 fused MoE kernel configs for Dpsk (#9677) ehuaa 2025-08-27 11:49:48 +08:00
  • b9683be653 Support DeepSeek-V3.1 tool call (#9446) Xu Wenqing 2025-08-27 11:22:19 +08:00
  • a85363c199 [docs] Instructions for bench_serving.py (#9071) yhyang201 2025-08-27 09:30:57 +08:00
  • b21fdd5373 feat: (chat-template matching) enhance multimodal model detection with config.json (#9597) Kevin Tuan 2025-08-27 08:55:40 +08:00
  • c04c17edfa refactor(hicache): Introduce generic HiCacheStorageConfig for improved configuration management (#9555) hzh0425 2025-08-27 08:55:20 +08:00
  • 16a6d21b95 chore: enhance bench_serving for vlms with a new dataset of configurable image count and resolution (#9583) Mick 2025-08-27 08:42:54 +08:00
  • a530b3ffdc [RL] fix register the same ops multiple times (#9564) Stefan He 2025-08-26 16:24:44 -07:00
  • 603b3446dc Fix FA3 swa spec verify topk>1 (#9658) Ke Bao 2025-08-27 06:03:14 +08:00
  • b6c14ec0b4 add response_format support for completion API (#9665) cicirori 2025-08-27 00:01:29 +02:00
  • 43de1d7304 HiCache Storage fix host memory leak (#9648) Zhiqiang Xie 2025-08-26 10:49:40 -07:00
  • 79ce3688bb BugFix(hicache): Fix host indices out of bound error (#9637) hzh0425 2025-08-27 01:42:23 +08:00
  • 44ffe2cb72 Install py-spy by default for containers for easier debugging (#9649) fzyzcjy 2025-08-27 01:40:52 +08:00
  • 1a0896e9c0 [doc] add kimik2 --tool-call-parser (#9647) Xiaotong Jiang 2025-08-26 10:39:40 -07:00
  • 90313fb09a [router] add token bucket rate limiter (#9656) Chang Su 2025-08-26 10:36:26 -07:00
  • 3578eb1e9b [router] address worker load tracking consistency (#9523) Simo Lin 2025-08-26 06:40:51 -07:00
  • 0936c766ed Fix kimi k2 function calling format (#9606) Xiaotong Jiang 2025-08-26 00:50:59 -07:00
  • 0ef583b7de fix: allow user to specify function as role (#9635) GavinZhu-GMI 2025-08-26 15:47:20 +08:00
  • f7881a27f9 Add reasoning_effort param in TiktokenTokenizer.apply_chat_template (#9630) Liu Shaohui 2025-08-26 15:44:20 +08:00
  • fdff3167c5 [docs] Update README with additional highlights and resources for SGLang x AMD SF Meetup (#9640) Mingyi 2025-08-26 00:40:39 -07:00
  • cbc0e4d779 Fix lint for router (#9636) Stefan He 2025-08-26 00:38:53 -07:00
  • 4cd08dc592 model: Support nvidia/Llama-3_1-Nemotron-Ultra-253B-v1 (#9301) Netanel Haber 2025-08-26 10:33:40 +03:00
  • f92b729d52 [new feat] ascend backend support fia fusion kernel (#8328) ZhengdQin 2025-08-26 14:13:08 +08:00
  • e2e378caba [router] add ut for mistral, llama, pythonic, and streaming tool parser (#9632) Simo Lin 2025-08-25 22:02:15 -07:00
  • dc1decc6af [router] add llama tool parser (#9629) Simo Lin 2025-08-25 20:43:36 -07:00
  • 03680f33be [router] add pythonic parser (#9628) Simo Lin 2025-08-25 20:40:06 -07:00
  • d4c5e53401 [router] add qwen tool parser (#9623) Simo Lin 2025-08-25 20:32:05 -07:00
  • 817c62a077 [router] add mistral tool parser (#9622) Simo Lin 2025-08-25 20:09:51 -07:00
  • 0ff7241995 Improve bench_one_batch_server script (#9608) Liangsheng Yin 2025-08-26 10:38:37 +08:00
  • 80dc76e11a [Fix] HiCache Bugfix & Mooncake Error Handling Enhance (#8901) ykwd 2025-08-26 10:05:10 +08:00
  • 9b08d975a0 [docs] Refactor, remove compiled results and add gpt-oss (#9613) Chayenne 2025-08-25 15:27:06 -07:00
  • a0a77d937b Fix Harmony reasoning parser and auto-separation for gpt-oss models (#9190) Jonas 2025-08-26 00:26:26 +02:00
  • 24a8cee66d Fix GLM45v launch server cuda torch compile bug (#9554) Binyao Jiang 2025-08-25 13:46:28 -07:00
  • 3affa9dcc3 Fix GLM45 tool call multi-turn bug (#9500) Binyao Jiang 2025-08-25 13:46:13 -07:00
  • ea0696b924 [Performance] Batch Send from Tokenizer Manager. (#9436) Sundara Raman Ramachandran 2025-08-25 10:43:54 -07:00
  • 3aec3d4f8b [Doc] add LWS(LeaderWorkerSet) use case in sgl-router README (#9568) Bruce-x-1997 2025-08-25 23:32:31 +08:00
  • e3e97a120b chore: bump v0.5.1.post2 (#9592) Yineng Zhang 2025-08-25 03:45:09 -07:00
  • 051068673c chore: update config (#9591) Yineng Zhang 2025-08-25 03:41:09 -07:00
  • 9dcdf5da03 Tiny fix wrong comments (#9589) fzyzcjy 2025-08-25 18:08:10 +08:00
  • f8b757bcac fix: resolve tuning fused moe issue (#9587) Yineng Zhang 2025-08-25 01:41:15 -07:00
  • ebd9dbe71b fix: revert #8593 (#9581) Yineng Zhang 2025-08-25 01:29:06 -07:00
  • 938e986e15 chore: upgrade flashinfer 0.2.14.post1 (#9578) Yineng Zhang 2025-08-25 00:12:17 -07:00
  • 17d5eda887 bugfix for undefined logging functions in HarmonyBrowserTool & HarmonyPythonTool (#9229) Yuhao Zhou 2025-08-25 15:10:35 +08:00
  • 71a7f1d86f Offload tensors by sharding on GPU (#9536) fzyzcjy 2025-08-25 15:02:49 +08:00
  • 433266c125 Reintroduce memory usage fix (#9535) fzyzcjy 2025-08-25 15:02:31 +08:00
  • fda4792620 Update CUTLASS 4.2 & Enable K-Major Scale Factor for SM90 FP8 Blockwise Group GEMM (#9559) Qi Yuhang 2025-08-25 14:24:43 +08:00
  • a0b22f2f17 remove redundant rank0_log function. (#9560) miter 2025-08-25 14:17:55 +08:00
  • b5c6529e17 [PD] Improve disaggregation metrics output: update the metrics to keep reflecting real stats (#7317) SCDESPERTATE 2025-08-25 14:16:43 +08:00
  • ca4b86c564 fix: Update OpenAI client base URL in documentation (#9576) Xinyuan Tong 2025-08-25 14:06:57 +08:00
  • dd6ec02965 Add target module validation for init adapters (#9429) Beichen Ma 2025-08-24 20:24:50 -07:00
  • bf863e3bbf fix: use sgl-kernel 0.3.5 (#9565) Yineng Zhang 2025-08-24 15:46:47 -07:00
  • 9e169ea8b5 [router] add right rustls dependency in sgl-router cargo.toml (#9498) Bruce-x-1997 2025-08-25 00:03:15 +08:00
  • e0ab167db0 chore: bump v0.5.1.post1 (#9558) Yineng Zhang 2025-08-24 01:14:17 -07:00
  • c807cd7c75 chore: update configurer (#9557) Yineng Zhang 2025-08-24 01:05:00 -07:00
  • 327f7b7c87 fix(grok): remove duplicate replicate_lm_head configuration (#9549) Vincent Zhong 2025-08-23 22:49:24 -04:00
  • 80425e59bb [doc] deepseekv31 support (#9544) Xiaotong Jiang 2025-08-23 16:54:58 -07:00
  • af9d4eb038 [readme] Include additional resources for the SGLang x AMD SF Meetup event (#9547) Mingyi 2025-08-23 16:51:16 -07:00
  • fb107cfd75 feat: allow use local branch to build image (#9546) gongwei-130 2025-08-23 16:38:30 -07:00
  • 97a38ee85b Release 0.5.1 (#9533) Lianmin Zheng 2025-08-23 07:09:26 -07:00
  • 86d10d220f Update grok.py and tiktoken tokenizer (#9532) Lianmin Zheng 2025-08-23 05:40:18 -07:00
  • 83871aa12d feat(hicache): Supports 3fs-hicache compatibility with dp-attention (#9372) hzh0425 2025-08-23 17:08:32 +08:00
  • b1b3f0b38f Partially unify triton per token group quant kernels (#9485) fzyzcjy 2025-08-23 17:07:31 +08:00
  • 34e5e11f0f Tiny make device_loading_context more static (#9478) fzyzcjy 2025-08-23 17:07:15 +08:00
  • 2600fc0d47 Overlapped weight offload (#8034) fzyzcjy 2025-08-23 17:06:46 +08:00
  • ccd3fb946e [fix] Fix mxfp4 triton MoE tp bug (#9473) hlu1 2025-08-23 01:48:40 -07:00
  • c9dd70fbde tool-call(dsv3): Improve deepseek-v3 chat template and tool_choice = required (#9525) Chang Su 2025-08-23 01:46:56 -07:00
  • 6b2b8bf0e1 fix: blackwell dsv3 fp8 issue temporary solution (#9530) Yineng Zhang 2025-08-23 01:33:21 -07:00
  • 4edbe0d534 [benchmark] Add benchmark scripts for ceval and boolq (#8946) yuxingcyx 2025-08-23 15:40:15 +08:00
  • 0374304a2c Add enable_flashinfer_mxfp4_bf16_moe for higher precision and slower moe backend (#9004) fzyzcjy 2025-08-23 15:38:40 +08:00
  • 127d4b0d5e Support GC Freezing to improve latency & throughput (#9241) Chanh Nguyen 2025-08-22 22:43:09 -07:00
  • 7e880286b5 Add support for extensions of interface and pre-registrations to NIXL HiCache (#9211) Moein Khazraee 2025-08-22 20:06:13 -07:00
  • 446c8e4cdb [router] ignore client error when record failure in pd_router (#9503) Bruce-x-1997 2025-08-23 05:19:45 +08:00
  • 5ef545e678 [router] Move all protocols to spec.rs file (#9519) Keyang Ru 2025-08-22 14:18:47 -07:00
  • c4500233ff Add Qwen3-30B-A3B-Thinking-2507 support on AMD GPUs. (#9456) sogalin 2025-08-22 13:14:42 -07:00
  • f445a1d9a3 [AMD] Fix Llama 4 FP8 accuracy issues on MI300X (#7699) Hubert Lu 2025-08-22 13:13:45 -07:00
  • e5638573c1 [NVIDIA] [1/N] Nvfp4 Masked Gemm: Add quant op for the flashinfer grouped gemm (#9200) Kaixi Hou 2025-08-22 12:19:45 -07:00
  • f556ac8bd8 [router] add json tool parser (#9516) Simo Lin 2025-08-22 12:13:04 -07:00
  • 110a65989b [MTP] Force greedy sampling on AMD (#9127) datdo-msft 2025-08-22 11:14:43 -07:00
  • 49f9d02538 [router] tokenizer arch doc (#9513) Simo Lin 2025-08-22 09:52:33 -07:00
  • 0f587e80d3 Use Tensor Core Decode when gqa group size >= 4 (#8624) Wenxuan Tan 2025-08-22 10:25:15 -05:00
  • 6078d5fcc0 [HiCacheStorage] backup optimization for MLA model (#8865) huangtingwei 2025-08-22 18:03:51 +08:00
  • 70cf4abccc 3fs zerocopy (#9109) pansicheng 2025-08-22 17:56:38 +08:00
  • cebf45994b [bugfix] Make --enable-hierarchical-cache and --disable-radix-cache mutually exclusive (#9452) Xuchun Shang 2025-08-22 17:49:52 +08:00
  • 9c0c1e30b2 Disable torch.compile for get_last_loc_large_page_size_large_top_k (#9507) Qiaolin Yu 2025-08-22 02:05:02 -07:00
  • a1f011d09a minor: determine mm attn backend based on platforms (#9303) Mick 2025-08-22 16:08:41 +08:00
  • 9ec314c6ac Support speculative decoding in the trtllm_mha attention backend (#9331) Qiaolin Yu 2025-08-21 23:53:35 -07:00
  • fedfe91c1a [Docs] Add doc and quick demo for gpt-oss responses api & built-in tools (#9497) Xinyuan Tong 2025-08-22 14:51:52 +08:00