Commit Graph

  • 443a1b4ab3 Update pyproject_other.toml maxiao1 2025-09-30 10:47:20 +00:00
  • 852a49c5cc adapt to dsv32 on dcu maxiao 2025-09-30 18:37:31 +08:00
  • 580051c5a8 chore: bump sgl-kernel v0.3.14 (#11067) PGFLMG 2025-09-30 17:53:24 +08:00
  • 8f7453e3af adapt to ds3.2 maxiao 2025-09-30 17:44:54 +08:00
  • 1237aa19ce [Auto Sync] Update fused_moe_triton_config.py (20250930) (#11099) Lianmin Zheng 2025-09-30 00:57:16 -07:00
  • 5991119541 [Fix] Resolve performance drop in speculative decoding aiter backend (#11087) jacky.cheng 2025-09-30 14:51:30 +08:00
  • 424591d53d Fix spec filter batch when target extend (#10991) Ke Bao 2025-09-30 14:44:02 +08:00
  • d1676cd483 [router][tool call] Full support for ToolChoice (#11085) Chang Su 2025-09-29 22:36:03 -07:00
  • 33b3c0f85f [router] grpc router generate endpoint support (#11070) Simo Lin 2025-09-30 01:07:53 -04:00
  • e5281f84d5 Update CODEOWNERS for attention/ascend_backend.py (#11092) Lianmin Zheng 2025-09-29 21:28:49 -07:00
  • d17986f8c6 Enable optional FP32 compute for LM Head (#10729) narutolhy 2025-09-29 20:45:17 -07:00
  • 8831c55c3d [model] added support for w8a8int8 used by neuralmagic/Qwen2-0.5B-Ins… (#9642) DevashishLal-CB 2025-09-29 20:26:17 -07:00
  • 2bc61dd194 Remove hybrid_linear_attn attention backend and refactor attention registry (#10816) li-kesen 2025-09-30 10:16:16 +08:00
  • 6535fda127 [Profile] dump memory trace when cuda graph profile is enabled (#11083) Cheng Wan 2025-09-29 17:36:48 -07:00
  • 3713eb6135 feat(reasoning): improve enable thinking from request (#10875) Jimmy 2025-09-30 07:59:08 +08:00
  • 5937a56d47 [router][grpc] Add logprobs support to router (#11082) Chang Su 2025-09-29 15:55:06 -07:00
  • f065e5bea5 [router] Use get_pooled in process_single_choice (#11079) Chang Su 2025-09-29 15:48:00 -07:00
  • 9de1320b63 fix: fp8 mllama4 without vision modules being quantized (#10611) Mick 2025-09-30 05:17:12 +08:00
  • dda34c2f93 Fix mem fraction static for nightly tests (#11076) Lianmin Zheng 2025-09-29 12:57:41 -07:00
  • 4eeaff74a0 [router][tool call] Separate JsonParser and LlamaParser (#11073) Chang Su 2025-09-29 10:26:37 -07:00
  • a17e70f5cc Use more general heuristics to set the default value of --mem-fraction-static (#10975) Lianmin Zheng 2025-09-29 10:11:03 -07:00
  • 816b3a433a [router] add n to generate sampling params (#11069) Simo Lin 2025-09-29 10:37:43 -04:00
  • 3a641d9085 chore: upgrade sgl-kernel 0.3.13 (#11056) Yineng Zhang 2025-09-29 02:22:25 -07:00
  • 6f16bf9d9d [Ci Monitor] Auto uploaded performance data to sglang_ci_data repo (#10976) Xiaoyu Zhang 2025-09-29 16:17:27 +08:00
  • 5942fdb480 chore: upgrade cutedsl 4.2.1 (#11054) Yineng Zhang 2025-09-29 00:24:17 -07:00
  • af4ab65606 [router][tool call] Improve normal content extraction and error handling (non-stream) (#11050) Chang Su 2025-09-29 00:19:30 -07:00
  • 11965b0daf Fix sgl-kernel benchmark dead code (#11022) Xiaoyu Zhang 2025-09-29 15:06:40 +08:00
  • 71959545df Fix gemma 3 launch with transformers: the error: AttributeError: 'TransformersForCausalLM' object has no attribute 'tp_size' (#9614) Vincent Zhong 2025-09-29 02:18:29 -04:00
  • 24f7cb1ece [speculative decoding] rename lookahead to ngram (#11010) Zhihao Zhang 2025-09-29 12:06:59 +08:00
  • e05555fad8 [HiCacheStorage] mooncake store support page_first_direct layout (#10591) huangtingwei 2025-09-29 11:45:48 +08:00
  • 43fa9f22bd fix: check if weights are already local before downloading (#11015) Mick 2025-09-29 11:11:33 +08:00
  • e98d9346c7 [1/2] Support FA4 for MHA Prefill in sgl-kernel (#10940) Lifu Huang 2025-09-28 19:59:14 -07:00
  • 0c9174108a Unify SGL Kernel Releases (#10701) Kangyan-Zhou 2025-09-28 19:48:28 -07:00
  • 2572886367 [router] add harmony tool parser base structure and interface (#11036) Simo Lin 2025-09-28 22:46:38 -04:00
  • dba751a896 [router][tool call] Support normal content extraction before tool call (streaming) (#11038) Chang Su 2025-09-28 19:46:06 -07:00
  • 2e7633982c fix: show failed models in nightly ci (#10986) Mick 2025-09-29 03:38:29 +08:00
  • 336e9a6058 [router] migrate to rust python module for pythonic parser (#11033) Simo Lin 2025-09-28 14:48:59 -04:00
  • abb6781573 Update GLM-4.5 Model Doc (#11017) Yuxuan Zhang 2025-09-29 02:21:27 +08:00
  • 07440f5f34 Fix FusedSetKVBufferArg in RotaryEmbedding (#11003) Lianmin Zheng 2025-09-28 11:17:27 -07:00
  • 9816989bff [HiCache] bug: fix mooncake store batch set v1 (#11013) Teng Ma 2025-09-28 23:18:48 +08:00
  • 42245551ef [sgl-kernel] Optimize concat_mla_k kernel (#10543) Yuan Luo 2025-09-28 23:04:22 +08:00
  • 2a9d995c09 prepare for sglang+verl (#10555) lbk-sys 2025-09-28 16:39:17 +08:00
  • a9050b5c39 [bugfix]Add empty_context import to two_batch_overlap.py (#10964) wejoncy 2025-09-28 16:26:20 +08:00
  • 66face3598 Fix CI failure of TypeError: RotaryEmbedding.forward_cpu() got an unexpected keyword argument 'fused_set_kv_buffer_arg' (#11009) YanbingJiang 2025-09-28 15:31:08 +08:00
  • 5519766a4d [router] fix chat template loading and tokenizer path (#10999) Simo Lin 2025-09-27 23:54:12 -04:00
  • 72392f2908 [router] basic mcp support for openai router response api (#10978) Keyang Ru 2025-09-27 18:49:33 -07:00
  • c1c8dd1dd0 [router][tool parser] Modify tool parser to return both normal text and tool calls (non-stream) (#10995) Chang Su 2025-09-27 15:10:17 -07:00
  • f6bc3f529b Fix profiler (#10997) Lianmin Zheng 2025-09-27 14:56:18 -07:00
  • 8cc27fdc46 Use jsonschema to constrain required or specific tool choice (#10550) Tejesh Anand 2025-09-27 10:18:50 -07:00
  • 9c339d6b47 [PD] Extract the PP transfer layer calculate logic from Mooncake to Common backend (#10565) Shangming Cai 2025-09-28 00:10:41 +08:00
  • e23e280e16 Add support for topk metadata transferring for PD (#10616) Shangming Cai 2025-09-28 00:09:38 +08:00
  • 51f7c6bd3c Add auth to get server info (#10751) Muqi Li 2025-09-27 17:54:39 +08:00
  • 62e2e99db6 fix: make inference deterministic for large TP (#10930) Xinyuan Tong 2025-09-27 02:46:45 -07:00
  • 8ebf72fef3 [Fix] RuntimeError: get_cfg Unsupported input_type:Float4_e2m1fn_x2 in using aiter-mxfp4-moe (#10981) kk 2025-09-27 13:12:22 +08:00
  • 8260574729 fix: fp8 quantization failure of qwen 2.5 VL 7B model (#10112) Yueyang Pan 2025-09-27 07:05:23 +02:00
  • 37f3325b06 [router][grpc] Support E2E non-stream chat completions (#10980) Chang Su 2025-09-26 22:02:06 -07:00
  • bd95944cf6 [Bugfix][Minor][Benchmark] Fix some bugs due to PR #10495 (#10982) Muqi Li 2025-09-27 13:01:05 +08:00
  • c8a5d12abe [HiCache]: Support dynamic loading backends for hicache (#10551) hzh0425 2025-09-27 09:34:11 +08:00
  • 2387c22b56 Ci monitor support performance (#10965) Xiaoyu Zhang 2025-09-27 09:11:21 +08:00
  • 592ddf374f Add simple docker file for B300 (#10944) hlu1 2025-09-26 17:26:57 -07:00
  • 0c3db88978 [router][grpc] Add helpfer functions for decoder in router.rs and fix specs (#10971) Chang Su 2025-09-26 17:10:45 -07:00
  • 2bdaf482f9 refactor loading weights from remote instance coding format (#10941) amysaq2023 2025-09-27 06:25:39 +08:00
  • 777eb53897 ci: refactor nightly test (#10495) Mick 2025-09-27 06:24:30 +08:00
  • 05a3526654 Restruct gpu_memory_settings in a unify function and relax max_cuda_graph_bs (#10372) Xiaoyu Zhang 2025-09-27 06:10:49 +08:00
  • e56c64bfaf Update label field comment to indicate deprecation (#10970) Lianmin Zheng 2025-09-26 12:59:59 -07:00
  • fff7fbabe6 ci: fix rate-limit of huggingface with hf auth login (#10947) Mick 2025-09-27 02:02:44 +08:00
  • aae7ead2d0 [router] remove old/oudated/useless comments across code base (#10968) Simo Lin 2025-09-26 13:48:50 -04:00
  • a7fe6e10a1 [router] remove old/oudated/useless comments (#10967) Simo Lin 2025-09-26 12:45:15 -04:00
  • be059b83d6 [router] grpc router regular mode import cleanup (#10963) Simo Lin 2025-09-26 07:06:59 -04:00
  • 5d4fe1ceee [router] add move grpc worker management from router to worker manager (#10960) Simo Lin 2025-09-26 06:57:57 -04:00
  • 1b011e68dc [router] move grpc client from router to worker and builder (#10958) Simo Lin 2025-09-26 06:13:47 -04:00
  • 5c0efa562b [router]fix code owner syntax error (#10956) Simo Lin 2025-09-26 06:07:18 -04:00
  • 1e57b9472d [router] add grpc client get and set (#10955) Simo Lin 2025-09-26 06:07:05 -04:00
  • a5095d6262 Fuse write kv buffer into rope for qwen3 moe & bailing moe (#10749) Yuan Luo 2025-09-26 15:18:41 +08:00
  • 6c2c467d77 [router] update owners for router components (#10927) Simo Lin 2025-09-26 02:46:46 -04:00
  • c3d2ad4ee6 CI: Fix docker manifest build (#10936) Sahithi Chigurupati 2025-09-25 23:22:55 -07:00
  • 7ec5b4e89c [PD-HiCache]: Support Async Offloading KVCache In Decode Side (#10192) hzh0425 2025-09-26 14:20:49 +08:00
  • 6088548216 Update CODEOWNERS to include JustinTong0323 in FC (#10939) Xinyuan Tong 2025-09-25 22:55:56 -07:00
  • 172bcf0152 Revert "Refactor kv_cache_scheme handling for quantization (#10132)" (#10935) Yineng Zhang 2025-09-25 20:14:15 -07:00
  • 37158f2018 router: Support parallel sampling num > 1 in grpc_server and non-stream handling (#10929) Chang Su 2025-09-25 20:03:35 -07:00
  • 3e95aa1a09 Remove pull_request trigger from CI monitor workflow (#10932) Lianmin Zheng 2025-09-25 19:40:38 -07:00
  • c4197e99bb [ci] add ci-monitor workflow (#10898) Xiaoyu Zhang 2025-09-26 10:29:47 +08:00
  • 0ac6114694 Replace the Kimi-K2 generated tool call idx with history tool call count (#10612) eraser00 2025-09-26 09:47:40 +08:00
  • 7dcd689b47 [router][refactor] Clean up protobuf fields (#10923) Chang Su 2025-09-25 17:48:47 -07:00
  • f7bab41a29 [router] change log level to warning (#10926) Simo Lin 2025-09-25 20:32:59 -04:00
  • f68dd998b9 Rename customer label -> custom label (#10899) Lianmin Zheng 2025-09-25 16:19:53 -07:00
  • 35ec2a45a8 [minor] Remove deprecated function get_ip (#10883) Lianmin Zheng 2025-09-25 16:18:04 -07:00
  • 0035f1cefa fix env flashinfer (#10910) Swipe4057 2025-09-26 01:44:48 +03:00
  • 5e21d6aec0 refactor: Move grpc/client.rs to grpc_client/sglang_scheduler.rs (#10924) Chang Su 2025-09-25 14:21:22 -07:00
  • cd4da1f19b Refactor kv_cache_scheme handling for quantization (#10132) Mohammad Miadh Angkad 2025-09-26 01:32:15 +08:00
  • 916784746b router: Fix constraint proto and build_constraint in grpc router (#10881) Chang Su 2025-09-25 08:12:06 -07:00
  • d511b2d905 [router] consolidate worker load monitoring (#10894) Simo Lin 2025-09-25 09:59:30 -04:00
  • 77830a265e Add fuse_moe per-channel tune (#10915) lukec 2025-09-25 21:12:09 +08:00
  • fce170480a integrate AIBrix KVcache (#10376) yi wang 2025-09-25 14:47:09 +08:00
  • 3d40794fcf [HiCache] Cleaning the deprecated host memory state (#10778) Zhiqiang Xie 2025-09-24 23:43:53 -07:00
  • c1f39013b7 [ci feature] add ci monitor (#10872) Xiaoyu Zhang 2025-09-25 14:16:29 +08:00
  • 3e43eb137b [Auto Sync] Update model_config.py (20250925) (#10885) Lianmin Zheng 2025-09-24 22:59:16 -07:00
  • 458c0219a6 [router] simplify tokenizer dev doc (#10895) Simo Lin 2025-09-25 01:15:56 -04:00
  • a73eb8cd20 [router] Support Oracle DB(ATP) Data Connector (#10845) Keyang Ru 2025-09-24 20:59:32 -07:00
  • e738703547 [router] consolidate worker get loads (#10880) Simo Lin 2025-09-24 22:13:31 -04:00