Commit Graph

  • d1984e218c [router][grpc] Remove timeout for connections and remove max_tokens deprecation warning log (#11775) Chang Su 2025-10-17 12:36:36 -07:00
  • b79f75fd53 [Auto Sync] Update scheduler.py (20251017) (#11738) Yineng Zhang 2025-10-17 12:36:07 -07:00
  • 8fcc69e7c4 Turn on shm_allreduce and shm_allgather for fp16 (#10725) Chunyuan WU 2025-10-18 03:35:20 +08:00
  • f440baa136 [Feature] Reuse flashinfer workspace for PD-Multiplexing. (#11540) ykcombat 2025-10-18 02:35:06 +08:00
  • 2bc3fcd420 [doc] update router document (#11767) Keyang Ru 2025-10-17 10:26:54 -07:00
  • a5978a20f0 [router] fix grpc client time out to 1h (#11768) Simo Lin 2025-10-17 10:26:12 -07:00
  • e483c1eae5 [router] Fix UTF-8 Boundary Panic in Stop Sequence Decoder (#11766) Simo Lin 2025-10-17 10:21:00 -07:00
  • da681f35d3 Revert "Set csgmv as default lora backend. (#11488)" (#11735) Yineng Zhang 2025-10-17 10:01:36 -07:00
  • 9b0f725b1d add tuned fuse moe kernel for qwen3 235b fp8 on h200 (#11730) pdasgup 2025-10-17 09:55:09 -07:00
  • cde5a6e30f Abstraction for spec worker and code cleanup (#11643) Liangsheng Yin 2025-10-17 23:31:36 +08:00
  • 3e4c7da2f5 ci: reduce and refactor vlm ut and combine test files (#11062) Mick 2025-10-17 23:24:50 +08:00
  • d88ac9bc9a [overlap-spec] Make plan stream an option (#11724) Liangsheng Yin 2025-10-17 15:48:57 +08:00
  • ce11dd82dc [CI] Try fix broken event loop init (#11746) Liangsheng Yin 2025-10-17 13:30:17 +08:00
  • 9e87b60f37 [router][CI] Clean up deprecated fields in pr-test-pd-router.yml (#11739) Chang Su 2025-10-16 19:01:00 -07:00
  • 7780230a15 Revert "[router] fix get_models endpoint for openai router (#11687)" (#11740) Keyang Ru 2025-10-16 18:36:53 -07:00
  • dc01313da1 [router] Add rustfmt and set group imports by default (#11732) Chang Su 2025-10-16 17:33:29 -07:00
  • 7a7f99beb7 [router] add spec.rs to enables tests under spec folder (#11734) Keyang Ru 2025-10-16 16:07:26 -07:00
  • fd389df96e Reduce the image processing latency in VLM (#11541) StonyPort 2025-10-17 06:00:03 +08:00
  • b0d1d717e1 Revert "make radix cache deterministic" (#11728) Baizhou Zhang 2025-10-16 16:36:15 -05:00
  • c7962868c1 [router] Fix tool_choice normalization in ChatCompletionRequest and fix ut (#11731) Chang Su 2025-10-16 14:20:13 -07:00
  • 4f24ab1718 [router][grpc] add dissag info to warm up in grpc server (#11727) Simo Lin 2025-10-16 14:19:55 -07:00
  • 64affab495 [router] fix p and d worker filtering and bootstrap port handling (#11729) Simo Lin 2025-10-16 14:19:39 -07:00
  • 4c9bcb9d56 [Router] Refactor protocol definitions: split spec.rs into modular files (#11677) Keyang Ru 2025-10-16 13:44:44 -07:00
  • 86b04d25b3 model: qwen3-omni (thinker-only) (#10911) Mick 2025-10-17 04:20:38 +08:00
  • 85ebeecf06 chore: bump SGLang version to 0.5.3.post3 (#11693) sglang-bot 2025-10-16 13:14:55 -07:00
  • 0dd6cf16ba [ci]use H20 to run disaggregation test (#11543) Hank Han 2025-10-17 02:42:42 +08:00
  • 0975ba99bc [router] fix get_models endpoint for openai router (#11687) Keyang Ru 2025-10-16 09:00:08 -07:00
  • 1de3924b18 [CI] Add GLM4MoE model test (#11706) Shangming Cai 2025-10-16 16:25:58 +08:00
  • 3cceaa381a [Bugfix] Fix Qwen3/DSV3/DSV3.2 model support (#11510) Even Zhou 2025-10-16 15:14:09 +08:00
  • b0d20cdec7 Set csgmv as default lora backend. (#11488) Lifu Huang 2025-10-15 21:53:24 -07:00
  • cbac499750 Split test_intel_amx_attention_backend.py to pass CI of timeout (#11370) YanbingJiang 2025-10-16 10:22:32 +08:00
  • 476c67d7fc Fix missing a2a backend init of GLM4.5 MoE Block (#11692) Shangming Cai 2025-10-16 10:13:08 +08:00
  • 3289da5b41 [sgl-kernel] support hadamard (#11663) Fan Yin 2025-10-16 10:00:44 +08:00
  • 868403f642 [PD] Add PD support for hybrid model (Qwen3-Next, DeepSeek V3.2 Exp) (#10912) Shangming Cai 2025-10-16 09:59:14 +08:00
  • 97d857c096 [Mamba] Increase default mamba_full_memory_ratio to 0.9 (#11679) Hanming Lu 2025-10-15 18:56:43 -07:00
  • 52a54a26b2 docs: Add Contributor Covenant Code of Conduct (#11689) Yineng Zhang 2025-10-15 18:50:26 -07:00
  • cd7e1bd591 Sync code and test CI; rename some env vars (#11686) Lianmin Zheng 2025-10-15 18:37:03 -07:00
  • 729b7edf72 enable rmsnorm on XPU (#10248) Huaiyu, Zheng 2025-10-16 08:54:18 +08:00
  • 4c03dbaaef [CI][XPU]enable sglang CI on Intel XPU (#9493) DiweiSun 2025-10-16 08:13:19 +08:00
  • baf277a9bf chore: bump SGLang version to 0.5.3.post2 (#11680) sglang-bot 2025-10-15 16:49:14 -07:00
  • f5d30dae89 [router] Refactor StopSequenceDecoder to Use Sequence for Incremental Decoding (#11676) Simo Lin 2025-10-15 16:31:03 -07:00
  • 2479b89405 [router][grpc] Simplify model_id determination (#11684) Chang Su 2025-10-15 15:56:58 -07:00
  • 5464457251 [sgl-kernel] Optimize gguf test (#11667) Fan Yin 2025-10-16 06:45:53 +08:00
  • 6c01844f45 [sgl-kernel][3/N]Support Expert Specialization Grouped GEMM (#11674) Qi Yuhang 2025-10-16 04:39:31 +08:00
  • f226d3da2a Fix missing json imports in serving_responses.py (#11681) Chang Su 2025-10-15 13:01:55 -07:00
  • d2478cd4ff [router] Fix response api related spec (#11621) Keyang Ru 2025-10-15 09:59:38 -07:00
  • 30ea4c462b [tool call] Fix prev_tool_call_arr management in base_format_detector.py (#11367) Chang Su 2025-10-15 09:51:51 -07:00
  • 6d0364681c Fix 1-step draft model forward (#11653) Shangming Cai 2025-10-15 19:11:33 +08:00
  • 8221f9ae8b Tiny cleanup some eagle unused codes (#11660) Liangsheng Yin 2025-10-15 17:24:08 +08:00
  • ab9187a20b docs: update sglang installation guide (#11659) Yineng Zhang 2025-10-15 00:35:48 -07:00
  • 6b143d62a2 Clean up some Qwen3-Next and deterministic code (#11585) Stefan He 2025-10-15 00:19:37 -07:00
  • 6bc503af73 [Doc] Update support matrix for attn and hybrid attn (#11293) b8zhong 2025-10-14 22:43:11 -07:00
  • b2c8566920 [BugFix][Qwen3-VL]: fix cu_seqlens in qwen3-vl (#11458) Zheng Wengang 2025-10-15 13:16:49 +08:00
  • 32803fb279 Super tiny improve FA3 import error message (#11590) fzyzcjy 2025-10-15 13:06:31 +08:00
  • 91fc5bb5a9 feat: add add_chunked_prefix_cache_attention_backend (#11636) Yineng Zhang 2025-10-14 21:48:13 -07:00
  • 780fbf2f38 [Fix] Fix accuracy bug in CSGMV kernel caching key. (#11579) Lifu Huang 2025-10-14 20:25:56 -07:00
  • 825432fce6 [1/N]Support DeepSeek-R1 w4a8 normal deepep (#8247) Jinwu 2025-10-15 11:10:53 +08:00
  • a40229f6f8 [1/N] Introduce Mooncake Backend and Mooncake EP to Support Elastic EP (#10423) Xun Sun 2025-10-15 10:40:54 +08:00
  • 74737b2863 [router] upgrade to 0.2.0 (#11642) Simo Lin 2025-10-14 22:10:30 -04:00
  • 40e0082d8d [router] add worker self discovery for metadata (#11638) Simo Lin 2025-10-14 22:07:25 -04:00
  • e9e120ac7a fix: upgrade transformers to 4.57.1 (#11628) Sahithi Chigurupati 2025-10-14 18:35:05 -07:00
  • e0c2af2ac2 [router] update router doc to latest features (#11639) Simo Lin 2025-10-14 21:32:30 -04:00
  • 1d7f783501 Refactor kv cache free (#11351) cctry 2025-10-14 17:45:19 -07:00
  • 325951460f [router][grpc] add warm up to grpc server (#11627) Simo Lin 2025-10-14 19:11:16 -04:00
  • 86373b9e48 fix: Update SGL_KERNEL_VERSION to 0.3.15 (#11633) Yineng Zhang 2025-10-14 14:45:28 -07:00
  • d314bf6010 Update install.md (#11631) Lianmin Zheng 2025-10-14 14:34:46 -07:00
  • e28c9e526f [Minor] Update xgrammar dependency (#11622) DarkSharpness 2025-10-15 04:46:50 +08:00
  • b98cf39866 [Auto Sync] Update collector.py (20251014) (#11625) Lianmin Zheng 2025-10-14 13:34:33 -07:00
  • 27d710457c [Auto Sync] Update scheduler.py, server_args.py (20251014) (#11623) Lianmin Zheng 2025-10-14 13:20:03 -07:00
  • c224a4c6cc Fix log for chunked prefix cache (#11624) Baizhou Zhang 2025-10-14 11:49:33 -07:00
  • 49345a68cf [router] update router readme to latest features (#11619) Simo Lin 2025-10-14 14:47:38 -04:00
  • 94d26d850d use non_blocking h2d in ForwardBatch.prepare_mlp_sync_batch. (#11605) strgrb 2025-10-15 02:30:59 +08:00
  • 9e8a15a74c [router] add chang and keyang to sgl router author (#11620) Simo Lin 2025-10-14 14:10:49 -04:00
  • 3962e39d7c [router] cleanup app context and move to startup (#11617) Simo Lin 2025-10-14 13:19:28 -04:00
  • eb8cac6fe2 [router] add py binding and readme for openai router and history backend (#11453) Keyang Ru 2025-10-14 09:42:34 -07:00
  • 5ea96ac7cc Reduce one step decode for draft model. (#11561) Liangsheng Yin 2025-10-14 23:52:04 +08:00
  • 56222658ec move eagle draft post process to cuda graph (#11434) yinghui 2025-10-14 16:50:53 +02:00
  • dc965db0e0 make radix cache deterministic (#10721) Alex Chi Z 2025-10-14 15:01:52 +02:00
  • 817e46f412 Refactor spec decoding metrics calculation into separate TokenizerManager utility function (#11586) Scott Lee 2025-10-14 05:45:49 -07:00
  • 5a33c3aae7 Optimize Triton Draft Backend (#11556) Liangsheng Yin 2025-10-14 20:08:32 +08:00
  • 9767a1e41b Update release-docker-dev.yml (#11603) sglang-bot 2025-10-14 03:06:48 -07:00
  • 1d08653972 [AMD CI] Add image and weights caching. (#11593) Sai Enduri 2025-10-14 02:51:35 -07:00
  • a04efc4933 [router] when given both local tokenizer and chat template, log all (#11601) Simo Lin 2025-10-14 05:22:58 -04:00
  • 642fa966f2 [Docs] [Router]: Update sg-router doc on circuit breaker (#11449) Wenyi Xu 2025-10-14 17:18:14 +08:00
  • da7fac1b75 [router] allow router launch server to use grpc mode (#11600) Simo Lin 2025-10-14 04:42:43 -04:00
  • 28ad2297a0 [router] delete useless table content comment in spec (#11597) Simo Lin 2025-10-14 04:08:18 -04:00
  • f7f9f8eceb Update news section in README.md (#11598) Lianmin Zheng 2025-10-14 00:49:39 -07:00
  • 4b62af92ef [router] change worker api to async instead of sync (#11566) Simo Lin 2025-10-14 03:32:21 -04:00
  • 0b9915c132 [router] update generate spec to align with sgl io struct (#11591) Simo Lin 2025-10-14 02:51:33 -04:00
  • 27ef1459e6 [router][protocols] Add Axum validate extractor and use it for /v1/chat/completions endpoint (#11588) Chang Su 2025-10-13 22:51:15 -07:00
  • e4358a4585 Add fused_moe_triton config: triton_3_4_0/E=256,N=256,device_name=NVIDIA_B200.json (#11587) Qiaolin Yu 2025-10-13 22:24:43 -07:00
  • ba2ce28fe9 [Auto Sync] Update model_config.py (20251014) (#11580) Lianmin Zheng 2025-10-13 22:16:34 -07:00
  • 98923880bc chore: bump sgl-kernel version to 0.3.16.post2 (#11583) sglang-bot 2025-10-13 20:52:38 -07:00
  • f792e3c561 Revert "[NVIDIA] BUMP FA3 (#11444)" (#11582) Yineng Zhang 2025-10-13 20:51:45 -07:00
  • 28f80b1244 Implement LRU eviction policy for LoRA adapters (#11041) Chenxi Li 2025-10-13 20:18:25 -07:00
  • 88a6f9dab5 bench_serving support PD Disaggregation (#11542) Xiaoyu Zhang 2025-10-14 10:43:26 +08:00
  • cb8ed2c09a Make DeepEP combine recv do not overlap (#11535) fzyzcjy 2025-10-14 09:40:42 +08:00
  • 384733639a [DSv32] Use torch.compile for _get_logits_head_gate (#11565) Trevor Morris 2025-10-13 18:38:39 -07:00
  • aaf7af1b17 [FEATURE] Add Profile Trace Merger for Distributed Traces (#11413) Neelabh Sinha 2025-10-13 18:20:17 -07:00
  • 932e263725 Compilation Folder Reset (#11539) Yuwei An 2025-10-13 18:19:12 -07:00