Commit Graph

  • fe531d6f4e [Bug] Fix Issue#10215 (#10572) Yuhao Yao 2025-09-25 09:51:50 +08:00
  • c4e314f986 Restruct sgl-kernel benchmark (#10861) Xiaoyu Zhang 2025-09-25 07:45:25 +08:00
  • 7a06ef984d [router] consolidate health endpoints and flush cache (#10876) Simo Lin 2025-09-24 18:23:21 -04:00
  • 4a87ba217f router-grpc: Add tools processing and other parameters for apply_chat_template (#10877) Chang Su 2025-09-24 15:23:06 -07:00
  • d7b20dd65d chore: Initial support for input config files (#10534) kushanam 2025-09-24 14:45:52 -07:00
  • c3faf2d6e6 [router] select first healthy worker on proxied get requests (#10827) luna 2025-09-24 15:45:41 -03:00
  • 9209b209be router-grpc: Support jinja chat template content format detection (#10832) Chang Su 2025-09-24 11:45:01 -07:00
  • adba172fd1 ci: free space on workers for build (#10786) ishandhanani 2025-09-24 17:58:22 +08:00
  • cd641a995c fix bailing_moe with enable_dp_attention (#10860) GuoweiWangU 2025-09-24 17:29:32 +08:00
  • 71f24ef8f6 feat: add cache_salt support to request (#10718) Xinyuan Tong 2025-09-23 23:30:25 -07:00
  • b1f0fc1c0b Add CI timeout guidelines (#10829) Lianmin Zheng 2025-09-23 22:08:02 -07:00
  • 32d893730f Revert "[fix][pd-disag]no need set next batch sampling info done in prefill" (#10828) Lianmin Zheng 2025-09-23 17:02:01 -07:00
  • f47a2c67e6 [Auto Sync] Update load_config.py, model_config.py, configu... (20250923) (#10825) Lianmin Zheng 2025-09-23 16:48:12 -07:00
  • ee704e6265 [router] add auth middleware for api key auth (#10826) Chang Su 2025-09-23 16:07:34 -07:00
  • f4e3ebeb05 [router] Support streaming for Openai Router Response api (#10822) Keyang Ru 2025-09-23 14:56:28 -07:00
  • 312bfc4c95 [Auto Sync] Update simple_eval_common.py (20250923) (#10824) Lianmin Zheng 2025-09-23 13:50:47 -07:00
  • e290303ea1 [Auto Sync] Update elementwise.py (20250923) (#10823) Lianmin Zheng 2025-09-23 13:50:22 -07:00
  • aab35bccb4 fix: draft model IMA by override max_positional_embeddings (#10787) Xinyuan Tong 2025-09-23 12:56:16 -07:00
  • 42aedb02af [Auto Sync] Update protocol.py (20250923) (#10820) Yineng Zhang 2025-09-23 12:49:56 -07:00
  • 984730b732 add tuning files for QWEN-3-NEXT (#10794) Yiakwy 2025-09-24 03:46:30 +08:00
  • 23632d350c Fix latest main ci (#10799) Shangming Cai 2025-09-24 03:46:13 +08:00
  • 08b8c0c3cd [router] fix axum default body limit (#10818) Chang Su 2025-09-23 12:44:17 -07:00
  • d42975c641 Remove duplicate code in qwen2 model (#10540) Lzhang-hub 2025-09-24 02:40:51 +08:00
  • adc24a3a0c fix ceval (#10504) ZhengHSI 2025-09-24 02:35:25 +08:00
  • 7ff93e613f router(grpc): Implement route for chat_cmpl endpoint (#10761) Chang Su 2025-09-23 11:26:33 -07:00
  • b24b2e7ed7 [router] use dashmap for radix tree instead of hash for multi model (#10814) Simo Lin 2025-09-23 14:25:53 -04:00
  • 7135db5d31 [ROCm] Update aiter to v0.1.5.post3 (#10812) sogalin 2025-09-24 01:54:12 +08:00
  • 4b5ef3002c [fix][pd-disag]no need set next batch sampling info done in prefill (#10259) Jimmy 2025-09-24 01:24:36 +08:00
  • 4f564b9e83 model: support qwen3-vl series (#10323) Zheng Li 2025-09-24 01:15:52 +08:00
  • 98c3b04ff2 [router] responses api POST and GET with local storage (#10581) Simo Lin 2025-09-23 12:12:02 -04:00
  • ddab4fc7c7 [router] fix cache aware routing strategy and lock contention (#10773) Simo Lin 2025-09-23 11:53:49 -04:00
  • d21c35224d Fix hicache mooncake backend CI (#10792) Shangming Cai 2025-09-23 17:04:44 +08:00
  • 4a762041d7 move environ into sglang.srt to avoid break SRT auto sync. (#10791) Liangsheng Yin 2025-09-23 17:04:20 +08:00
  • ea338676b5 Clean up server args (#10770) Lianmin Zheng 2025-09-23 00:22:32 -07:00
  • b06db198ba followup: clean up dockerfiles and release yamls (#10783) ishandhanani 2025-09-23 15:19:46 +08:00
  • 8c1ef0f914 chore: upgrade sgl-kernel 0.3.12 (#10782) Yineng Zhang 2025-09-23 00:18:54 -07:00
  • f5a2faf2b8 Introduce FutureMap (#10715) Liangsheng Yin 2025-09-23 14:27:30 +08:00
  • 1c82d9db28 feat: unify dockerfiles (#10705) ishandhanani 2025-09-23 14:23:48 +08:00
  • 9241f4fd20 Move cached kernel to srt.utils (#10776) Lifu Huang 2025-09-22 23:00:36 -07:00
  • 063c3791fe Fix trtllm_mla slow concat kernel in MTP (#10777) fzyzcjy 2025-09-23 13:47:49 +08:00
  • 632b7d8cc9 Use simulate acc len from sglang.environ (#10771) Liangsheng Yin 2025-09-23 12:59:50 +08:00
  • 16adf3dcab [router] fix logger type mismatch (#10774) Chang Su 2025-09-22 21:02:28 -07:00
  • c3a1d7759f [router] remove pd router draining channel (#10767) Simo Lin 2025-09-22 23:49:33 -04:00
  • 89971c4c3c [router] refactor router and worker management 4/n (#10756) Simo Lin 2025-09-22 21:35:10 -04:00
  • 113f8f65a2 [Auto Sync] Update configurer.py (20250923) (#10765) Lianmin Zheng 2025-09-22 17:34:34 -07:00
  • e22f3a5ec9 [Ascend]optimize Qwen3 on Ascend (#10574) ronnie_zheng 2025-09-23 03:18:36 +03:00
  • 095093ee5a [Ascend] optimize Qwen-vl on Ascend (#10556) ronnie_zheng 2025-09-23 03:18:16 +03:00
  • d27a6f7092 [Feature] Add MLAProcess for DeepSeek MLA on NPU (#10130) Even Zhou 2025-09-23 08:17:48 +08:00
  • 0753ef831e [Auto Sync] Update flashattention_backend.py (20250922) (#10762) Yineng Zhang 2025-09-22 16:41:42 -07:00
  • 662393f27d fix: kv events with tp > 1 (#10541) ishandhanani 2025-09-23 06:55:44 +08:00
  • b1bb8e7490 Enables TRT-LLM backend to be used for target_verify (#10281) pranavm-nvidia 2025-09-22 15:54:00 -07:00
  • 38c00ed7a1 Fix multimodal registry and code sync scripts (#10759) Lianmin Zheng 2025-09-22 15:36:01 -07:00
  • d4041a5eeb refactor zero copy (#10300) pansicheng 2025-09-23 06:17:31 +08:00
  • 2f555c4cee [Generative Score API] Added test_scores_api.py to github CICD to run per commit (#10755) Vedant Jhaveri 2025-09-22 14:41:57 -07:00
  • e53df7c009 chore: bump sgl-kernel v0.3.12 (#10732) Yineng Zhang 2025-09-22 14:39:25 -07:00
  • 9c53dad809 Fix MTP MoE weight loading with NVFP4 target model. (#10758) Jue WANG 2025-09-22 17:21:57 -04:00
  • 7ca1bea63d [router] update ci so only execute benchmarks when labels are added (#10757) Simo Lin 2025-09-22 16:23:07 -04:00
  • 97c3823931 [router] refactor router and worker management 3/n (#10727) Simo Lin 2025-09-22 15:17:50 -04:00
  • 60dbbd086a bugfix: Fix get_worker_urls_for_model in http/router.rs (#10754) Chang Su 2025-09-22 11:10:31 -07:00
  • aa1c5cf5bd Add warnings and remove dependency for deterministic inference (#10724) Baizhou Zhang 2025-09-22 10:56:02 -07:00
  • 592caab66a [Docs, minor] Fix LLM doc matrix (#10753) Adarsh Shirawalmath 2025-09-22 22:59:55 +05:30
  • 2101d93b4f Fix CI TestChunkedSGMV (#10737) Lifu Huang 2025-09-22 01:09:58 -07:00
  • 70e4b21853 Fix flaky logprobs test (#10728) Shangming Cai 2025-09-22 15:46:26 +08:00
  • 944f1ea0ec fix capture_bs when speculative decoding enabled (#10730) feng397 2025-09-22 15:43:12 +08:00
  • 9d7e82a0ab EPLB: prefer to use physical experts in the same node (#9849) Yingchun Lai 2025-09-22 15:34:30 +08:00
  • f05805515b Convert FLASHINFER_WORKSPACE_SIZE to integer (#10731) Yang Yu 2025-09-22 15:26:42 +08:00
  • 635ccda673 [4/4] Introduce CachedKernel to reduce CSGMV kernel launch overheads by 60% (#10709) Lifu Huang 2025-09-21 22:26:42 -07:00
  • 1c3dbad8fe [Ascend] codeowner updates for ascend related files (#10699) ronnie_zheng 2025-09-22 05:47:01 +03:00
  • e2ac7888b8 [2/2] Support deterministic inference for temperature > 0 (#10678) Qiaolin Yu 2025-09-21 19:36:08 -07:00
  • 86527a4799 [deterministic inference] Move batch invariant pkg to sglang (#10695) Stefan He 2025-09-21 19:35:14 -07:00
  • 134b4f7ec2 Support deterministic inference with triton backend (#10694) Ethan (Yusheng) Su 2025-09-21 18:20:40 -07:00
  • f67d1f45bc [Auto Sync] Update deepseek_v2.py (20250922) (#10717) Yineng Zhang 2025-09-21 17:43:50 -07:00
  • 0f04a5f428 Optimize cutlass int8 gemm kernel for large M on SM89 Ada GPU (#10714) Qi Yuhang 2025-09-22 08:04:27 +08:00
  • 2f18602f13 fix: disable gpt-oss b200 ut (#10716) Yineng Zhang 2025-09-21 17:02:25 -07:00
  • 56321e9fc2 [Router]fix: fix get_load missing api_key (#10385) Jimmy 2025-09-22 03:28:38 +08:00
  • 12d6cf18f0 Refactors radix cache for extra key support (#10317) Xinyuan Tong 2025-09-21 11:16:16 -07:00
  • fc3e542009 Update release-docs.yml (#10706) sglang-bot 2025-09-21 00:22:21 -07:00
  • 08ecd0aa2a [3/4] Speed up CSGMV backend perf by 10% through dynamic chunking + kernel optimization (#10592) Lifu Huang 2025-09-20 22:47:48 -07:00
  • 720c1c8ca3 Super tiny fix extra logs (#10697) fzyzcjy 2025-09-21 12:30:54 +08:00
  • d403c143e3 feat: update server args (#10696) Yineng Zhang 2025-09-20 18:52:19 -07:00
  • cba0d8c309 [Feature] Support deterministic inference with FA3 backend (#10651) Stefan He 2025-09-20 17:50:21 -07:00
  • f1d7892318 [Auto Sync] Update modelopt_quant.py (20250920) (#10688) Yineng Zhang 2025-09-20 02:37:49 -07:00
  • 7c876de7f5 fix: remove awq_dequantize deps (#10686) Yineng Zhang 2025-09-20 01:47:01 -07:00
  • ba94b82986 fix: update run_suite (#10685) Yineng Zhang 2025-09-20 01:22:06 -07:00
  • 2b7417bf6a fix(disagg): fix sending KV cache in case of MLA for NIXL backend (#10673) dmitrygx 2025-09-20 10:30:10 +03:00
  • f111649580 Replace os.environ in layernorm.py (#10684) Baizhou Zhang 2025-09-20 00:20:33 -07:00
  • bd7eb0205a [Performance] Qwen3-Next: optimize causal_conv1d_fn triton kernel - up to 9% faster (#10680) Binyao Jiang 2025-09-20 00:12:43 -07:00
  • 74cd6e3902 chore: upgrade mooncake 0.3.6.post1 to fix gb200 dockerfile (#10681) Shangming Cai 2025-09-20 15:12:26 +08:00
  • b17e67df36 [Auto Sync] Update deepseek_v2.py (20250920) (#10683) Yineng Zhang 2025-09-19 23:43:31 -07:00
  • 8ecef73f12 [1/2] Support deterministic inference with flashinfer attention backend (#10645) Baizhou Zhang 2025-09-19 23:34:29 -07:00
  • 1d1ce62495 [router] refactor router and worker management 2.5/n (#10677) Simo Lin 2025-09-19 23:54:40 -04:00
  • 60e2a7cead [Auto Sync] Update model_runner.py (20250920) (#10679) Yineng Zhang 2025-09-19 18:26:54 -07:00
  • d88ef4a388 limit sgl-kernel causal conv1d to cuda only (#10648) Jinyan Chen 2025-09-20 07:59:37 +08:00
  • 6f993e8b9e chore: cleanup docker image (#10671) Yineng Zhang 2025-09-19 16:56:49 -07:00
  • 03ce92e594 router-spec: Reorder ChatCompletionRequest and fix validation logic (#10675) Chang Su 2025-09-19 16:41:21 -07:00
  • 00eb5eb721 [router] refactor router and worker management 2/n (#10666) Simo Lin 2025-09-19 15:37:57 -04:00
  • dab4663b4e [Auto Sync] Update .clang-format (20250919) (#10670) Yineng Zhang 2025-09-19 12:31:44 -07:00
  • 610a6d6e86 fix: resolve sync issue (#10668) Yineng Zhang 2025-09-19 12:05:39 -07:00
  • 36efd5be8a [router] refactor router and worker management 1/n (#10664) Simo Lin 2025-09-19 09:19:57 -04:00
  • 68cdc1893d [router] preserve order of json params using preserve_order feature (#10661) Fabian Gebhart 2025-09-19 15:15:22 +02:00