Commit Graph

  • 62f99e08b3 fix: wrong docker hub org name (#9137) li chaoran 2025-08-13 10:26:19 +08:00
  • 86a0be65d8 [Feature] Support custom set kv buffer kernel (#8884) DarkSharpness 2025-08-12 16:56:51 -07:00
  • 0edda32001 Support page first layout zero copy for mooncake store (#8651) huangtingwei 2025-08-13 06:59:26 +08:00
  • 924827c3de chore: use cp310 (#9130) Yineng Zhang 2025-08-12 15:33:22 -07:00
  • c81daf838d fix: update Dockerfile (#9129) Yineng Zhang 2025-08-12 15:01:29 -07:00
  • 25caa7a8a9 [AMD] Support Wave attention backend with AMD GPU optimizations (#8660) jacky.cheng 2025-08-13 04:49:11 +08:00
  • 03d114496f Fix typos in supported models documentation (#9119) Hangzhi 2025-08-12 13:35:24 -07:00
  • 83123f481e [Quantization] Supported w8a8 int8 quantized Gemma3 and Qwen-VL models (#8619) ichernob 2025-08-12 23:31:18 +03:00
  • 48afa8f14f [feat] Enable Ascend profiling on SGLang (#8610) ronnie_zheng 2025-08-13 04:28:31 +08:00
  • 2ecbd8b8bf [feat] add ascend readme and docker release (#8700) li chaoran 2025-08-13 04:25:42 +08:00
  • 305b27c124 fix: update Dockerfile (#9125) Yineng Zhang 2025-08-12 13:23:10 -07:00
  • 1ce30dd13e [router] update router documentation (#9121) Simo Lin 2025-08-12 13:16:34 -07:00
  • c9ee738515 Fuse writing KV buffer into rope kernel (part 2: srt) (#9014) Jiaqi Gu 2025-08-12 13:15:30 -07:00
  • 1f9ec65374 fix(docker): update sgl_kernel version to 0.3.4 in Dockerfile.gb200 (#9118) ishandhanani 2025-08-12 13:12:33 -07:00
  • ad359d1c71 router: Fix user guide link README.md (#9122) Chang Su 2025-08-12 12:29:10 -07:00
  • 5f5b3b2449 [5/n] DP Enhancement: Correct num_token_non_padded (#9107) Cheng Wan 2025-08-12 12:23:46 -07:00
  • 4caca4f6b4 Fix typo in REVIEWERS (#9113) Shangming Cai 2025-08-13 02:55:49 +08:00
  • f2a5de284b [Bugfix] Fix accuracy-test-1-gpu failure caused by builtin_tools (#9114) Chang Su 2025-08-12 09:56:13 -07:00
  • 445f9dca6e Runtime check CUDA driver version to avoid unresolved green context symbols (#9021) Liangsheng Yin 2025-08-12 09:26:10 -07:00
  • 3a9afe2a42 chore: bump sgl-kernel v0.3.4 (#9103) Yineng Zhang 2025-08-12 01:48:47 -07:00
  • 9aea255522 Fuse writing KV buffer into rope kernel (part 1: sgl-kernel) (#9077) fzyzcjy 2025-08-12 16:46:40 +08:00
  • fcc11e5ed5 update support new models doc (#9096) Yichao Cheng 2025-08-12 01:21:02 -07:00
  • 5190ba7f42 Fuse two kernels of hidden states padding into quantization kernel (#9005) fzyzcjy 2025-08-12 16:20:13 +08:00
  • 5438886c87 docs: fix broken links in README.md (#9075) Hsiang-Yu Tsou 2025-08-12 15:03:35 +08:00
  • 9c83d74da3 bugfix: Fix the commentary msg extraction in GptOssDetector (#9097) Chang Su 2025-08-11 23:53:10 -07:00
  • b4ac2b9c0c [Fix] Fix dual chunk model default behavior (#9032) DarkSharpness 2025-08-11 23:50:23 -07:00
  • 83262dcb29 Fix mismatch between padded_scales shape and reshape dimensions in modelopt quantization (#8766) Jianwei Dong 2025-08-12 14:44:40 +08:00
  • c46c75f8c0 feat: add fused moe config for Qwen3-30B-A3B on B200 (#9087) zixuanzhang226 2025-08-11 23:25:36 -07:00
  • 2aaf22c46c Optimization for AscendPagedTokenToKVPoolAllocator (#8293) Makcum888e 2025-08-12 09:06:39 +03:00
  • 29a610b4d9 Fix broken CI TestRequestLengthValidation (#9095) Lifu Huang 2025-08-11 22:59:56 -07:00
  • 5ded39cab2 Fix race condition in async lora unload (#9084) Lifu Huang 2025-08-11 22:59:29 -07:00
  • 4093d460ce [CI] migrate router to BM.A10.4 runner (#8992) Keyang Ru 2025-08-11 22:41:18 -07:00
  • 9d68bdb240 [router] Add Rust Binary Entrypoint for SGLang Router (#9089) Simo Lin 2025-08-11 21:37:36 -07:00
  • a218490136 (gpt-oss, oai, chat): Remove Harmony Integration and Implement Native GPT-OSS Tool Call Support (#9043) Chang Su 2025-08-11 18:59:18 -07:00
  • 0eec4cb6cc HiCache, add bench long context plus minor fixs (#9086) Zhiqiang Xie 2025-08-11 16:54:52 -07:00
  • ff1f68252c [fix] Set Radix tree root node hash to None - Nvidia Dynamo Integration (#9030) Faradawn Yang 2025-08-11 14:20:39 -07:00
  • 9f78f391ae HiCache Storage: generate hash when inserting new nodes (#9053) Zhiqiang Xie 2025-08-11 14:18:59 -07:00
  • f508cd3cb7 TRTLLM-MLA FP8 path (#8638) Faraz 2025-08-11 17:02:13 -04:00
  • 44e86480e8 fuse allreduce and residual_rmsnorm (#8731) Xiaoyu Zhang 2025-08-12 04:50:53 +08:00
  • 8c07fabda7 Update hyperparameter_tuning.md (#9083) Lianmin Zheng 2025-08-11 13:44:11 -07:00
  • 90f44b74e6 fix: w4afp8 accuracy problem and rebase (#8752) SijiaYang 2025-08-12 04:41:19 +08:00
  • 38907fe639 refactor(pd-router): extract common patterns to reduce code duplication (#9081) Simo Lin 2025-08-11 13:32:31 -07:00
  • f9afa7dceb Fix docs for clip max new tokens (#9082) Liangsheng Yin 2025-08-11 13:15:21 -07:00
  • 0d9e89ec69 [PD]decode: add CLIP_MAX_NEW_TOKEN for pop_preallocated (#8866) Jimmy 2025-08-12 04:08:11 +08:00
  • 3d64fda376 Fix broken Kimi models HuggingFace link (#9080) Hangzhi 2025-08-11 12:15:00 -07:00
  • 3bffe11279 Fix chunked prefill size validation for disabled state (#8973) 633WHU 2025-08-12 02:05:29 +08:00
  • 44426e54be Update REVIEWERS (#9063) HAI 2025-08-11 11:04:39 -07:00
  • 9f24dfefd1 chore(gb200): remove ToT flashinfer installation (#9079) ishandhanani 2025-08-11 11:02:15 -07:00
  • 89f1d4f536 update deepep commit to support qwen3-coder (#9066) Yi Zhang 2025-08-12 01:42:33 +08:00
  • 75e6a7cde1 Support radix cache for Lora feature (#7216) Baizhou Zhang 2025-08-11 10:14:11 -07:00
  • 6f81a710f7 [pd-router] add retry and circuit breakfor for pd router (#9051) Simo Lin 2025-08-11 05:53:26 -07:00
  • a6452b7188 bugfix: Fix output_ids extraction in detokenizer_manager (#9047) Chang Su 2025-08-11 03:17:32 -07:00
  • f4ae50e97c fix: use flashinfer v0.2.11.post1 zhyncs 2025-08-11 02:49:25 -07:00
  • 84cb449eec Revert "chore: upgrade flashinfer 0.2.11 (#9036)" (#9057) Yineng Zhang 2025-08-11 00:16:39 -07:00
  • f003cd3548 [CI] Fix CI tests (#9050) Cheng Wan 2025-08-10 23:52:05 -07:00
  • 9d834fdcc1 Revert "feat: update flashinfer ar oneshot params (#8687)" (#9054) Yineng Zhang 2025-08-10 23:24:42 -07:00
  • b32792516a REVIEWERS.md typo fix (#9048) Zhiqiang Xie 2025-08-10 22:33:37 -07:00
  • 067068f271 [router] regular router circuit breaker (#8997) Simo Lin 2025-08-10 21:19:30 -07:00
  • 6beeff41c5 Update REVIEWERS.md (#9046) Lianmin Zheng 2025-08-10 21:11:14 -07:00
  • 2e8e7e353b Improve docs and developer guide (#9044) Lianmin Zheng 2025-08-10 21:05:18 -07:00
  • 2449a0afe2 Refactor the docs (#9031) Lianmin Zheng 2025-08-10 19:49:45 -07:00
  • 0f229c07f1 Update release-docs.yml (#9037) Lianmin Zheng 2025-08-10 18:52:11 -07:00
  • dd001a5477 chore: upgrade flashinfer 0.2.11 (#9036) Yineng Zhang 2025-08-10 17:35:37 -07:00
  • 4ea9d74a3e Simplify health check (#9034) Lianmin Zheng 2025-08-10 17:35:05 -07:00
  • dd949ace23 Revert "[1/2][resubmit] sgl-kernel: Fuse routed scaling factor into m… (#9035) Yineng Zhang 2025-08-10 17:34:54 -07:00
  • f2887498f0 Simplify memory pool (#9033) Lianmin Zheng 2025-08-10 17:32:28 -07:00
  • 8ecf6b9d24 Support Flatten Tensor Update Weights to speed up MOE Update Weights by 20% (#8079) Stefan He 2025-08-10 16:08:59 -07:00
  • 0418b9d4ea [Optimization] Update estimated_num_new_pages logic in TokenToKVPoolAllocator (#8794) YiXR 2025-08-11 07:01:51 +08:00
  • e322a94d1f Reduce CI duration of test_lora_update. (#9024) Lifu Huang 2025-08-10 15:34:04 -07:00
  • 2c7f01bc89 Reorganize CI and test files (#9027) Lianmin Zheng 2025-08-10 12:30:06 -07:00
  • b58ae7a2a0 Simplify frontend language (#9029) Lianmin Zheng 2025-08-10 10:59:30 -07:00
  • 6345069f6c [RL] Add test for /abort_request (#7626) Stefan He 2025-08-10 09:14:19 -07:00
  • ce9cf35327 [router] update pyo3 version to 0.25.1 (#9022) Simo Lin 2025-08-10 06:45:51 -07:00
  • f8a173bb50 Improve LoRA Perf by Deprecating FlashInfer and Eliminating Redundant Tensor Ops (#8940) Lifu Huang 2025-08-10 01:04:45 -07:00
  • 6b847a9a05 Optimize: Cache CUDA device to reduce redundant calls during tensor l… (#8996) JiLi 2025-08-10 15:32:57 +08:00
  • 473400e452 [router] upgrade kube version to latest (#9018) Simo Lin 2025-08-09 22:49:45 -07:00
  • dd665f967f [router] upgrade rand to latest version (#9017) Simo Lin 2025-08-09 22:49:30 -07:00
  • 3817a37d87 [router] upgrade to latest sgl kernel for router ci (#9019) Simo Lin 2025-08-09 21:49:18 -07:00
  • 7ba5ad5766 [Fix] Fix flashinfer cpu <-> gpu synchronization (#8340) DarkSharpness 2025-08-09 20:11:40 -07:00
  • 19bc77f05c [Fix] Fix hicache backend (#8991) DarkSharpness 2025-08-09 17:16:25 -07:00
  • 86497d99f2 fix page first per layer pf2lf kernel (#8915) huangtingwei 2025-08-10 08:16:11 +08:00
  • 5c31b35db2 [hicache] Optimization for DMA copy (#8245) cctry 2025-08-09 17:16:07 -07:00
  • ef48d5547e Fix CI (#9013) Lianmin Zheng 2025-08-09 16:00:10 -07:00
  • a886564a18 fix flashinfer allreduce fusion import bug (#9007) Xiaoyu Zhang 2025-08-10 04:47:05 +08:00
  • 9a44b643c6 Fix CI (#9012) Lianmin Zheng 2025-08-09 13:33:42 -07:00
  • 41d71ca488 fix: fix obsolete qwen-audio processor arg (#9003) Mick 2025-08-10 04:18:36 +08:00
  • 20cfc5a251 [perf] add kimi-k2 b200 fused moe config (#9010) JieXin Liang 2025-08-10 03:40:49 +08:00
  • 48b8b4c124 fix nvshmem cu126 (#9001) Yineng Zhang 2025-08-09 03:34:54 -07:00
  • 323bc2f51a Enable TBO on ROCm (#8329) Chaitanya Sri Krishna Lolla 2025-08-09 14:29:55 +05:30
  • 137e75daa1 [Feature] Optimize DeepSeek's DeepEP on Ascend NPU (#8355) Even Zhou 2025-08-09 16:35:00 +08:00
  • 52e1f52f32 [bugfix] Fix missing args in bench one batch (#8877) Trevor Morris 2025-08-09 01:34:03 -07:00
  • 5018809222 [DP] fix: engine crash when decode batch is padded (#8995) Cheng Wan 2025-08-09 01:29:29 -07:00
  • 326a901df4 chore: upgrade sgl-kernel 0.3.3 (#8998) Yineng Zhang 2025-08-09 01:22:01 -07:00
  • 6e0b646832 HiCache Storage tp fix (#8878) Zhiqiang Xie 2025-08-09 01:16:51 -07:00
  • 4a9f3eef90 Tiny Llama4 type error in constructor (#6752) Brayden Zhong 2025-08-09 04:03:59 -04:00
  • 1b7afad0dd feature(hicache): Support hf3fs-hicache reusing kvcache across different instances (#8673) hzh0425 2025-08-09 16:03:00 +08:00
  • f29aba8c6e Support glm4.1v and glm4.5v (#8798) Binyao Jiang 2025-08-09 00:59:13 -07:00
  • faa25df1ae feat: update flashinfer ar oneshot params (#8687) eigen 2025-08-09 03:51:27 -04:00
  • 7b81f956eb Fix qwen2 audio not working bug (#8600) Binyao Jiang 2025-08-09 00:42:29 -07:00
  • d3e67deb1b Fix redundant kernel in sink dtype conversion (#8966) fzyzcjy 2025-08-09 15:34:49 +08:00