158 Commits

Author SHA1 Message Date
Simo Lin
c8f31042a8 [router] Refactor router and policy traits with dependency injection (#7987)
Co-authored-by: Jin Pan <jpan236@wisc.edu>
Co-authored-by: Keru Yang <rukeyang@gmail.com>
Co-authored-by: Yingyi Huang <yingyihuang2000@outlook.com>
Co-authored-by: Philip Zhu <phlipzhux@gmail.com>
2025-07-18 14:24:24 -07:00
Cheng Wan
02404a1e35 [ci] recover 8-gpu deepep test (#8105) 2025-07-17 00:46:40 -07:00
Sai Enduri
f06bd210c0 Update amd docker image. (#8045)
Co-authored-by: Hubert Lu <55214931+hubertlu-tw@users.noreply.github.com>
2025-07-15 15:09:56 -07:00
Hank Han
2117f82def [ci] CI supports use cached models (#7874) 2025-07-14 11:42:21 +00:00
Cheng Wan
d487555f84 [CI] Add deepep tests to CI (#7872) 2025-07-09 01:49:47 -07:00
Kay Yan
975a5ec69c [fix] update bench_speculative.py for compatibility (#7764)
Signed-off-by: Kay Yan <kay.yan@daocloud.io>
2025-07-04 16:32:54 +08:00
Lianmin Zheng
22352d47a9 Improve streaming, log_level, memory report, weight loading, and benchmark script (#7632)
Co-authored-by: Kan Wu <wukanustc@gmail.com>
2025-06-29 23:16:19 -07:00
Hubert Lu
3b3f1e3aeb [AMD] Add unit-test-sgl-kernel-amd to AMD CI (#7539) 2025-06-29 15:50:09 -07:00
Keyang Ru
29bd4c8135 [CI] Add CI Testing for Prefill-Decode Disaggregation with Router (#7540) 2025-06-27 00:18:56 -07:00
Mick
4d67025a1d chore: improve ci bug reporting (#7542) 2025-06-26 01:32:44 -07:00
Shangming Cai
a07f8ae4b7 [CI] Upgrade mooncake to v0.3.4.post2 to fix potential slice failed bug (#7522)
Signed-off-by: Shangming Cai <caishangming@linux.alibaba.com>
2025-06-25 01:49:22 -07:00
Shangming Cai
d6dddc19ff [CI] Upgrade mooncake to 0.3.4.post1 to fix 8 gpu tests (#7472)
Signed-off-by: Shangming Cai <caishangming@linux.alibaba.com>
2025-06-24 02:10:50 +08:00
kk
bd4f581896 Fix torch compile run (#7391)
Co-authored-by: wunhuang <wunhuang@amd.com>
Co-authored-by: Sai Enduri <saimanas.enduri@amd.com>
2025-06-22 15:33:09 -07:00
Shangming Cai
187b85b7f3 [PD] Optimize custom mem pool usage and bump mooncake version (#7393)
Signed-off-by: Shangming Cai <caishangming@linux.alibaba.com>
2025-06-20 09:50:39 -07:00
Lianmin Zheng
0f218731e3 Do not run frontend_reasoning.ipynb to reduce the CI load (#7073) 2025-06-10 17:15:31 -07:00
Yineng Zhang
56ccd3c22c chore: upgrade flashinfer v0.2.6.post1 jit (#6958)
Co-authored-by: alcanderian <alcanderian@gmail.com>
Co-authored-by: Qiaolin Yu <qy254@cornell.edu>
Co-authored-by: Baizhou Zhang <sobereddiezhang@gmail.com>
Co-authored-by: Mick <mickjagger19@icloud.com>
Co-authored-by: ispobock <ispobaoke@gmail.com>
2025-06-09 09:22:39 -07:00
Hubert Lu
4740288303 [AMD] Add more tests to per-commit-amd (#6926) 2025-06-08 01:08:37 -07:00
HAI
b819381fec AITER backend extension and workload optimizations (#6838)
Co-authored-by: wunhuang <wunhuang@amd.com>
Co-authored-by: Hubert Lu <Hubert.Lu@amd.com>
2025-06-05 23:00:18 -07:00
Lianmin Zheng
20fd53b8f6 Correctly abort the failed grammar requests & Improve the handling of abort (#6803) 2025-06-01 19:00:07 -07:00
Sai Enduri
f4a8987f69 Update amd docker and nightly models. (#6687) 2025-05-28 00:08:08 -07:00
Yineng Zhang
f77da69964 chore: upgrade mooncake-transfer-engine (#6643) 2025-05-26 20:01:30 -07:00
Sai Enduri
eb8f02dd87 Update nightly thresholds and dependencies. (#6635) 2025-05-26 11:44:13 -07:00
fzyzcjy
25be63d0b2 Auto handle PD disaggregation in bench_serving (#6587)
Co-authored-by: yizhang2077 <1109276519@qq.com>
2025-05-25 22:41:27 -07:00
fzyzcjy
d502dae0f0 Tiny change killall_sglang.sh (#6596) 2025-05-25 22:36:51 -07:00
kk
7a5e6ce1cb Fix GPU OOM (#6564)
Co-authored-by: michael <michael.zhang@amd.com>
2025-05-24 16:38:39 -07:00
Byron Hsu
2d831c6ef9 [PD] Support structured output (#6560) 2025-05-23 21:49:00 -07:00
Byron Hsu
8233cc10fd [PD] Support logprob & Add failure test (#6558) 2025-05-23 14:29:20 -07:00
HAI
5c0b38f369 aiter attention-backend (default enabled on AMD/ROCm) (#6381) 2025-05-20 22:52:41 -07:00
Yineng Zhang
eabcf82acb feat: add long context example (#6391) 2025-05-18 01:45:17 -07:00
Sai Enduri
c47a51db7e Clean up AMD CI (#6365) 2025-05-18 01:17:28 -07:00
Lianmin Zheng
dcc0a45618 Fix amd ci (#6360) 2025-05-16 15:33:10 -07:00
Lianmin Zheng
e07a6977e7 Minor improvements of TokenizerManager / health check (#6327) 2025-05-15 15:29:25 -07:00
Stefan He
1ab14c4c5c [VERL Use Case] Add torch_memory_saver into deps (#6247) 2025-05-12 19:09:03 -07:00
Yineng Zhang
f94543d22b chore: add hf_xet dep (#6243) 2025-05-12 13:08:40 -07:00
shangmingc
0f334945c6 [CI] Fix PD mooncake dependency error (#6212)
Signed-off-by: Shangming Cai <caishangming@linux.alibaba.com>
2025-05-12 10:08:49 -07:00
Lianmin Zheng
03227c5fa6 [CI] Reorganize the 8 gpu tests (#6192) 2025-05-11 10:55:06 -07:00
Yineng Zhang
230106304d chore: upgrade sgl-kernel v0.1.2.post1 (#6196)
Co-authored-by: alcanderian <alcanderian@gmail.com>
2025-05-11 22:41:37 +08:00
shangmingc
31d1f6e7f4 [PD] Add simple unit test for disaggregation feature (#5654)
Signed-off-by: Shangming Cai <caishangming@linux.alibaba.com>
2025-05-11 13:35:27 +08:00
applesaucethebun
2ce8793519 Add typo checker in pre-commit (#6179)
Co-authored-by: Brayden Zhong <b8zhong@uwaterloo.ca>
2025-05-11 12:55:00 +08:00
Jinyan Chen
8a828666a3 Add DeepEP to CI PR Test (#5655)
Co-authored-by: Jinyan Chen <jinyanc@nvidia.com>
2025-05-06 17:36:03 -07:00
Huapeng Zhou
b8559764f6 [Test] Add flashmla attention backend test (#5587) 2025-05-05 10:32:02 -07:00
Yineng Zhang
9a6ad8916d chore: upgrade sgl-kernel 0.1.1 (#5933) 2025-04-30 16:13:30 -07:00
Yineng Zhang
41ac0c6d48 chore: upgrade sgl-kernel 0.1.0 (#5690) 2025-04-27 21:00:50 -07:00
Lianmin Zheng
3dd3538c18 Pin torch audio to 2.6.0 (#5750) 2025-04-25 15:06:28 -07:00
Ravi Theja
7d9679b74d Add MMMU benchmark results (#4491)
Co-authored-by: Ravi Theja Desetty <ravitheja@Ravis-MacBook-Pro.local>
2025-04-25 15:23:53 +08:00
Yineng Zhang
7282ab741a fix: update bench_speculative (#5649) 2025-04-22 16:08:15 -07:00
Byron Hsu
bf98d2e377 [PD] Support prefill overlap + Ensure no race condition (#5609) 2025-04-21 12:12:56 -07:00
Byron Hsu
deded17f38 [PD] Fix edge case and simplify large page size + chunked prefill (#5589) 2025-04-21 10:27:02 -07:00
Byron Hsu
c951d312ed [PD] Fix large page size + chunk prefill (#5588) 2025-04-20 17:21:54 -07:00
lukec
417b44eba8 [Feat] upgrade pytorch2.6 (#5417) 2025-04-20 16:06:34 -07:00