54 Commits

Author SHA1 Message Date
Lianmin Zheng
05e4787243 [CI] Fix the trigger condition for PR test workflows (#9761) 2025-08-30 15:47:10 -07:00
Hubert Lu
711390a971 [AMD] Support Hierarchical Caching on AMD GPUs (#8236) 2025-08-28 15:27:07 -07:00
Hubert Lu
c6c379ab31 [AMD] Reorganize hip-related header files in sgl-kernel (#9320) 2025-08-18 16:53:44 -07:00
Sai Enduri
740f063035 Fix Custom All Reduce CI job. (#9258) 2025-08-16 16:29:43 -07:00
kk
983aa4967b Fix nan value generated after custom all reduce (#8663) 2025-08-15 12:33:54 -07:00
Co-authored-by: wunhuang <wunhuang@amd.com>
Hubert Lu
9c3e95d98b [AMD] Expand test coverage for AMD CI and enable apply_token_bitmask_inplace_cuda in sgl-kernel (#8268) 2025-08-15 12:32:51 -07:00
Lianmin Zheng
2c7f01bc89 Reorganize CI and test files (#9027) 2025-08-10 12:30:06 -07:00
Lianmin Zheng
67a7d1f699 Create cancel-all-pr-test-runs (#8986) 2025-08-08 15:53:51 -07:00
kk
32d9e39a29 Fix potential memory fault issue and ncclSystemError in CI test (#8681) 2025-08-05 12:19:37 -07:00
Co-authored-by: wunhuang <wunhuang@amd.com>
Sai Enduri
f06bd210c0 Update amd docker image. (#8045) 2025-07-15 15:09:56 -07:00
Co-authored-by: Hubert Lu <55214931+hubertlu-tw@users.noreply.github.com>
Hubert Lu
3b3f1e3aeb [AMD] Add unit-test-sgl-kernel-amd to AMD CI (#7539) 2025-06-29 15:50:09 -07:00
Sai Enduri
62a7aa2efc Update CI flakes. (#7244) 2025-06-16 15:19:32 -07:00
Sai Enduri
2c18642502 Enable more unit tests for AMD CI. (#6983) 2025-06-08 19:41:55 -07:00
Hubert Lu
4740288303 [AMD] Add more tests to per-commit-amd (#6926) 2025-06-08 01:08:37 -07:00
Sai Enduri
77e928d00e Update server timeout time in AMD CI. (#6953) 2025-06-07 15:10:27 -07:00
HAI
b819381fec AITER backend extension and workload optimizations (#6838) 2025-06-05 23:00:18 -07:00
Co-authored-by: wunhuang <wunhuang@amd.com>
Co-authored-by: Hubert Lu <Hubert.Lu@amd.com>
kk
7a5e6ce1cb Fix GPU OOM (#6564) 2025-05-24 16:38:39 -07:00
Co-authored-by: michael <michael.zhang@amd.com>
Sai Enduri
24c035f2e3 Temporarily disable MI325x 8 gpu testing. (#6576) 2025-05-24 16:37:22 -07:00
HAI
5c0b38f369 aiter attention-backend (default enabled on AMD/ROCm) (#6381) 2025-05-20 22:52:41 -07:00
Sai Enduri
c47a51db7e Clean up AMD CI (#6365) 2025-05-18 01:17:28 -07:00
Lianmin Zheng
e07a6977e7 Minor improvements of TokenizerManager / health check (#6327) 2025-05-15 15:29:25 -07:00
Sai Enduri
73eb67c087 Enable unit tests for AMD CI. (#6283) 2025-05-14 12:55:36 -07:00
Sai Enduri
0f5cb8cae1 Enable MI325X AMD CI. (#6259) 2025-05-13 01:49:33 -07:00
Sai Enduri
7d3a3d4510 Update AMD CI docker to v0.4.6.post3-rocm630. (#6213) 2025-05-12 00:00:46 -07:00
Sai Enduri
73bc1d00fc Add 1 gpu perf and 2 gpu accuracy tests for AMD MI300x CI. (#5960) 2025-05-01 20:56:59 -07:00
Sai Enduri
2afba1b1c1 Add TP2 MOE benchmarks for AMD. (#5909) 2025-04-30 11:38:20 -07:00
HAI
d364b9b0f2 ROCm: update AITER (#5816) 2025-04-28 11:01:20 -07:00
saienduri
c5e1026f47 Update amd docker image to sglang:v0.4.5.post3-rocm630. (#5697) 2025-04-26 18:46:57 -07:00
Ke Bao
11b23ae97b Remove extra copy in deepseek forward absorb (#5578) 2025-04-21 19:33:21 -07:00
Co-authored-by: saienduri <saimanas.enduri@amd.com>
saienduri
7f875f1293 update grok test (#5171) 2025-04-09 11:09:47 -07:00
saienduri
3033c11a21 Add dummy grok test to amd CI. (#5115) 2025-04-08 07:44:59 +00:00
Yuhong Guo
87fafa0105 Revert PR 4764 & 4813 related to R1 RoPE (#4959) 2025-03-31 20:56:58 -07:00
strgrb
668ecc6c5b Fix ut mla-test-1-gpu-amd (#4813) 2025-03-27 08:27:51 -07:00
Co-authored-by: Zhang Kaihong <zhangkaihong.zkh@alibaba-inc.com>
Yineng Zhang
8bf6d7f406 support cmake for sgl-kernel (#4706) 2025-03-27 01:42:28 -07:00
Co-authored-by: hebiao064 <hebiaobuaa@gmail.com>
Co-authored-by: yinfan98 <1106310035@qq.com>
fzyzcjy
26f07294f1 Warn users when release_memory_occupation is called without memory saver enabled (#4566) 2025-03-26 00:18:14 -07:00
Lianmin Zheng
82dec1f70b Remove redundant type conversion (#4513) 2025-03-17 05:57:35 -07:00
Lianmin Zheng
c30976fb41 Fix finish step for pr tests and notebook tests (#4467) 2025-03-16 00:52:06 -07:00
Yineng Zhang
ad1ae7f7cd use topk_softmax with sgl-kernel (#4439) 2025-03-14 15:59:06 -07:00
Yineng Zhang
977d7cd26a cleanup deps 1/n (#4400) 2025-03-14 00:00:33 -07:00
Co-authored-by: sleepcoo <sleepcoo@gmail.com>
HandH1998
2ac189edc8 Amd test fp8 (#4261) 2025-03-10 10:12:09 -07:00
Lianmin Zheng
e8a69e4d0c Clean up fp8 support (#4230) 2025-03-09 21:46:35 -07:00
Lianmin Zheng
48473684cc Split test_mla.py into two files (#4216) 2025-03-08 15:40:49 -08:00
saienduri
e1aaa79ac9 Update amd ci docker image to v0.4.3.post4-rocm630. (#4189) 2025-03-07 13:02:02 -08:00
Lianmin Zheng
d7934cde45 Fix CI and install docs (#3821) 2025-02-24 16:17:38 -08:00
Yineng Zhang
07ab4d4a2d fix #3654 2025-02-18 15:16:16 +08:00
saienduri
522e18eaeb Update amd docker image. (#3654) 2025-02-17 20:12:55 -08:00
saienduri
7474bed883 Update to latest amd image. (#3597) 2025-02-17 00:29:47 +08:00
Yineng Zhang
4fe92bfca5 fix mla test (#3469) 2025-02-10 21:12:00 +08:00
Yineng Zhang
2b1808cec4 update unit test in AMD CI (#3366) 2025-02-07 17:25:16 +08:00
saienduri
200d3b1608 Add sgl-kernel to MI300 CI paths tested. (#3335) 2025-02-06 00:45:38 -08:00
Co-authored-by: HAI <hixiao@gmail.com>