125 Commits

Author SHA1 Message Date
Lianmin Zheng
05e4787243 [CI] Fix the trigger condition for PR test workflows (#9761) 2025-08-30 15:47:10 -07:00
Lianmin Zheng
2c7f01bc89 Reorganize CI and test files (#9027) 2025-08-10 12:30:06 -07:00
Lianmin Zheng
ef48d5547e Fix CI (#9013) 2025-08-09 16:00:10 -07:00
Lianmin Zheng
706bd69cc5 Clean up server_args.py to have a dedicated function for model specific adjustments (#8983) 2025-08-08 19:56:50 -07:00
fzyzcjy
b114a8105b Support B200 in CI (#8861) 2025-08-06 21:42:44 +08:00
Lianmin Zheng
263c9236a0 Always trigger pr-test (#8527) 2025-07-29 04:05:19 -07:00
Lianmin Zheng
69712e6f55 Rename the last step in pr-test.yml as pr-test-finish (#8486) 2025-07-28 19:06:13 -07:00
Lifu Huang
5c705b1dce Add perf tests for LoRA (#8314) 2025-07-26 14:55:22 -07:00
Lianmin Zheng
9c7a46180c [Doc] Steps to add a new attention backend (#8155) 2025-07-18 16:38:26 -07:00
Cheng Wan
02404a1e35 [ci] recover 8-gpu deepep test (#8105) 2025-07-17 00:46:40 -07:00
Cheng Wan
475a249bb8 temporarily disable deepep-8-gpu and activate two small tests (#7961) 2025-07-11 14:22:05 -07:00
Cheng Wan
d487555f84 [CI] Add deepep tests to CI (#7872) 2025-07-09 01:49:47 -07:00
Lianmin Zheng
55e03b10c4 Fix a bug in BatchTokenIDOut & Misc style and dependency updates (#7457) 2025-06-23 06:20:39 -07:00
Junrong Lin
2103b80607 [CI] update verlengine ci to 4-gpu test (#6007) 2025-05-27 14:32:23 -07:00
Shenggui Li
3f23d8cdf1 added support for tied weights in qwen pipeline parallelism (#6546) 2025-05-25 00:00:56 -07:00
fzyzcjy
f11481b921 Add 4-GPU runner tests and split existing tests (#6383) 2025-05-18 11:56:51 -07:00
Ying Sheng
bad7c26fdc [PP] Fix init_memory_pool desync & add PP for mixtral (#6223) 2025-05-12 12:38:09 -07:00
Lianmin Zheng
03227c5fa6 [CI] Reorganize the 8 gpu tests (#6192) 2025-05-11 10:55:06 -07:00
Lianmin Zheng
17c36c5511 [CI] Disabled deepep tests temporarily because it takes too much time. (#6186) 2025-05-10 23:40:50 -07:00
shangmingc
31d1f6e7f4 [PD] Add simple unit test for disaggregation feature (#5654)
Signed-off-by: Shangming Cai <caishangming@linux.alibaba.com>
2025-05-11 13:35:27 +08:00
Lianmin Zheng
de167cf5fa Fix request abortion (#6184) 2025-05-10 21:54:46 -07:00
XinyuanTong
e88dd482ed [CI]Add performance CI for VLM (#6038)
Signed-off-by: Xinyuan Tong <justinning0323@outlook.com>
2025-05-07 19:20:03 -07:00
Cheng Wan
9bddf1c82d Deferring 8 GPU test (#6102) 2025-05-07 18:49:58 -07:00
Lianmin Zheng
38053c3372 Fix the timeout for 8 gpu tests (#6084) 2025-05-07 03:13:12 -07:00
Jinyan Chen
8a828666a3 Add DeepEP to CI PR Test (#5655)
Co-authored-by: Jinyan Chen <jinyanc@nvidia.com>
2025-05-06 17:36:03 -07:00
Baizhou Zhang
799789afed Bump Flashinfer to 0.2.5 (#5870)
Co-authored-by: Yuhao Chen <yxckeis8@gmail.com>
2025-04-29 19:50:57 -07:00
Lianmin Zheng
849c83a0c0 [CI] test chunked prefill more (#5798) 2025-04-28 10:57:17 -07:00
Lianmin Zheng
daed453e84 [CI] Improve github summary & enable fa3 for more models (#5796) 2025-04-27 15:29:46 -07:00
Baizhou Zhang
f9fb33efc3 Add 8-GPU Test for Deepseek-V3 (#5691)
Co-authored-by: Lianmin Zheng <lianminzheng@gmail.com>
2025-04-27 12:46:12 -07:00
Lianmin Zheng
35ca04d2fa [CI] fix port conflicts (#5789) 2025-04-27 05:17:44 -07:00
Stefan He
408ba02218 Add Llama 4 to FA3 test (#5509) 2025-04-26 19:49:31 -07:00
Yineng Zhang
0961feefca feat: use flashinfer jit package (#5547) 2025-04-19 00:28:39 -07:00
Lianmin Zheng
838fa0f218 [minor] cleanup cmakelists.txt (#5420) 2025-04-15 07:07:07 -07:00
Yineng Zhang
3289c1207d Update the retry count (#5051) 2025-04-03 17:07:38 -07:00
Lianmin Zheng
f842853a40 Fix the timeout for unit-test-2-gpu in pr-test.yml (#4927) 2025-03-30 12:15:40 -07:00
Lianmin Zheng
4ede6770cd Fix retract for page size > 1 (#4914) 2025-03-30 02:57:15 -07:00
Lianmin Zheng
74e0ac1dbd Clean up import vllm in quantization/__init__.py (#4834) 2025-03-28 10:34:10 -07:00
fzyzcjy
0d3e3072ee Fix CI of test_patch_torch (#4844) 2025-03-27 21:22:45 -07:00
fzyzcjy
e45ae444db Revert "Add DeepEP tests into CI (#4737)" (#4751) 2025-03-25 00:44:01 -07:00
fzyzcjy
64129fa632 Add DeepEP tests into CI (#4737) 2025-03-24 19:54:31 -07:00
aoshen524
588865f0e0 [Feature] Support Tensor Parallelism and Weight Slicing for Lora (#4274)
Co-authored-by: ShenAo1111 <1377693092@qq.com>
Co-authored-by: Baizhou Zhang <sobereddiezhang@gmail.com>
2025-03-18 20:33:07 -07:00
Yineng Zhang
c787298547 use sgl custom all reduce (#4441) 2025-03-18 00:46:41 -07:00
Lianmin Zheng
5493c3343e Fix data parallel + tensor parallel (#4499) 2025-03-17 05:13:16 -07:00
Lianmin Zheng
06d12b39d3 Remove filter for pr-tests (#4468) 2025-03-16 00:57:26 -07:00
Lianmin Zheng
c30976fb41 Fix finish step for pr tests and notebook tests (#4467) 2025-03-16 00:52:06 -07:00
Yineng Zhang
ad1ae7f7cd use topk_softmax with sgl-kernel (#4439) 2025-03-14 15:59:06 -07:00
Lianmin Zheng
a5a892ffd3 Fix auto merge & add back get_flat_data_by_layer (#4393) 2025-03-13 08:46:25 -07:00
Lianmin Zheng
5a6400eec5 Test no vllm custom allreduce (#4256) 2025-03-10 10:08:25 -07:00
Lianmin Zheng
aa957102a9 Simplify tests & Fix trtllm custom allreduce registration (#4252) 2025-03-10 01:24:22 -07:00
Lianmin Zheng
fbd560028a Auto balance CI tests (#4238) 2025-03-09 21:05:55 -07:00