Commit Graph

594 Commits

| Author | SHA1 | Message | Date |
| --- | --- | --- | --- |
| XinyuanTong | e88dd482ed | [CI]Add performance CI for VLM (#6038); Signed-off-by: Xinyuan Tong <justinning0323@outlook.com> | 2025-05-07 19:20:03 -07:00 |
| Cheng Wan | 9bddf1c82d | Deferring 8 GPU test (#6102) | 2025-05-07 18:49:58 -07:00 |
| Stefan He | 24c13ca950 | Clean up fa3 test from 8 gpus (#6105) | 2025-05-07 18:38:40 -07:00 |
| Jinyan Chen | 8a828666a3 | Add DeepEP to CI PR Test (#5655); Co-authored-by: Jinyan Chen <jinyanc@nvidia.com> | 2025-05-06 17:36:03 -07:00 |
| Baizhou Zhang | bdd17998e6 | [Fix] Fix and rename flashmla CI test (#6045) | 2025-05-06 13:25:15 -07:00 |
| Huapeng Zhou | b8559764f6 | [Test] Add flashmla attention backend test (#5587) | 2025-05-05 10:32:02 -07:00 |
| Qiaolin Yu | 3042f1da61 | Fix flaky issues of lora and add multi batch tests (#5957) | 2025-05-04 13:11:40 -07:00 |
| xm:D | 3409aaab32 | Support InternVL3 (#5350); Co-authored-by: Mick <mickjagger19@icloud.com>; Co-authored-by: Chayenne <zhaochen20@outlook.com> | 2025-05-01 22:38:59 -07:00 |
| Sai Enduri | 73bc1d00fc | Add 1 gpu perf and 2 gpu accuracy tests for AMD MI300x CI. (#5960) | 2025-05-01 20:56:59 -07:00 |
| KCFindstr | d33955d28a | Properly return error response in vertex_generate HTTP endpoint (#5956) | 2025-05-01 11:48:58 -07:00 |
| Ke Bao | ebaba85655 | Update ci test and doc for MTP api change (#5952) | 2025-05-01 09:30:27 -07:00 |
| mlmz | 256c4c2519 | fix: correct stream response when enable_thinking is set to false (#5881) | 2025-04-30 19:44:37 -07:00 |
| Qiaolin Yu | 7bcd8b1cb2 | Fix lora batch processing when input lora_path contains None (#5930) | 2025-04-30 19:42:42 -07:00 |
| Ying Sheng | 11383cec3c | [PP] Add pipeline parallelism (#5724) | 2025-04-30 18:18:07 -07:00 |
| Sai Enduri | 2afba1b1c1 | Add TP2 MOE benchmarks for AMD. (#5909) | 2025-04-30 11:38:20 -07:00 |
| JieXin Liang | 3cff963335 | [fix] kimi-vl test in test_vision_openai_server.py (#5910) | 2025-04-29 23:59:10 -07:00 |
| liwenju0 | 8fefdd32c7 | [Feature] add support kimi vl model (#5383); Co-authored-by: wenju.li <wenju.li@deepctr.cn> | 2025-04-29 21:31:19 -07:00 |
| saienduri | e3a5304475 | Add AMD MI300x Nightly Testing. (#5861) | 2025-04-29 17:34:32 -07:00 |
| Chang Su | 2b06484bd1 | feat: support pythonic tool call and index in tool call streaming (#5725) | 2025-04-29 17:30:44 -07:00 |
| Chang Su | 9419e75d60 | [CI] Add test_function_calling.py to run_suite.py (#5896) | 2025-04-29 15:54:53 -07:00 |
| Qiaolin Yu | 8c0cfca87d | Feat: support cuda graph for LoRA (#4115); Co-authored-by: Beichen Ma <mabeichen12@gmail.com> | 2025-04-28 23:30:44 -07:00 |
| woodx | 2c3ea29476 | [Feature] support auto chat template (#4949) | 2025-04-28 22:34:18 -07:00 |
| Lianmin Zheng | 26fc32d168 | [CI] tune the test order to warmup the server (#5860) | 2025-04-28 19:27:37 -07:00 |
| Lianmin Zheng | 849c83a0c0 | [CI] test chunked prefill more (#5798) | 2025-04-28 10:57:17 -07:00 |
| Lianmin Zheng | daed453e84 | [CI] Improve github summary & enable fa3 for more models (#5796) | 2025-04-27 15:29:46 -07:00 |
| Baizhou Zhang | f9fb33efc3 | Add 8-GPU Test for Deepseek-V3 (#5691); Co-authored-by: Lianmin Zheng <lianminzheng@gmail.com> | 2025-04-27 12:46:12 -07:00 |
| Lianmin Zheng | a38f6932cc | [CI] Fix test case (#5790) | 2025-04-27 08:55:35 -07:00 |
| Lianmin Zheng | 621e96bf9b | [CI] Fix ci tests (#5769) | 2025-04-27 07:18:10 -07:00 |
| Lianmin Zheng | 35ca04d2fa | [CI] fix port conflicts (#5789) | 2025-04-27 05:17:44 -07:00 |
| Lianmin Zheng | 3c4e0ee64d | [CI] Tune threshold (#5787) | 2025-04-27 04:10:22 -07:00 |
| Lianmin Zheng | 4d23ba08f5 | Simplify FA3 tests (#5779) | 2025-04-27 01:30:17 -07:00 |
| Baizhou Zhang | a45a4b239d | Split local attention test from fa3 test (#5774) | 2025-04-27 01:03:31 -07:00 |
| Lianmin Zheng | 981a2619d5 | Fix eagle test case (#5776) | 2025-04-27 01:00:54 -07:00 |
| Michał Moskal | bdbe5f816b | update llguidance to 0.7.11; adds StructTag (#4870) | 2025-04-26 20:13:57 -07:00 |
| Stefan He | 408ba02218 | Add Llama 4 to FA3 test (#5509) | 2025-04-26 19:49:31 -07:00 |
| DavidBao | d8fbc7c096 | [feature] support for roberta embedding models (#5730) | 2025-04-26 18:47:06 -07:00 |
| Mick | 02723e1b0d | CI: rewrite test_vision_chunked_prefill to speedup (#5682) | 2025-04-26 18:33:13 -07:00 |
| ZXN | 04d0123fd9 | [Fix]: support deepseek-vl2-tiny model (#5552); Co-authored-by: bppps <zouyu.zzx@alibaba-inc.com> | 2025-04-26 17:52:53 +08:00 |
| Mick | feda9b11b3 | fix: fix one more bug from merging mm_inputs (#5718); Co-authored-by: Xinyuan Tong <justinning0323@outlook.com>; Co-authored-by: XinyuanTong <115166877+JustinTong0323@users.noreply.github.com> | 2025-04-25 17:28:33 -07:00 |
| Lianmin Zheng | 21514ff5bd | Disable flaky eagle tests (#5753) | 2025-04-25 15:54:39 -07:00 |
| Ravi Theja | 7d9679b74d | Add MMMU benchmark results (#4491); Co-authored-by: Ravi Theja Desetty <ravitheja@Ravis-MacBook-Pro.local> | 2025-04-25 15:23:53 +08:00 |
| Mick | c998d04b46 | vlm: enable radix cache for qwen-vl models (#5349); Co-authored-by: Xinyuan Tong <justinning0323@outlook.com> | 2025-04-23 20:35:05 -07:00 |
| Lianmin Zheng | de071366cd | tune the threshold of gemma-2-27b-it in test_nightly_gsm8k_eval.py (#5677) | 2025-04-23 05:31:17 -07:00 |
| Yineng Zhang | 8777a1d24b | fix gemma3 unit test (#5670) | 2025-04-23 02:14:01 -07:00 |
| Zhiqiang Xie | 70645f4d7d | upstream hicache fixes (#5570) | 2025-04-20 23:08:30 -07:00 |
| Qingquan Song | 188f0955fa | Add Speculative Decoding Eagle3 topk > 1 (#5318); Co-authored-by: Stefan He <hebiaobuaa@gmail.com>; Co-authored-by: Yubo Wang <yubowang2019@gmail.com> | 2025-04-20 22:58:28 -07:00 |
| kyle-pena-kuzco | 9f3bd2ad39 | Feat: Implement JSON Mode (response_format.type="json_object") (#4733); Co-authored-by: Kyle Pena <kylepena@kyles-macbook-pro.turkey-marlin.ts.net> | 2025-04-20 17:41:22 -07:00 |
| Adarsh Shirawalmath | 8b39274e34 | [Feature] Prefill assistant response - add continue_final_message parameter (#4226); Co-authored-by: Chayenne <zhaochen20@outlook.com> | 2025-04-20 17:37:18 -07:00 |
| Baizhou Zhang | 5156d5a413 | Add test config yamls for Deepseek v3 (#5433) | 2025-04-20 17:28:52 -07:00 |
| Xiaoyu Zhang | bf86c5e990 | restruct compressed_tensors_w8a8_fp8 (#5475) | 2025-04-19 04:52:15 -07:00 |