138 Commits

Author SHA1 Message Date
DiweiSun
029e0af31d ci: enhance xeon ci (#9395) 2025-08-21 03:35:17 -07:00
Hank Han
81da16f6d3 [CI] add deepseek w4a8 test on h20 ci (#7758) 2025-08-16 01:54:13 -07:00
Minglei Zhu
6ee6619b7a add zai-org/GLM-4.5-Air-FP8 model into nightly CI (#8894) 2025-08-08 01:44:19 -07:00
Lianmin Zheng
e314b084c5 [FIX] Fix the nightly CI by disabling swa mem pool for gemma2 (#8693) 2025-08-02 18:43:14 -07:00
harrisonlimh
747dd45077 feat: throttle requests at scheduler based on --max_queued_requests (#7565) 2025-07-28 22:32:33 +08:00
fzyzcjy
62222bd27e Minor tool for comparison of benchmark results (#7974) 2025-07-27 00:27:50 -07:00
Lifu Huang
5c705b1dce Add perf tests for LoRA (#8314) 2025-07-26 14:55:22 -07:00
Hank Han
2117f82def [ci] CI supports use cached models (#7874) 2025-07-14 11:42:21 +00:00
YanbingJiang
4de0395343 Add V2-lite model test (#7390)
Co-authored-by: DiweiSun <105627594+DiweiSun@users.noreply.github.com>
2025-07-03 22:25:50 -07:00
Stefan He
3774f07825 Multi-Stage Awake: Support Resume and Pause KV Cache and Weights separately (#7099) 2025-06-19 00:56:37 -07:00
woodx
e30ef368ab Feat/support rerank (#6058) 2025-06-16 10:50:01 -07:00
Lianmin Zheng
f47a1b1d0f Increase timeout in test/srt/test_disaggregation.py (#7175) 2025-06-13 23:12:14 -07:00
Yineng Zhang
56ccd3c22c chore: upgrade flashinfer v0.2.6.post1 jit (#6958)
Co-authored-by: alcanderian <alcanderian@gmail.com>
Co-authored-by: Qiaolin Yu <qy254@cornell.edu>
Co-authored-by: Baizhou Zhang <sobereddiezhang@gmail.com>
Co-authored-by: Mick <mickjagger19@icloud.com>
Co-authored-by: ispobock <ispobaoke@gmail.com>
2025-06-09 09:22:39 -07:00
Sai Enduri
77e928d00e Update server timeout time in AMD CI. (#6953) 2025-06-07 15:10:27 -07:00
Zaili Wang
562f279a2d [CPU] enable CI for PRs, add Dockerfile and auto build task (#6458)
Co-authored-by: diwei sun <diwei.sun@intel.com>
Co-authored-by: Yineng Zhang <me@zhyncs.com>
2025-06-05 13:43:54 -07:00
Lianmin Zheng
20fd53b8f6 Correctly abort the failed grammar requests & Improve the handling of abort (#6803) 2025-06-01 19:00:07 -07:00
Lianmin Zheng
2d72fc47cf Improve profiler and integrate profiler in bench_one_batch_server (#6787) 2025-05-31 15:53:55 -07:00
Byron Hsu
8233cc10fd [PD] Support logprob & Add failure test (#6558) 2025-05-23 14:29:20 -07:00
Lifu Huang
3cf1473a09 Use monotonic clock for interval measurement (#6211)
Signed-off-by: Lifu Huang <lifu.hlf@gmail.com>
2025-05-17 16:49:18 -07:00
Lianmin Zheng
fba8eccd7e Log if cuda graph is used & extend cuda graph capture to cuda-graph-max-bs (#6201)
Co-authored-by: SangBin Cho <rkooo567@gmail.com>
2025-05-12 00:17:33 -07:00
Lifu Huang
6e2da51561 Replace time.time() to time.perf_counter() for benchmarking. (#6178)
Signed-off-by: Lifu Huang <lifu.hlf@gmail.com>
2025-05-11 14:32:49 -07:00
shangmingc
31d1f6e7f4 [PD] Add simple unit test for disaggregation feature (#5654)
Signed-off-by: Shangming Cai <caishangming@linux.alibaba.com>
2025-05-11 13:35:27 +08:00
XinyuanTong
e88dd482ed [CI]Add performance CI for VLM (#6038)
Signed-off-by: Xinyuan Tong <justinning0323@outlook.com>
2025-05-07 19:20:03 -07:00
Jinyan Chen
8a828666a3 Add DeepEP to CI PR Test (#5655)
Co-authored-by: Jinyan Chen <jinyanc@nvidia.com>
2025-05-06 17:36:03 -07:00
mlmz
256c4c2519 fix: correct stream response when enable_thinking is set to false (#5881) 2025-04-30 19:44:37 -07:00
Ying Sheng
11383cec3c [PP] Add pipeline parallelism (#5724) 2025-04-30 18:18:07 -07:00
Lianmin Zheng
849c83a0c0 [CI] test chunked prefill more (#5798) 2025-04-28 10:57:17 -07:00
Lianmin Zheng
a38f6932cc [CI] Fix test case (#5790) 2025-04-27 08:55:35 -07:00
Lianmin Zheng
621e96bf9b [CI] Fix ci tests (#5769) 2025-04-27 07:18:10 -07:00
Lianmin Zheng
35ca04d2fa [CI] fix port conflicts (#5789) 2025-04-27 05:17:44 -07:00
Stefan He
408ba02218 Add Llama 4 to FA3 test (#5509) 2025-04-26 19:49:31 -07:00
fzyzcjy
453d412cdb Tiny update error hint (#5037) 2025-04-21 00:47:47 -07:00
tianlian yi
bc92107b03 Support server based rollout in Verlengine (#4848)
Co-authored-by: Jin Pan <jpan236@wisc.edu>
Co-authored-by: Chayenne <zhaochen20@outlook.com>
Co-authored-by: Jinn <47354855+jhinpan@users.noreply.github.com>
2025-04-12 10:07:52 -07:00
saienduri
7f875f1293 update grok test (#5171) 2025-04-09 11:09:47 -07:00
Yun Dai
2695ab0537 Fix loading KV quantization scale; Enable modelopt kv cache (#4686)
Co-authored-by: qingquansong <ustcsqq@gmail.com>
2025-04-08 09:11:35 -07:00
Yineng Zhang
3289c1207d Update the retry count (#5051) 2025-04-03 17:07:38 -07:00
Lianmin Zheng
4ede6770cd Fix retract for page size > 1 (#4914) 2025-03-30 02:57:15 -07:00
Lianmin Zheng
b26bc86b36 Support page size > 1 + eagle (#4908) 2025-03-30 00:46:23 -07:00
fzyzcjy
8690c40bb0 Improve stack trace of retry errors (#4845) 2025-03-29 08:21:31 -07:00
Lianmin Zheng
47e6628aae Fix CI tests (#4853) 2025-03-28 00:28:35 -07:00
fzyzcjy
fa3c9e0668 Fix popen_launch_server wait for 20 minutes when child process exits (#4777) 2025-03-26 00:32:19 -07:00
fzyzcjy
26f07294f1 Warn users when release_memory_occupation is called without memory saver enabled (#4566) 2025-03-26 00:18:14 -07:00
fzyzcjy
15ddd84322 Add retry for flaky tests in CI (#4755) 2025-03-25 16:53:12 -07:00
Yun Dai
8cd4250401 [quantization] fix channelwise conversion with scalar weight scale (#4596) 2025-03-22 00:47:52 -07:00
Byron Hsu
8cc300f536 Fix router test (#4483) 2025-03-16 22:49:47 -07:00
HandH1998
2ac189edc8 Amd test fp8 (#4261) 2025-03-10 10:12:09 -07:00
Lianmin Zheng
00d25a7f5e Fix quantization and nightly tests (#4258) 2025-03-10 03:06:21 -07:00
Lianmin Zheng
fbd560028a Auto balance CI tests (#4238) 2025-03-09 21:05:55 -07:00
Mick
583d6af71b example: add vlm to token in & out example (#3941)
Co-authored-by: zhaochenyang20 <zhaochen20@outlook.com>
2025-03-04 22:18:26 -08:00
Lianmin Zheng
77a3954bf7 Simplify eagle tests and TP sync in grammar backend (#4066) 2025-03-04 13:40:40 -08:00