Lianmin Zheng | 2c7f01bc89 | Reorganize CI and test files (#9027) | 2025-08-10 12:30:06 -07:00
Lianmin Zheng | ef48d5547e | Fix CI (#9013) | 2025-08-09 16:00:10 -07:00
Lianmin Zheng | 706bd69cc5 | Clean up server_args.py to have a dedicated function for model specific adjustments (#8983) | 2025-08-08 19:56:50 -07:00
Lifu Huang | df90645525 | Support overlapped lora updates (#8213) | 2025-07-27 13:00:44 -07:00
Lifu Huang | 5c705b1dce | Add perf tests for LoRA (#8314) | 2025-07-26 14:55:22 -07:00
Yineng Zhang | a8c10aeeee | fix unit tests (#7618) | 2025-06-28 00:32:41 -07:00
Sai Enduri | 62a7aa2efc | Update CI flakes. (#7244) | 2025-06-16 15:19:32 -07:00
Lianmin Zheng | 019851d099 | Fix eagle on AMD (#7051) | 2025-06-10 05:22:40 -07:00
Yineng Zhang | 56ccd3c22c | chore: upgrade flashinfer v0.2.6.post1 jit (#6958) | 2025-06-09 09:22:39 -07:00
    Co-authored-by: alcanderian <alcanderian@gmail.com>
    Co-authored-by: Qiaolin Yu <qy254@cornell.edu>
    Co-authored-by: Baizhou Zhang <sobereddiezhang@gmail.com>
    Co-authored-by: Mick <mickjagger19@icloud.com>
    Co-authored-by: ispobock <ispobaoke@gmail.com>
Lianmin Zheng | 2d72fc47cf | Improve profiler and integrate profiler in bench_one_batch_server (#6787) | 2025-05-31 15:53:55 -07:00
kk | 7a5e6ce1cb | Fix GPU OOM (#6564) | 2025-05-24 16:38:39 -07:00
    Co-authored-by: michael <michael.zhang@amd.com>
Ying Sheng | bad7c26fdc | [PP] Fix init_memory_pool desync & add PP for mixtral (#6223) | 2025-05-12 12:38:09 -07:00
Lianmin Zheng | de167cf5fa | Fix request abortion (#6184) | 2025-05-10 21:54:46 -07:00
XinyuanTong | 9d8ec2e67e | Fix and Clean up chat-template requirement for VLM (#6114) | 2025-05-11 00:14:09 +08:00
    Signed-off-by: Xinyuan Tong <justinning0323@outlook.com>
fzyzcjy | a05bd83a94 | Change AMD test threshold (#6091) | 2025-05-08 01:05:52 -07:00
XinyuanTong | e88dd482ed | [CI]Add performance CI for VLM (#6038) | 2025-05-07 19:20:03 -07:00
    Signed-off-by: Xinyuan Tong <justinning0323@outlook.com>
Sai Enduri | 73bc1d00fc | Add 1 gpu perf and 2 gpu accuracy tests for AMD MI300x CI. (#5960) | 2025-05-01 20:56:59 -07:00
Sai Enduri | 2afba1b1c1 | Add TP2 MOE benchmarks for AMD. (#5909) | 2025-04-30 11:38:20 -07:00
Lianmin Zheng | daed453e84 | [CI] Improve github summary & enable fa3 for more models (#5796) | 2025-04-27 15:29:46 -07:00
Lianmin Zheng | a38f6932cc | [CI] Fix test case (#5790) | 2025-04-27 08:55:35 -07:00
Lianmin Zheng | 621e96bf9b | [CI] Fix ci tests (#5769) | 2025-04-27 07:18:10 -07:00
fzyzcjy | 15ddd84322 | Add retry for flaky tests in CI (#4755) | 2025-03-25 16:53:12 -07:00
Lianmin Zheng | bc1534ff32 | Fix a draft model accuracy bug in eagle; support step=1; return logprob in eagle (#4134) | 2025-03-06 06:13:59 -08:00
    Co-authored-by: Sehoon Kim <kssteven418@gmail.com>
    Co-authored-by: SangBin Cho <rkooo567@gmail.com>
    Co-authored-by: Sehoon Kim <sehoon@x.ai>
Lianmin Zheng | 286e6540a6 | Remove prefill-only-one-req (#4117) | 2025-03-05 20:58:48 -08:00
Lianmin Zheng | 77a3954bf7 | Simplify eagle tests and TP sync in grammar backend (#4066) | 2025-03-04 13:40:40 -08:00
Lianmin Zheng | ac2387279e | Support penalty in overlap mode; return logprob with chunked prefill; improve benchmark scripts (#3988) | 2025-03-03 00:12:04 -08:00
    Co-authored-by: SangBin Cho <rkooo567@gmail.com>
    Co-authored-by: dhou-xai <dhou@x.ai>
    Co-authored-by: Hanming Lu <hanming_lu@berkeley.edu>
Yineng Zhang | 6718b10996 | fix eagle unit test (#3591) | 2025-02-15 23:10:48 +08:00
Lianmin Zheng | da6f8081f6 | Fix CI tests (#3132) | 2025-01-25 17:43:39 -08:00
Lianmin Zheng | a4331cd260 | Add accuracy and latency tests of eagle into CI (#3027) | 2025-01-21 02:55:14 -08:00
Yineng Zhang | 67470bbb28 | minor: update correct measurement unit (#2406) | 2024-12-08 20:55:04 +08:00
Lianmin Zheng | ccaf1f997c | [CI] Print summary on github actions (#2274) | 2024-11-29 23:48:54 -08:00
Lianmin Zheng | 3c5538f781 | Update CI threshold (#2186) | 2024-11-25 15:24:17 -08:00
Lianmin Zheng | 5652c56535 | Update CI threshold & Improve code style (#2159) | 2024-11-24 06:29:38 -08:00
Lianmin Zheng | 7d671e4ad2 | Enable overlap by default (#2067) | 2024-11-19 22:07:58 -08:00
Yineng Zhang | 766192610e | feat: update torch 2.5.1 (#2069) | 2024-11-18 21:29:13 +08:00
Lianmin Zheng | 38625e2139 | Remove monkey_patch_vllm_dummy_weight_loader (#2064) | 2024-11-17 15:48:12 -08:00
Lianmin Zheng | c1f401fc58 | Revert "chore: update torch v2.5.1" (#2063) | 2024-11-17 15:29:38 -08:00
Yineng Zhang | 3b878863f7 | chore: update torch v2.5.1 (#1849) | 2024-11-18 00:06:00 +08:00
Ying Sheng | c98e84c21e | [Minor, Performance] Use torch.argmax for greedy sampling (#1589) | 2024-10-06 13:15:05 -07:00
Ying Sheng | 04b262cd91 | [Fix] Fix major performance bug in certain cases (#1563) | 2024-10-04 08:51:11 +00:00
    Co-authored-by: hnyls2002 <hnyls2002@gmail.com>
Lianmin Zheng | 1acccb364a | Fix oom issues with fp8 for llama (#1454) | 2024-09-18 03:45:19 -07:00
Lianmin Zheng | 9463bc1385 | Enable torch.compile for triton backend (#1422) | 2024-09-14 15:38:37 -07:00
Lianmin Zheng | ad0ff62a4c | Balance test in CI (#1411) | 2024-09-12 23:29:44 -07:00
Lianmin Zheng | 68be2f6d3b | [CI] Include triton backend and online serving benchmark into CI (#1408) | 2024-09-12 21:36:41 -07:00