Lianmin Zheng | dc0705a504 | Simplify prepare_extend_after_decode (#6987) | 2025-06-09 16:39:21 -07:00
Lianmin Zheng | 2d72fc47cf | Improve profiler and integrate profiler in bench_one_batch_server (#6787) | 2025-05-31 15:53:55 -07:00
kk | 7a5e6ce1cb | Fix GPU OOM (#6564); Co-authored-by: michael <michael.zhang@amd.com> | 2025-05-24 16:38:39 -07:00
Sai Enduri | 73eb67c087 | Enable unit tests for AMD CI. (#6283) | 2025-05-14 12:55:36 -07:00
Lianmin Zheng | fba8eccd7e | Log if cuda graph is used & extend cuda graph capture to cuda-graph-max-bs (#6201); Co-authored-by: SangBin Cho <rkooo567@gmail.com> | 2025-05-12 00:17:33 -07:00
Ke Bao | ebaba85655 | Update ci test and doc for MTP api change (#5952) | 2025-05-01 09:30:27 -07:00
Lianmin Zheng | 26fc32d168 | [CI] tune the test order to warmup the server (#5860) | 2025-04-28 19:27:37 -07:00
Lianmin Zheng | 849c83a0c0 | [CI] test chunked prefill more (#5798) | 2025-04-28 10:57:17 -07:00
Lianmin Zheng | daed453e84 | [CI] Improve github summary & enable fa3 for more models (#5796) | 2025-04-27 15:29:46 -07:00
Baizhou Zhang | f9fb33efc3 | Add 8-GPU Test for Deepseek-V3 (#5691); Co-authored-by: Lianmin Zheng <lianminzheng@gmail.com> | 2025-04-27 12:46:12 -07:00