Lianmin Zheng | 2d72fc47cf | Improve profiler and integrate profiler in bench_one_batch_server (#6787) | 2025-05-31 15:53:55 -07:00
Lianmin Zheng | 849c83a0c0 | [CI] test chunked prefill more (#5798) | 2025-04-28 10:57:17 -07:00
fzyzcjy | 15ddd84322 | Add retry for flaky tests in CI (#4755) | 2025-03-25 16:53:12 -07:00
Ke Bao | 45212ce18b | Add deepseek v2 torch compile pr test (#4538) | 2025-03-18 00:29:24 -07:00
Lianmin Zheng | 48473684cc | Split test_mla.py into two files (#4216) | 2025-03-08 15:40:49 -08:00
Lianmin Zheng | d4017a6b63 | [EAGLE] many fixes for eagle (#4195); Co-authored-by: SangBin Cho <rkooo567@gmail.com>; Co-authored-by: Sehoon Kim <sehoon@x.ai> | 2025-03-07 22:12:13 -08:00
Ke Bao | 03b0364f76 | Update nextn ci test (#4071) | 2025-03-04 13:01:24 -08:00
Lianmin Zheng | ac2387279e | Support penalty in overlap mode; return logprob with chunked prefill; improve benchmark scripts (#3988); Co-authored-by: SangBin Cho <rkooo567@gmail.com>; Co-authored-by: dhou-xai <dhou@x.ai>; Co-authored-by: Hanming Lu <hanming_lu@berkeley.edu> | 2025-03-03 00:12:04 -08:00
Yineng Zhang | e782eb7e6a | chore: bump v0.4.3.post1 (#3638) | 2025-02-17 21:58:19 +08:00
Yineng Zhang | e319153be8 | update unit test (#3636) | 2025-02-17 21:06:10 +08:00
Yineng Zhang | 32b44d2fca | add mtp unit test (#3634) | 2025-02-17 19:04:07 +08:00
Yineng Zhang | 2b1808cec4 | update unit test in AMD CI (#3366) | 2025-02-07 17:25:16 +08:00
Ke Bao | 5317902670 | Add test for fp8 torch compile (#3246) | 2025-02-01 16:07:54 +08:00
Lianmin Zheng | 67008f4b32 | Use only one GPU for MLA CI tests (#2858) | 2025-01-13 03:55:33 -08:00
Lianmin Zheng | d4fc1a70e3 | Crash the server correctly during error (#2231) | 2024-11-28 00:22:39 -08:00
Lianmin Zheng | 4af3f889fc | Simplify flashinfer indices update for prefill (#2074); Co-authored-by: kavioyu <kavioyu@tencent.com>; Co-authored-by: kavioyu <kavioyu@gmail.com> | 2024-11-18 00:02:36 -08:00
Lianmin Zheng | 86fc0d79d0 | Add a watch dog thread (#1816) | 2024-10-27 02:00:50 -07:00
Ke Bao | b8ccaf4d73 | Add MLA gsm8k eval (#1484) | 2024-09-21 11:16:13 +08:00
Ke Bao | a68cb201dd | Fix triton head num (#1482) | 2024-09-21 10:25:20 +08:00