Lianmin Zheng | e4d68afcf0 | [Minor] Many cleanup (#1357) | 2024-09-09 04:14:11 -07:00 |
Lianmin Zheng | 57d0bd91ec | Improve benchmark (#1140) | 2024-08-17 17:43:23 -07:00 |
Lianmin Zheng | 5a261bd055 | Fix the deadlock in multi-node tp (#1122) | 2024-08-16 01:39:24 -07:00 |
Lianmin Zheng | 326df4bab2 | Use a single workspace for flashinfer (#1077) | 2024-08-14 19:25:37 -07:00 |
Lianmin Zheng | a59636bb5e | Update grok 1 model (#1095) | 2024-08-14 04:40:44 -07:00 |
Ke Bao | e1eae1fd15 | Support MLA for DeepSeek-V2 with Triton - step 1 (#905) | 2024-08-05 03:40:33 +10:00 |
Lianmin Zheng | 490a1f39dd | Fix cuda graph with flashinfer (#675) | 2024-07-20 02:43:55 -07:00 |
Lianmin Zheng | a385ee27bd | Warmup cublas (#566) | 2024-06-25 12:46:00 -07:00 |
Ying Sheng | fb9296f0ed | Higher priority for user input of max_prefill_tokens & format (#540) | 2024-06-12 21:48:40 -07:00 |
Lianmin Zheng | 55c1643627 | Improve benchmark scripts & rename some scripts (#477) | 2024-05-26 12:51:45 -07:00 |
Liangsheng Yin | 14522e6a26 | Organize Benchmark (#381) | 2024-05-05 16:14:17 +08:00 |
Liangsheng Yin | 95c4e0dfac | Format Benchmark Code (#399) | 2024-04-28 21:06:22 +08:00 |
Liangsheng Yin | da19434c2f | Benchmark Updates (#382) | 2024-04-24 02:23:01 +08:00 |
Lianmin Zheng | 22085081bb | release initial code; Co-authored-by: Ying Sheng <sqy1415@gmail.com>; Co-authored-by: Liangsheng Yin <hnyls2002@gmail.com>; Co-authored-by: Zhiqiang Xie <xiezhq@stanford.edu>; Co-authored-by: parasol-aser <3848358+parasol-aser@users.noreply.github.com>; Co-authored-by: LiviaSun <33578456+ChuyueSun@users.noreply.github.com>; Co-authored-by: Cody Yu <hao.yu.cody@gmail.com> | 2024-01-08 04:37:50 +00:00 |