Lianmin Zheng | f86c1e611f | Move scheduler code from tp_worker.py to scheduler.py (#1538) | 2024-09-29 17:42:45 -07:00
Ke Bao | 2c615d120f | [Feature] Support fp8 e5m2 kv cache with flashinfer (#1204) Co-authored-by: Yineng Zhang <me@zhyncs.com> | 2024-08-25 17:38:11 -07:00
Lianmin Zheng | 9dae407812 | Improve type annotation (#1029) | 2024-08-11 02:44:59 -07:00
Liangsheng Yin | a01ddd9605 | misc: fix the req_to_token member change (#967) | 2024-08-07 01:52:10 -07:00
Liangsheng Yin | 7fa54a1ab3 | Make req_pool_indices on CPU (#960) | 2024-08-07 01:41:25 -07:00
Ke Bao | e1eae1fd15 | Support MLA for DeepSeek-V2 with Triton - step 1 (#905) | 2024-08-05 03:40:33 +10:00
Liangsheng Yin | cdcbde5fc3 | Code structure refactor (#807) | 2024-07-29 23:04:48 -07:00