| Lianmin Zheng | 8496701934 | [Misc] Fix metrics, weight update lock, request logging (#2543) | 2024-12-22 06:27:22 -08:00 |
| SangBin Cho | 9208618b3e | [Core] in batch prefix caching by delay scheduling (#2442) | 2024-12-11 12:51:50 -08:00 |
| Qun Yang | 37ee906f61 | Add more support for intel Gaudi accelerators (#2357) | 2024-12-06 01:16:33 -08:00 |
| Lianmin Zheng | b548801ddb | Update docs (#1839) | 2024-10-30 02:49:08 -07:00 |
| Lianmin Zheng | fc82f5a743 | [Fix] Fix cuda graph padding for triton attention backend (#1782) | 2024-10-24 12:33:15 -07:00 |
| Lianmin Zheng | fbcbb26327 | Fix perf regression for set_kv_buffer (#1765) | 2024-10-23 09:57:08 -07:00 |
| Lianmin Zheng | ad4125d1a9 | Fuse more ops & Simplify token mapping (#1758) | 2024-10-22 23:20:43 -07:00 |
| Liangsheng Yin | 94cde10920 | Llama3.2 vision model support (#1551) | 2024-10-21 15:01:21 -07:00 |
| Lianmin Zheng | b48edff67f | Split the overlapped version of TpModelWorkerClient into a separate file (#1726) | 2024-10-20 00:29:29 -07:00 |
| Lianmin Zheng | 59cbf47626 | Unify the memory pool api and tp worker API (#1724) | 2024-10-19 23:19:26 -07:00 |
| Lianmin Zheng | 769bf11c05 | Fix the race condition in overlap mode (#1712) | 2024-10-19 06:50:56 -07:00 |
| Lianmin Zheng | 2bcfba1b08 | Skip unnecessary penalizer (#1707) | 2024-10-18 17:54:03 -07:00 |
| Lianmin Zheng | bc12d4033f | Add grouped free operations (#1706) | 2024-10-18 13:21:05 -07:00 |
| wxsm | b170930534 | feat: radix tree code optimize (#1697) | 2024-10-17 08:01:27 -07:00 |
| Lianmin Zheng | 9116b2896f | Add a new event loop (#1677) | 2024-10-16 01:33:20 -07:00 |
| Shuo Yang | 061e546313 | Support double sparsity (#1459) | 2024-10-14 02:00:41 -07:00 |
| Lianmin Zheng | 9244f27f0a | [Minor] Improve the style and fix flaky tests (#1584) | 2024-10-06 00:10:48 -07:00 |
| Lianmin Zheng | 45473d4b2b | Make input_ids a torch.Tensor (#1568) | 2024-10-04 01:09:59 -07:00 |
| Lianmin Zheng | 114bbc8651 | Use ipc instead of tcp in zmq (#1566) | 2024-10-04 00:45:52 -07:00 |
| Lianmin Zheng | 32eb6e96f2 | Organize sampling batch info better (#1562) | 2024-10-03 18:29:49 -07:00 |
| Lianmin Zheng | 4ae0969c0a | Move status check in the memory pool to CPU (#1557) | 2024-10-02 18:23:35 -07:00 |
| Lianmin Zheng | f86c1e611f | Move scheduler code from tp_worker.py to scheduler.py (#1538) | 2024-09-29 17:42:45 -07:00 |
| luzengxiangcn | e6692bf4a5 | debug radixcache stack_overflow (#1499) | 2024-09-24 04:58:01 -07:00 |
| Ke Bao | 2c615d120f | [Feature] Support fp8 e5m2 kv cache with flashinfer (#1204); Co-authored-by: Yineng Zhang <me@zhyncs.com> | 2024-08-25 17:38:11 -07:00 |
| Lianmin Zheng | c877292cc1 | Re-organize CI tests (#1052) | 2024-08-12 03:39:01 -07:00 |
| Liangsheng Yin | fb7421db0d | minor: some potential bugs (#1044) | 2024-08-12 05:35:44 +00:00 |
| Liangsheng Yin | 7de6034534 | Fix the prefix indices (#1037) | 2024-08-11 17:57:02 -07:00 |
| Lianmin Zheng | 9dae407812 | Improve type annotation (#1029) | 2024-08-11 02:44:59 -07:00 |
| Liangsheng Yin | fcc0f5ed99 | Fix wrong assert (#1028) | 2024-08-11 09:22:16 +00:00 |
| Liangsheng Yin | 43fbb6d919 | Fix input_ids && rename to fill_ids (#1021) | 2024-08-10 16:24:12 -07:00 |
| Liangsheng Yin | 62757db6f0 | Reduce the overhead when cache is disabled (#1010) | 2024-08-09 16:36:57 -07:00 |
| Liangsheng Yin | 6ed4e3b8fb | Fix chunked prefill (#984) | 2024-08-07 22:28:42 -07:00 |
| Liangsheng Yin | 7623091d97 | RadixCache method adjust (#977) | 2024-08-07 15:52:24 -07:00 |
| Zhiqiang Xie | 6db27f7b3b | misc: correct the int data type for token ids and indices (#969) | 2024-08-08 04:40:07 +08:00 |
| Liangsheng Yin | a01ddd9605 | misc: fix the req_to_token member change (#967) | 2024-08-07 01:52:10 -07:00 |
| Liangsheng Yin | 7fa54a1ab3 | Make req_pool_indices on CPU (#960) | 2024-08-07 01:41:25 -07:00 |
| Ke Bao | e1eae1fd15 | Support MLA for DeepSeek-V2 with Triton - step 1 (#905) | 2024-08-05 03:40:33 +10:00 |
| Liangsheng Yin | c020f9ceda | Support chunked prefill when radix cache is disabled (#811) | 2024-08-01 00:29:01 -07:00 |
| Liangsheng Yin | cdcbde5fc3 | Code structure refactor (#807) | 2024-07-29 23:04:48 -07:00 |