Ke Bao | 62832bb272 | Support cuda graph for DP attention (#2061) | 2024-11-17 16:29:20 -08:00
Lianmin Zheng | 9c939a3d8b | Clean up metrics code (#1972) | 2024-11-09 15:43:20 -08:00
Lianmin Zheng | 7ef0084b0d | Add sentence_transformers to CI dependency (#1958) | 2024-11-08 01:21:29 -08:00
Chayenne | c77c1e05ba | fix black in pre-commit (#1940) | 2024-11-08 07:42:47 +08:00
Xuehai Pan | a5e0defb5a | minor: Add basic editorconfig and pre-commit hooks to enforce style for whitespaces (#1926) | 2024-11-06 13:46:04 +00:00
Byron Hsu | 530ff541cf | [router] Impl radix tree and set up CI (#1893); Co-authored-by: Lianmin Zheng <lianminzheng@gmail.com> | 2024-11-04 10:56:52 -08:00
Jani Monoses | 916b3cdddc | Allow passing dtype and max_new_tokens to HF reference script (#1903) | 2024-11-03 08:24:37 -08:00
Lianmin Zheng | a2e0424abf | Fix memory leak for chunked prefill 2 (#1858); Co-authored-by: Liangsheng Yin <hnyls2002@gmail.com> | 2024-10-31 14:51:51 -07:00
Lianmin Zheng | b548801ddb | Update docs (#1839) | 2024-10-30 02:49:08 -07:00
Lianmin Zheng | 6aa94b967c | Update ci workflows (#1804) | 2024-10-26 04:32:36 -07:00
Ying Sheng | c5325aba75 | [Profile] Add pytorch profiler (#1604) | 2024-10-07 14:37:16 -07:00
Liangsheng Yin | 99ec439da4 | Organize Attention Backends (#1547) | 2024-09-30 15:54:18 -07:00
Lianmin Zheng | fb2d0680e0 | [Fix] Fix clean_up_tokenization_spaces in tokenizer (#1510) | 2024-09-24 21:37:33 -07:00
Lianmin Zheng | 2854a5ea9f | Fix the overhead due to penalizer in bench_latency (#1496) | 2024-09-23 07:38:14 -07:00
Lianmin Zheng | 167591e864 | Better unit tests for adding a new model (#1488) | 2024-09-22 01:50:37 -07:00
Lianmin Zheng | 2cd7e181dd | Fix env vars in bench_latency (#1472) | 2024-09-19 03:19:26 -07:00
Ying Sheng | 37963394aa | [Feature] Support LoRA path renaming and add LoRA serving benchmarks (#1433) | 2024-09-15 12:46:04 -07:00
Ying Sheng | 712216928f | [Feature] Initial support for multi-LoRA serving (#1307) | 2024-09-12 16:46:14 -07:00
Lianmin Zheng | 3a6e8b6d78 | [Minor] move triton attention kernels into a separate folder (#1379) | 2024-09-10 15:15:08 -07:00
Lianmin Zheng | f64eae3a29 | [Fix] Reduce memory usage for loading llava model & Remove EntryClassRemapping (#1308) | 2024-09-02 21:44:45 -07:00
Lianmin Zheng | f6af3a6561 | Cleanup readme, llava examples, usage examples and nccl init (#1194) | 2024-08-24 08:02:23 -07:00
Ying Sheng | 0909bb0d2f | [Feat] Add window attention for gemma-2 (#1056) | 2024-08-13 17:01:26 -07:00
foszto | c62d560c03 | #590 Increase default , track changes in examples and documentation (#971); Co-authored-by: Ying Sheng <sqy1415@gmail.com> | 2024-08-08 00:54:46 +00:00
Ying Sheng | 3bc99e6fe4 | Test openai vision api (#925) | 2024-08-05 13:51:55 +10:00
Ying Sheng | 995af5a54b | Improve the structure of CI (#911) | 2024-08-03 23:09:21 -07:00
Ying Sheng | 4075677621 | Add OpenAI backend to the CI test (#869) | 2024-08-01 09:25:24 -07:00
zhyncs | 2e341cd493 | misc: add pre-commit config (#637) | 2024-07-17 11:55:39 -07:00
Liangsheng Yin | 95c4e0dfac | Format Benchmark Code (#399) | 2024-04-28 21:06:22 +08:00
Christopher Chou | 864425300f | Yi-VL Model (#112) | 2024-02-01 08:33:22 -08:00
Lianmin Zheng | 70359bf31a | Update benchmark scripts (#8) | 2024-01-15 16:12:57 -08:00
Lianmin Zheng | 30720e732c | Add install with pip (#3) | 2024-01-09 12:43:40 -08:00