Lianmin Zheng | 935cda944b | Misc clean up; Remove the support of jump forward (#4032) | 2025-03-03 07:02:14 -08:00
|
Lianmin Zheng | ac2387279e | Support penalty in overlap mode; return logprob with chunked prefill; improve benchmark scripts (#3988) | 2025-03-03 00:12:04 -08:00
  Co-authored-by: SangBin Cho <rkooo567@gmail.com>
  Co-authored-by: dhou-xai <dhou@x.ai>
  Co-authored-by: Hanming Lu <hanming_lu@berkeley.edu>
|
Yineng Zhang | 70f894b810 | feat: support flashinfer mla attention for deepseek v3 (#3550) | 2025-02-14 08:50:14 +08:00
|
Lianmin Zheng | 86e0dde555 | Improve the user control of new_token_ratio (#1811) | 2024-10-26 16:39:41 -07:00
|
Lianmin Zheng | c555ce2ca2 | Revert "Fix memory leak when doing chunked prefill" (#1797) | 2024-10-25 10:24:44 -07:00
|
Liangsheng Yin | a2f5e7555f | Fix memory leak when doing chunked prefill (#1787) | 2024-10-25 08:01:17 -07:00
|
Lianmin Zheng | 7ee6c259ff | Simplify the event loop and expose --num-continuous-decode-steps as an argument (#1652) | 2024-10-12 21:35:30 -07:00
|
Lianmin Zheng | 899cf5c438 | Remove deprecated configs (#1431) | 2024-09-15 08:52:18 -07:00
|
Lianmin Zheng | eda7c09048 | Remove useless fields in global_config.py (#1328) | 2024-09-04 05:37:32 -07:00
|
Lianmin Zheng | 326df4bab2 | Use a single workspace for flashinfer (#1077) | 2024-08-14 19:25:37 -07:00
|
Liangsheng Yin | 7fa54a1ab3 | Make req_pool_indices on CPU (#960) | 2024-08-07 01:41:25 -07:00
|
Yineng Zhang | 768e05d08f | fix benchmark (#743) | 2024-07-26 21:26:13 +10:00
  Co-authored-by: hnyls2002 <hnyls2002@gmail.com>
  Co-authored-by: Ying Sheng <sqy1415@gmail.com>
|
Liangsheng Yin | 4cd64b8ee6 | Auto adjust new ratio (#708) | 2024-07-23 22:06:02 -07:00
|
Mingyi | 5ac8b80677 | Simplify mem state (#623) | 2024-07-15 02:01:09 -07:00
|
Liangsheng Yin | 564a898ad9 | Optimize mem indices management (#619) | 2024-07-13 23:39:37 -07:00
|
Lianmin Zheng | 665815969a | Enable cuda graph by default (#612) | 2024-07-13 05:29:46 -07:00
|
Ying Sheng | dc1b8bcfaa | Format (#593) | 2024-07-05 10:06:17 -07:00
|
Ying Sheng | 2a754e57b0 | 2x performance improvement for large prefill & Fix workspace conflicts (#579) | 2024-07-03 16:14:57 -07:00
|
Ying Sheng | 1374334d38 | Fix dependency & crash issues (#539) | 2024-06-12 21:23:19 -07:00
|
Lianmin Zheng | e8a2327d52 | Update version to 0.1.17 (#515) | 2024-06-07 19:49:18 -07:00
|
Ying Sheng | 0463f7fb52 | Support data parallelism (static) (#480) | 2024-05-27 21:24:10 -07:00
  Co-authored-by: Ying Sheng <ying.sheng@databricks.com>
  Co-authored-by: Lianmin Zheng <lianminzheng@gmail.com>
  Co-authored-by: Liangsheng Yin <hnyls2002@gmail.com>
  Co-authored-by: Zhiqiang Xie <xiezhq@stanford.edu>
|
Liangsheng Yin | f06e90c2cf | Optimize retract (#440) | 2024-05-26 00:07:26 +08:00
|
Lianmin Zheng | 5dc55a5f02 | Handle truncation errors (#436) | 2024-05-13 15:56:00 -07:00
|
Liangsheng Yin | 39191c8515 | Cache optimizations (#418) | 2024-05-13 12:47:13 +08:00
|
ZhouXingg | 183df47282 | SamplingParams add "spaces_between_special_tokens" argument (#392) | 2024-04-30 16:17:12 -07:00
|
Lianmin Zheng | 22085081bb | release initial code | 2024-01-08 04:37:50 +00:00
  Co-authored-by: Ying Sheng <sqy1415@gmail.com>
  Co-authored-by: Liangsheng Yin <hnyls2002@gmail.com>
  Co-authored-by: Zhiqiang Xie <xiezhq@stanford.edu>
  Co-authored-by: parasol-aser <3848358+parasol-aser@users.noreply.github.com>
  Co-authored-by: LiviaSun <33578456+ChuyueSun@users.noreply.github.com>
  Co-authored-by: Cody Yu <hao.yu.cody@gmail.com>
|