Ke Bao | 16eb33ffe2 | Update vocab embedding deps and add TP switch (#1856) | 2024-10-31 20:13:07 -07:00
Byron Hsu | 56503d9bc9 | [1/N] Remove CacheConfig import in all model files (#1658) | 2024-10-14 09:06:34 -07:00
Lianmin Zheng | 36d5acfca5 | Rename InputMetadata -> ForwardBatch (#1543) | 2024-09-30 02:41:11 -07:00
Yineng Zhang | b4408b0d16 | feat: update linear deps 1/N (#1305) | 2024-09-19 20:53:11 +08:00
Liangsheng Yin | 70b6802982 | Optimize conflicts between CUDA graph and vocab mask tensors (#1392) | 2024-09-13 20:27:53 -07:00
Liangsheng Yin | 381dd57bd6 | Sampler cudagraph (#1253) | 2024-08-28 18:58:52 -07:00
Yineng Zhang | f25f4dfde5 | hotfix: revert sampler CUDA Graph (#1242) | 2024-08-28 21:16:47 +10:00
Liangsheng Yin | 75ce37f401 | Move sampler into CUDA graph (#1201) (Co-authored-by: Yineng Zhang <me@zhyncs.com>) | 2024-08-26 07:02:50 -07:00
Yineng Zhang | 6a38efa834 | feat: replace all rmsnorm and silu (#1057) | 2024-08-13 02:15:59 +10:00
Liangsheng Yin | 87e8c090e9 | Organize code (rename, movement) (#953) | 2024-08-06 20:50:32 -07:00
Liangsheng Yin | cdcbde5fc3 | Code structure refactor (#807) | 2024-07-29 23:04:48 -07:00
Yineng Zhang | dd7e8b9421 | chore: add copyright for srt (#790) | 2024-07-28 23:07:12 +10:00
Ying Sheng | fb9296f0ed | Higher priority for user input of max_prefill_tokens & format (#540) | 2024-06-12 21:48:40 -07:00
Lianmin Zheng | bf3e271fe0 | Update vllm to v0.4.3 (#511) (Co-authored-by: Qubitium <417764+Qubitium@users.noreply.github.com>, ZX <zx@lbx.dev>) | 2024-06-07 12:11:31 -07:00
Ying Sheng | 0463f7fb52 | Support data parallelism (static) (#480) (Co-authored-by: Ying Sheng <ying.sheng@databricks.com>, Lianmin Zheng <lianminzheng@gmail.com>, Liangsheng Yin <hnyls2002@gmail.com>, Zhiqiang Xie <xiezhq@stanford.edu>) | 2024-05-27 21:24:10 -07:00
Lianmin Zheng | 19d2135cb8 | Use model loader from vllm (#459) | 2024-05-21 09:13:37 -07:00
Yuanhan Zhang | 0992d85f92 | support llava video (#426) | 2024-05-13 16:57:00 -07:00
Qubitium | 33b242df30 | Compat with latest VLLM 0.4.2 main + fork.number rename + Flashinfer 0.0.4 (#380) (Co-authored-by: ZX <zx@lbx.dev>, ZhouXingg <165115237+ZhouXingg@users.noreply.github.com>) | 2024-05-11 16:37:49 -07:00
Liangsheng Yin | 150d7020ed | Revert removing the unused imports (#385) | 2024-04-23 22:36:33 +08:00
Liangsheng Yin | 9acc6e3504 | add .isort.cfg (#378) | 2024-04-22 22:38:09 +08:00
Lianmin Zheng | 65501a9cf1 | Fix commandr import; format code | 2024-04-16 18:10:12 +00:00
ZhouXingg | db611066ad | support command-r (#369) | 2024-04-16 10:36:51 -07:00