16 Commits

Author SHA1 Message Date
Mick
4395c87a9b refactor: unify names of the feature field of MultimodalDataItem (#8075) 2025-07-16 17:52:38 -07:00
Xinyuan Tong
7498522f7d update transformers to 4.53.2 (#8029)
Signed-off-by: Xinyuan Tong <justinning0323@outlook.com>
2025-07-15 18:24:39 -07:00
Mick
b5e3d6031c vlm: support video as an input modality (#5888) 2025-07-09 23:48:35 -07:00
Lianmin Zheng
ce3a3e8783 Move multimodal processors into a separate folder (#7581) 2025-06-27 11:58:24 -07:00
Chang Su
72676cd6c0 feat(oai refactor): Replace openai_api with entrypoints/openai (#7351)
Co-authored-by: Jin Pan <jpan236@wisc.edu>
2025-06-21 13:21:06 -07:00
Yineng Zhang
7eb9d8e594 chore: upgrade transformers 4.52.3 (#6575)
Co-authored-by: Mick <mickjagger19@icloud.com>
2025-05-25 22:49:58 -07:00
Xinyuan Tong
681fdc264b Refactor vlm embedding routine to use precomputed feature (#6543)
Signed-off-by: Xinyuan Tong <justinning0323@outlook.com>
2025-05-24 18:39:21 -07:00
Chang Su
4685fbb888 [VLM] Support chunk prefill for VLM (#6355)
Co-authored-by: yizhang2077 <1109276519@qq.com>
2025-05-22 20:32:41 -07:00
Yury Sulsky
f19a9204cd Support precomputed multimodal features for Qwen-VL and Gemma3 models. (#6136)
Co-authored-by: Yury Sulsky <ysulsky@tesla.com>
2025-05-16 12:26:15 -07:00
applesaucethebun
2ce8793519 Add typo checker in pre-commit (#6179)
Co-authored-by: Brayden Zhong <b8zhong@uwaterloo.ca>
2025-05-11 12:55:00 +08:00
Ying Sheng
11383cec3c [PP] Add pipeline parallelism (#5724) 2025-04-30 18:18:07 -07:00
Mick
c998d04b46 vlm: enable radix cache for qwen-vl models (#5349)
Co-authored-by: Xinyuan Tong <justinning0323@outlook.com>
2025-04-23 20:35:05 -07:00
Mick
5cb552b1d4 refactor: multimodal data (#4754) 2025-03-31 09:57:51 -07:00
Mick
1e86457c90 model: Minicpmo (#3023) 2025-03-24 20:08:40 -07:00
Mick
11577cedb7 refactor: bug fixes and refactor for vlm (#4661) 2025-03-22 22:48:49 -07:00
Mick
d373a48c98 fix: second_per_grid_ts should be used to get mrope position (#3682) 2025-03-17 18:12:38 -07:00