56 Commits

Author SHA1 Message Date
Mick
b5e3d6031c vlm: support video as an input modality (#5888) 2025-07-09 23:48:35 -07:00
Lianmin Zheng
ce3a3e8783 Move multimodal processors into a separate folder (#7581) 2025-06-27 11:58:24 -07:00
Kiv Chen
64825b8395 model(vlm): mistral 3.1 (#5099)
Co-authored-by: KivenChen <sleigh-queue-0y@icloud.com>
2025-05-16 18:36:18 -07:00
Kiv Chen
5380cd7ea3 model(vlm): pixtral (#5084) 2025-05-13 00:16:10 -07:00
Mick
5cb552b1d4 refactor: multimodal data (#4754) 2025-03-31 09:57:51 -07:00
Mick
1e86457c90 model: Minicpmo (#3023) 2025-03-24 20:08:40 -07:00
Qubitium-ModelCloud
56a724eba3 [QUANT] Add GPTQModel Dynamic Quantization + lm_head Quantization (#3790)
Signed-off-by: ZX-ModelCloud <zx@modelcloud.ai>
Co-authored-by: ZX-ModelCloud <zx@modelcloud.ai>
2025-03-05 01:11:00 -08:00
Mick
7711ac6ed0 doc: emphasize and notify the usage of chat_template (#3589)
Co-authored-by: Chayenne <zhaochen20@outlook.com>
2025-02-15 00:10:32 -08:00
Ying Sheng
8586b72da0 [feat] Enable chunked prefill for llava-onevision (#2412) 2024-12-09 09:52:38 -08:00
Ying Sheng
aa47f64223 Revert "[feat] Enable chunked prefill for llava-onevision" (#2329) 2024-12-02 23:11:13 -08:00
Ying Sheng
480e38a733 [feat] Enable chunked prefill for llava-onevision (#2281) 2024-12-02 20:19:02 -08:00
Yineng Zhang
85e1a6f3aa Update model_loader deps and qqq quantization deps (#2220) (#2318)
Co-authored-by: HandH1998 <1335248067@qq.com>
2024-12-02 23:22:13 +08:00
Lianmin Zheng
afe1e46586 [Minor] fix the style for multimodal models (#2257) 2024-11-29 04:24:20 -08:00
Lianmin Zheng
f50a6cf443 Fix hash collision for multi modal models (#2256) 2024-11-29 03:15:58 -08:00
Ying Sheng
b7038fec9b [fix] Fix prefix caching for multi-image/video (#2239) 2024-11-28 12:08:13 -08:00
Ying Sheng
37c8a5761f [feat] Support session control for vision language models (#2210) 2024-11-27 00:03:29 -08:00
Xuehai Pan
62a4a339eb docs: fix module docstrings and copyright headers (#2077) 2024-11-22 22:16:53 +08:00
Lianmin Zheng
4af3f889fc Simplify flashinfer indices update for prefill (#2074)
Co-authored-by: kavioyu <kavioyu@tencent.com>
Co-authored-by: kavioyu <kavioyu@gmail.com>
2024-11-18 00:02:36 -08:00
Lianmin Zheng
d17d19e5b8 Fix mixed batch for multi modal models (#1702) 2024-10-17 10:27:26 -07:00
Byron Hsu
56503d9bc9 [1/N] Remove CacheConfig import in all model files (#1658) 2024-10-14 09:06:34 -07:00
Lianmin Zheng
36d5acfca5 Rename InputMetadata -> ForwardBatch (#1543) 2024-09-30 02:41:11 -07:00
Liangsheng Yin
fd9ad817ec Organize image inputs (#1531) 2024-09-29 06:28:55 +00:00
Yineng Zhang
b4408b0d16 feat: update linear deps 1/N (#1305) 2024-09-19 20:53:11 +08:00
Kaichen Zhang - NTU
8234e663e9 [Minor Fix] Fix llava modalities issue for single-image (#1402) 2024-09-12 01:10:26 -07:00
Liangsheng Yin
69b3bb9ae1 Unify forward mode (#1360) 2024-09-09 13:49:29 -07:00
Kaichen Zhang - NTU
662ecd9368 [Feat] Add modalities for vision server when handling pixel values for llava (#1346) 2024-09-09 02:07:34 -07:00
Lianmin Zheng
f64eae3a29 [Fix] Reduce memory usage for loading llava model & Remove EntryClassRemapping (#1308) 2024-09-02 21:44:45 -07:00
Lianmin Zheng
0a97d7962d [Fix] Fix OOM in llava base class (#1249) 2024-08-28 08:45:49 -07:00
Lianmin Zheng
bf53bf5142 [Fix] Fix llava on multi images (#1247) 2024-08-28 06:33:05 -07:00
Kaichen Zhang - NTU
a5b14ad043 [Feat/WIP] add llava-onevision, with support for (1) siglip encoder, (2) qwen2 decoder (3) openai api compatible server. (#1123)
Co-authored-by: Bo Li <drluodian@gmail.com>
2024-08-23 14:11:16 -07:00
Liangsheng Yin
87e8c090e9 Organize code (rename, movement) (#953) 2024-08-06 20:50:32 -07:00
Liangsheng Yin
cdcbde5fc3 Code structure refactor (#807) 2024-07-29 23:04:48 -07:00
Yineng Zhang
dd7e8b9421 chore: add copyright for srt (#790) 2024-07-28 23:07:12 +10:00
Liangsheng Yin
c9ee3d3559 Fix model forward grad (#628) 2024-07-15 22:09:09 -07:00
Ying Sheng
fb9296f0ed Higher priority for user input of max_prefill_tokens & format (#540) 2024-06-12 21:48:40 -07:00
Amos You
651a23ee7c remove redundant pad_input_ids function (#500) 2024-06-07 12:23:29 -07:00
Lianmin Zheng
bf3e271fe0 Update vllm to v0.4.3 (#511)
Co-authored-by: Qubitium <417764+Qubitium@users.noreply.github.com>
Co-authored-by: ZX <zx@lbx.dev>
2024-06-07 12:11:31 -07:00
Ying Sheng
0463f7fb52 Support data parallelism (static) (#480)
Co-authored-by: Ying Sheng <ying.sheng@databricks.com>
Co-authored-by: Lianmin Zheng <lianminzheng@gmail.com>
Co-authored-by: Liangsheng Yin <hnyls2002@gmail.com>
Co-authored-by: Zhiqiang Xie <xiezhq@stanford.edu>
2024-05-27 21:24:10 -07:00
Li Bo
2b605ab1d7 [Feat/Fix] Refactoring Llava models into single file (#475) 2024-05-26 12:29:51 -07:00
Lianmin Zheng
19d2135cb8 Use model loader from vllm (#459) 2024-05-21 09:13:37 -07:00
Liangsheng Yin
690d162d97 Format code (#441) 2024-05-14 22:40:46 +08:00
Kaichen Zhang - NTU
664287b2a7 [Feat] Add llava qwen, llava mistral (#419)
Co-authored-by: Bo Li <drluodian@gmail.com>
2024-05-13 22:17:50 -07:00
Yuanhan Zhang
0992d85f92 support llava video (#426) 2024-05-13 16:57:00 -07:00
Qubitium
33b242df30 Compat with latest VLLM 0.4.2 main + fork.number rename + Flashinfer 0.0.4 (#380)
Co-authored-by: ZX <zx@lbx.dev>
Co-authored-by: ZhouXingg <165115237+ZhouXingg@users.noreply.github.com>
2024-05-11 16:37:49 -07:00
Liangsheng Yin
150d7020ed Revert removing the unused imports (#385) 2024-04-23 22:36:33 +08:00
Liangsheng Yin
9acc6e3504 add .isort.cfg (#378) 2024-04-22 22:38:09 +08:00
Lianmin Zheng
faba293a0d Improve gemma and documentations (#278) 2024-03-11 04:43:39 -07:00
Geary.Z
64fe311593 replace skip_embed with input_embeds (#222) 2024-03-10 19:04:52 -07:00
Lianmin Zheng
c51020cf0c Fix the chat template for llava-v1.6-34b & format code (#177) 2024-02-11 05:50:13 -08:00
Lianmin Zheng
4ea92f8307 Format code (#118) 2024-01-29 17:08:12 -08:00