sglang

Author	SHA1	Message	Date
Mick	b5e3d6031c	vlm: support video as an input modality (#5888)	2025-07-09 23:48:35 -07:00
Lianmin Zheng	ce3a3e8783	Move multimodal processors into a separate folder (#7581)	2025-06-27 11:58:24 -07:00
Kiv Chen	64825b8395	model(vlm): mistral 3.1 (#5099) Co-authored-by: KivenChen <sleigh-queue-0y@icloud.com>	2025-05-16 18:36:18 -07:00
Kiv Chen	5380cd7ea3	model(vlm): pixtral (#5084)	2025-05-13 00:16:10 -07:00
Mick	5cb552b1d4	refactor: multimodal data (#4754)	2025-03-31 09:57:51 -07:00
Mick	1e86457c90	model: Minicpmo (#3023)	2025-03-24 20:08:40 -07:00
Qubitium-ModelCloud	56a724eba3	[QUANT] Add GPTQModel Dynamic Quantization + `lm_head` Quantization (#3790) Signed-off-by: ZX-ModelCloud <zx@modelcloud.ai> Co-authored-by: ZX-ModelCloud <zx@modelcloud.ai>	2025-03-05 01:11:00 -08:00
Mick	7711ac6ed0	doc: emphasize and notify the usage of chat_template (#3589) Co-authored-by: Chayenne <zhaochen20@outlook.com>	2025-02-15 00:10:32 -08:00
Ying Sheng	8586b72da0	[feat] Enable chunked prefill for llava-onevision (#2412)	2024-12-09 09:52:38 -08:00
Ying Sheng	aa47f64223	Revert "[feat] Enable chunked prefill for llava-onevision" (#2329)	2024-12-02 23:11:13 -08:00
Ying Sheng	480e38a733	[feat] Enable chunked prefill for llava-onevision (#2281)	2024-12-02 20:19:02 -08:00
Yineng Zhang	85e1a6f3aa	Update model_loader deps and qqq quantization deps (#2220) (#2318) Co-authored-by: HandH1998 <1335248067@qq.com>	2024-12-02 23:22:13 +08:00
Lianmin Zheng	afe1e46586	[Minor] fix the style for multimodal models (#2257)	2024-11-29 04:24:20 -08:00
Lianmin Zheng	f50a6cf443	Fix hash collision for multi modal models (#2256)	2024-11-29 03:15:58 -08:00
Ying Sheng	b7038fec9b	[fix] Fix prefix caching for multi-image/video (#2239)	2024-11-28 12:08:13 -08:00
Ying Sheng	37c8a5761f	[feat] Support session control for vision language models (#2210)	2024-11-27 00:03:29 -08:00
Xuehai Pan	62a4a339eb	docs: fix module docstrings and copyright headers (#2077)	2024-11-22 22:16:53 +08:00
Lianmin Zheng	4af3f889fc	Simplify flashinfer indices update for prefill (#2074) Co-authored-by: kavioyu <kavioyu@tencent.com> Co-authored-by: kavioyu <kavioyu@gmail.com>	2024-11-18 00:02:36 -08:00
Lianmin Zheng	d17d19e5b8	Fix mixed batch for multi modal models (#1702)	2024-10-17 10:27:26 -07:00
Byron Hsu	56503d9bc9	[1/N] Remove `CacheConfig` import in all model files (#1658)	2024-10-14 09:06:34 -07:00
Lianmin Zheng	36d5acfca5	Rename InputMetadata -> ForwardBatch (#1543)	2024-09-30 02:41:11 -07:00
Liangsheng Yin	fd9ad817ec	Organize image inputs (#1531)	2024-09-29 06:28:55 +00:00
Yineng Zhang	b4408b0d16	feat: update linear deps 1/N (#1305)	2024-09-19 20:53:11 +08:00
Kaichen Zhang - NTU	8234e663e9	[Minor Fix] Fix llava modalities issue for single-image (#1402)	2024-09-12 01:10:26 -07:00
Liangsheng Yin	69b3bb9ae1	Unify forward mode (#1360)	2024-09-09 13:49:29 -07:00
Kaichen Zhang - NTU	662ecd9368	[Feat] Add modalities for vision server when handling pixel values for llava (#1346)	2024-09-09 02:07:34 -07:00
Lianmin Zheng	f64eae3a29	[Fix] Reduce memory usage for loading llava model & Remove EntryClassRemapping (#1308)	2024-09-02 21:44:45 -07:00
Lianmin Zheng	0a97d7962d	[Fix] Fix OOM in llava base class (#1249)	2024-08-28 08:45:49 -07:00
Lianmin Zheng	bf53bf5142	[Fix] Fix llava on multi images (#1247)	2024-08-28 06:33:05 -07:00
Kaichen Zhang - NTU	a5b14ad043	[Feat/WIP] add llava-onevision, with support for (1) siglip encoder, (2) qwen2 decoder (3) openai api compatible server. (#1123) Co-authored-by: Bo Li <drluodian@gmail.com>	2024-08-23 14:11:16 -07:00
Liangsheng Yin	87e8c090e9	Organize code (rename, movement) (#953)	2024-08-06 20:50:32 -07:00
Liangsheng Yin	cdcbde5fc3	Code structure refactor (#807)	2024-07-29 23:04:48 -07:00
Yineng Zhang	dd7e8b9421	chore: add copyright for srt (#790)	2024-07-28 23:07:12 +10:00
Liangsheng Yin	c9ee3d3559	Fix model forward grad (#628)	2024-07-15 22:09:09 -07:00
Ying Sheng	fb9296f0ed	Higher priority for user input of max_prefill_tokens & format (#540)	2024-06-12 21:48:40 -07:00
Amos You	651a23ee7c	remove redundant pad_input_ids function (#500)	2024-06-07 12:23:29 -07:00
Lianmin Zheng	bf3e271fe0	Update vllm to v0.4.3 (#511) Co-authored-by: Qubitium <417764+Qubitium@users.noreply.github.com> Co-authored-by: ZX <zx@lbx.dev>	2024-06-07 12:11:31 -07:00
Ying Sheng	0463f7fb52	Support data parallelism (static) (#480) Co-authored-by: Ying Sheng <ying.sheng@databricks.com> Co-authored-by: Lianmin Zheng <lianminzheng@gmail.com> Co-authored-by: Liangsheng Yin <hnyls2002@gmail.com> Co-authored-by: Zhiqiang Xie <xiezhq@stanford.edu>	2024-05-27 21:24:10 -07:00
Li Bo	2b605ab1d7	[Feat/Fix] Refactoring Llava models into single file (#475)	2024-05-26 12:29:51 -07:00
Lianmin Zheng	19d2135cb8	Use model loader from vllm (#459)	2024-05-21 09:13:37 -07:00
Liangsheng Yin	690d162d97	Format code (#441)	2024-05-14 22:40:46 +08:00
Kaichen Zhang - NTU	664287b2a7	[Feat] Add llava qwen, llava mistral (#419) Co-authored-by: Bo Li <drluodian@gmail.com>	2024-05-13 22:17:50 -07:00
Yuanhan Zhang	0992d85f92	support llava video (#426)	2024-05-13 16:57:00 -07:00
Qubitium	33b242df30	Compat with latest VLLM 0.4.2 main + fork.number rename + Flashinfer 0.0.4 (#380) Co-authored-by: ZX <zx@lbx.dev> Co-authored-by: ZhouXingg <165115237+ZhouXingg@users.noreply.github.com>	2024-05-11 16:37:49 -07:00
Liangsheng Yin	150d7020ed	Revert removing the unused imports (#385)	2024-04-23 22:36:33 +08:00
Liangsheng Yin	9acc6e3504	add `.isort.cfg` (#378)	2024-04-22 22:38:09 +08:00
Lianmin Zheng	faba293a0d	Improve gemma and documentations (#278)	2024-03-11 04:43:39 -07:00
Geary.Z	64fe311593	replace skip_embed with input_embeds (#222)	2024-03-10 19:04:52 -07:00
Lianmin Zheng	c51020cf0c	Fix the chat template for llava-v1.6-34b & format code (#177)	2024-02-11 05:50:13 -08:00
Lianmin Zheng	4ea92f8307	Format code (#118)	2024-01-29 17:08:12 -08:00

56 Commits