model(vlm): pixtral (#5084)

Kiv Chen
2025-05-13 00:16:10 -07:00
committed by GitHub
parent b2e95f62b4
commit 5380cd7ea3
16 changed files with 1125 additions and 39 deletions


@@ -33,9 +33,10 @@ The `hidden_states` folder contains examples on how to extract hidden states usi
* `hidden_states_engine.py`: An example of how to extract hidden states using the Engine API.
* `hidden_states_server.py`: An example of how to extract hidden states using the Server API.
## LLaVA-NeXT
## Multimodal
SGLang supports multimodal inputs for various model architectures. The `multimodal` folder contains examples showing how to use URLs, files, or encoded data to make requests to multimodal models. Examples include querying the [Llava-OneVision](multimodal/llava_onevision_server.py) model (image, multi-image, video), the Llava-backed [Qwen-Llava](multimodal/qwen_llava_server.py) and [Llama3-Llava](multimodal/llama3_llava_server.py) models (image, multi-image), and Mistral AI's [Pixtral](multimodal/pixtral_server.py) (image, multi-image).
SGLang supports LLaVA-OneVision with single-image, multi-image, and video inputs. The folder `llava_onevision` shows how to do this.
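The paragraph above distinguishes three ways of attaching an image to a request: by URL, by file, or as encoded data. A minimal sketch of the URL and base64-data variants, assuming the OpenAI-compatible chat format that SGLang's server exposes (the model name, field layout, and endpoint mentioned in the comment are illustrative assumptions, not code from this repository):

```python
import base64


def build_multimodal_chat_payload(model, text, image_url=None, image_bytes=None):
    """Build an OpenAI-compatible chat payload that attaches an image
    either by URL or as base64-encoded data in a data: URI."""
    content = [{"type": "text", "text": text}]
    if image_url is not None:
        content.append({"type": "image_url", "image_url": {"url": image_url}})
    if image_bytes is not None:
        b64 = base64.b64encode(image_bytes).decode("ascii")
        content.append(
            {"type": "image_url",
             "image_url": {"url": f"data:image/png;base64,{b64}"}}
        )
    return {"model": model, "messages": [{"role": "user", "content": content}]}


# Attach an image by URL (model name here is a placeholder):
payload = build_multimodal_chat_payload(
    "mistralai/Pixtral-12B-2409",
    "What is in this image?",
    image_url="https://example.com/cat.png",
)
# The payload would then be POSTed to the server's /v1/chat/completions
# endpoint; see the linked *_server.py examples for the actual requests.
```

The same helper covers the encoded-data case: read the file's bytes and pass them as `image_bytes`, and the image travels inline as a `data:` URI instead of a link the server must fetch.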
## Token In, Token Out