model: MiniCPM-o (#3023)
@@ -24,7 +24,7 @@
 - InternLM 2
 - Exaone 3
 - BaiChuan2
-- MiniCPM / MiniCPM 3 / MiniCPMV
+- MiniCPM / MiniCPM 3 / MiniCPM-v / MiniCPM-o
 - XVERSE / XVERSE MoE
 - SmolLM
 - GLM-4
@@ -70,9 +70,9 @@ LLM.
 1. **Register your new model as multimodal**: Extend `is_multimodal_model` in
    [`model_config.py`](https://github.com/sgl-project/sglang/blob/main/python/sglang/srt/configs/model_config.py) to
    return True for your model.
-2. **Process Images**: Create a new `ImageProcessor` class that inherits from `BaseImageProcessor` and register this
+2. **Process Images**: Define a new `Processor` class that inherits from `BaseProcessor` and register this
    processor as your model's dedicated processor. See
-   [`image_processor.py`](https://github.com/sgl-project/sglang/blob/main/python/sglang/srt/managers/image_processor.py)
+   [`multimodal_processor.py`](https://github.com/sgl-project/sglang/blob/main/python/sglang/srt/managers/multimodal_processor.py)
    for more details.
 3. **Handle Image Tokens**: Implement a `pad_input_ids` function for your new model, in which image tokens in the prompt
    should be expanded and replaced with image-hashes, so that SGLang can recognize different images for
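The `pad_input_ids` step above can be sketched as follows. This is a minimal, self-contained illustration of the technique (expanding each image placeholder into a per-image hash span), not SGLang's actual implementation; the token id, function signature, and `num_patches` parameter are all assumptions for the example.

```python
from typing import List

# Hypothetical placeholder id for an image token; model-specific in practice.
IMAGE_TOKEN_ID = 151655

def pad_input_ids(input_ids: List[int],
                  image_hashes: List[int],
                  num_patches: int) -> List[int]:
    """Expand each image placeholder into `num_patches` copies of that
    image's hash, so each image occupies a distinct, recognizable span."""
    out: List[int] = []
    img_idx = 0
    for tok in input_ids:
        if tok == IMAGE_TOKEN_ID:
            out.extend([image_hashes[img_idx]] * num_patches)
            img_idx += 1
        else:
            out.append(tok)
    return out

# Two images in one prompt get two distinct padded spans:
padded = pad_input_ids([1, IMAGE_TOKEN_ID, 2, IMAGE_TOKEN_ID, 3],
                       image_hashes=[1111, 2222], num_patches=4)
```

Because each span carries its image's hash, the runtime can later match cached or batched image embeddings back to the right positions in the prompt.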
@@ -80,7 +80,7 @@ LLM.
 4. Replace Multi-headed `Attention` of ViT with SGLang's `VisionAttention`.

 You can refer [Qwen2VL](https://github.com/sgl-project/sglang/blob/main/python/sglang/srt/models/qwen2_vl.py) or other
-vLMs. These models demonstrate how to properly handle both visual and textual inputs.
+vLMs. These models demonstrate how to properly handle both multimodal and textual inputs.

 You should test the new vLM locally against hf models. See [`mmmu`](https://github.com/sgl-project/sglang/tree/main/benchmark/mmmu) for an example.
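The processor-registration pattern from step 2 can be illustrated with a small registry that maps a model architecture name to its dedicated processor class. The class names, decorator, and registry here are assumptions made for the sketch, not the real API in `multimodal_processor.py`.

```python
# Registry mapping architecture name -> processor class (illustrative only).
PROCESSOR_REGISTRY = {}

class BaseProcessor:
    """Stand-in for the base class a model's processor would inherit from."""
    def process(self, images, text):
        raise NotImplementedError

def register_processor(arch: str):
    """Decorator that records a processor class under its architecture name."""
    def wrap(cls):
        PROCESSOR_REGISTRY[arch] = cls
        return cls
    return wrap

@register_processor("MiniCPMO")
class MiniCPMOProcessor(BaseProcessor):
    def process(self, images, text):
        # A real processor would resize/normalize images and tokenize text;
        # here we only return a summary to keep the sketch self-contained.
        return {"num_images": len(images), "text": text}

# The runtime looks the processor up by architecture and invokes it:
processor = PROCESSOR_REGISTRY["MiniCPMO"]()
result = processor.process(images=["img0.png"], text="describe the image")
```

The point of the registry is dispatch: the serving engine never hard-codes a processor, it resolves one from the model's reported architecture at load time.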