---
language:
- en
pipeline_tag: text-generation
---

<p align="center">
  <img src="./Bespoke-Labs-Logo.png" width="550">
</p>

# Bespoke-MiniChart-7B

<a href="https://playground.bespokelabs.ai/minichart">
  <img src="https://cdn-uploads.huggingface.co/production/uploads/6444e4417a7b94ddc2d14e1d/g-QaXrmPLYk5m3Hq5vFtr.png" width="200px" />
</a>

Bespoke-MiniChart-7B is an open-source chart-understanding Vision-Language Model (VLM) developed at [Bespoke Labs](https://www.bespokelabs.ai/) and maintained by [Liyan Tang](https://www.tangliyan.com/) and Bespoke Labs. It sets a new state of the art in chart question answering (Chart-QA) among 7-billion-parameter models and outperforms much larger closed models such as Gemini-1.5-Pro and Claude-3.5 on seven public benchmarks.

1. **Blog Post**: https://www.bespokelabs.ai/blog/bespoke-minichart-7b
2. **Playground**: https://playground.bespokelabs.ai/minichart

---

# Example Outputs

The examples below showcase how Bespoke-MiniChart-7B can perform both visual perception and textual reasoning.

<p align="left">
  <img src="https://cdn-uploads.huggingface.co/production/uploads/6444e4417a7b94ddc2d14e1d/E5WGhi_fVNzCsrKeNeIs3.png" width="700">
</p>

<p align="left">
  <img src="https://cdn-uploads.huggingface.co/production/uploads/6444e4417a7b94ddc2d14e1d/bYKXRm3sfOdX3zd_5qUpK.png" width="700">
</p>

# Model Performance

Bespoke-MiniChart-7B achieves state-of-the-art performance on chart understanding among models of similar size. It can even surpass closed models such as Gemini-1.5-Pro and Claude-3.5.

<p align="left">
  <img src="https://cdn-uploads.huggingface.co/production/uploads/6444e4417a7b94ddc2d14e1d/5pejAyzPG_tRBU6FwH7PA.png" width="700">
</p>

We also compare our model fine-tuned with SFT+DPO against models fine-tuned with SFT only.

In the table below, M1 and M2 are models fine-tuned on 270K and 1M SFT examples, respectively, and Bespoke-MiniChart-7B is the model fine-tuned with SFT+DPO.

<p align="left">
  <img src="https://cdn-uploads.huggingface.co/production/uploads/6444e4417a7b94ddc2d14e1d/WRsPs437niUrXmYtkRajG.png" width="700">
</p>

# Model Use

[Open in Colab](https://colab.research.google.com/drive/1FEmlwGgn9209iQO-rs2-9UHPLoytwZMH?usp=sharing)

The model is available on the playground: https://playground.bespokelabs.ai/minichart

You can also run the model locally with vLLM using the following snippet:

```python
import base64
from io import BytesIO

import matplotlib.pyplot as plt
import requests
from PIL import Image
from vllm import LLM, SamplingParams

# Prompt template; [QUESTION] is replaced with the user's question.
QA_PROMPT = """Please answer the question using the chart image.

Question: [QUESTION]

Please first generate your reasoning process and then provide the user with the answer. Use the following format:

<think>
... your thinking process here ...
</think>
<answer>
... your final answer (entity(s) or number) ...
</answer>"""


def get_image_from_url(image_url):
    """Download an image and return it as a PIL image, or None on failure."""
    try:
        response = requests.get(image_url, timeout=30)
        response.raise_for_status()
        return Image.open(BytesIO(response.content))
    except Exception as e:
        print(f"Error with image: {e}")
        return None


def get_answer(image_url, question, display=True):
    """Ask the model a question about the chart at image_url.

    Uses the global `llm` engine created below.
    """
    image = get_image_from_url(image_url)

    # Stop before plotting or encoding if the download failed.
    if image is None:
        return "Error downloading image"

    if display:
        plt.figure(figsize=(10, 8))
        plt.imshow(image)
        plt.axis('off')
        plt.show()

    # Re-encode the image in its original format (PNG if unknown)
    # and embed it in the chat message as a base64 data URL.
    image_format = image.format or 'PNG'
    buffered = BytesIO()
    image.save(buffered, format=image_format)
    encoded_image = base64.b64encode(buffered.getvalue()).decode('utf-8')
    image_data_url = f"data:image/{image_format.lower()};base64,{encoded_image}"

    messages = [{
        "role": "user",
        "content": [
            {"type": "image_url", "image_url": {"url": image_data_url}},
            {"type": "text", "text": QA_PROMPT.replace("[QUESTION]", question)}
        ]
    }]

    # Greedy decoding (temperature=0) for a single conversation.
    response = llm.chat([messages], sampling_params=SamplingParams(temperature=0, max_tokens=500))
    return response[0].outputs[0].text


# Initialize the LLM
llm = LLM(
    model="bespokelabs/Bespoke-MiniChart-7B",
    tokenizer_mode="auto",
    max_model_len=15000,
    tensor_parallel_size=1,
    gpu_memory_utilization=0.9,
    mm_processor_kwargs={"max_pixels": 1600*28*28},
    seed=2025,
    trust_remote_code=True,
)

# Running inference
image_url = "https://github.com/bespokelabsai/minichart-playground-examples/blob/main/images/ilyc9wk4jf8b1.png?raw=true"
question = "How many global regions maintained their startup funding losses below 30% in 2022?"

print("\n\n=================Model Output:===============\n\n", get_answer(image_url, question))
```
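The prompt asks the model to wrap its reasoning in `<think>` tags and its final answer in `<answer>` tags, so the returned string usually needs to be split before the answer can be used. Below is a minimal sketch of one way to do that. The helper name `parse_model_output`, the sample response, and the choice to return `None` for missing tags are assumptions made for illustration; they are not part of the released code.

```python
import re


def parse_model_output(text):
    """Split a model response into (reasoning, answer).

    Hypothetical helper: a part is None when its tags are missing,
    for example when generation was cut off by max_tokens.
    """
    def extract(tag):
        # DOTALL lets the match span the line breaks inside the tags.
        match = re.search(rf"<{tag}>(.*?)</{tag}>", text, flags=re.DOTALL)
        return match.group(1).strip() if match else None

    return extract("think"), extract("answer")


# Made-up response in the format the prompt requests.
sample = "<think>\nThree bars are under 30%.\n</think>\n<answer>\n3\n</answer>"
reasoning, answer = parse_model_output(sample)
print(answer)  # prints 3
```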

---
# Licence

This work is licensed under [CC BY-NC 4.0](https://creativecommons.org/licenses/by-nc/4.0/).
For commercial licensing, please contact company@bespokelabs.ai.

# Citation

```
@misc{bespoke_minichart_7b,
  title = {Bespoke-MiniChart-7B: pushing the frontiers of open VLMs for chart understanding},
  author = {Liyan Tang and Shreyas Pimpalgaonkar and Kartik Sharma and Alexandros G. Dimakis and Mahesh Sathiamoorthy and Greg Durrett},
  howpublished = {blog post},
  year = {2025},
  url = {https://huggingface.co/bespokelabs/Bespoke-MiniChart-7B},
}
```

# Acknowledgements

**Bespoke Labs** team:

- Liyan Tang
- Shreyas Pimpalgaonkar
- Kartik Sharma
- Alex Dimakis
- Mahesh Sathiamoorthy
- Greg Durrett

*Model perfected at Bespoke Labs, where careful curation meets cutting-edge modeling.*