ModelHub XC b07c86f7ff Initialize project; model provided by the ModelHub XC community

Model: OpenMed/Qwen2.5-3B-MedVL
Source: Original Platform

2026-05-19 23:36:47 +08:00

Qwen2.5-3B-MedVL

Qwen2.5-VL-3B-Instruct fine-tuned on ~200K medical VQA records from the SynthVision pipeline.

Benchmark Results (Exact Match)

| Model | VQA-RAD | PathVQA | SLAKE | Avg EM |
|---|---|---|---|---|
| Base (Qwen2.5-VL-3B-Instruct) | 0.5033 | 0.3038 | 0.5438 | 0.4503 |
| Fine-tuned | 0.5211 | 0.3468 | 0.6032 | 0.4903 |
| Delta (relative) | +3.5% | +14.2% | +10.9% | +8.9% |
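
The Delta row is consistent with a relative improvement over the base model, (fine-tuned - base) / base, rather than an absolute difference in EM. The sketch below recomputes it from the rounded scores in the table. The `exact_match` helper and its normalization (lowercasing, stripping punctuation, collapsing whitespace) are assumptions for illustration; the card does not state how answers were normalized before comparison.

```python
import re
import string


def normalize(answer: str) -> str:
    """Assumed normalization: lowercase, drop punctuation, collapse whitespace."""
    answer = answer.lower().translate(str.maketrans("", "", string.punctuation))
    return re.sub(r"\s+", " ", answer).strip()


def exact_match(prediction: str, reference: str) -> bool:
    """True when prediction and reference agree after normalization."""
    return normalize(prediction) == normalize(reference)


def relative_gain(before: float, after: float) -> float:
    """Relative improvement in percent: (after - before) / before * 100."""
    return (after - before) / before * 100.0


# Rounded scores copied from the benchmark table.
base = {"VQA-RAD": 0.5033, "PathVQA": 0.3038, "SLAKE": 0.5438}
tuned = {"VQA-RAD": 0.5211, "PathVQA": 0.3468, "SLAKE": 0.6032}

deltas = {name: round(relative_gain(base[name], tuned[name]), 1) for name in base}
avg_base = round(sum(base.values()) / len(base), 4)
avg_tuned = round(sum(tuned.values()) / len(tuned), 4)
avg_delta = round(relative_gain(avg_base, avg_tuned), 1)

print(deltas)     # {'VQA-RAD': 3.5, 'PathVQA': 14.2, 'SLAKE': 10.9}
print(avg_delta)  # 8.9
```

Averaging the rounded per-benchmark scores gives 0.4904 for the fine-tuned model, one unit in the last digit above the table's 0.4903; the table value was presumably computed from unrounded scores.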

Usage

Transformers

from transformers import AutoProcessor, AutoModelForImageTextToText

model_id = "OpenMed/Qwen2.5-3B-MedVL"
processor = AutoProcessor.from_pretrained(model_id)
model = AutoModelForImageTextToText.from_pretrained(model_id, torch_dtype="auto", device_map="auto")

messages = [
    {
        "role": "user",
        "content": [
            {"type": "image", "url": "https://example.com/xray.jpg"},
            {"type": "text", "text": "What are the key findings in this chest X-ray?"},
        ],
    }
]

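# Render the chat template and tokenize; the processor also loads and preprocesses the image.
inputs = processor.apply_chat_template(
    messages,
    add_generation_prompt=True,
    tokenize=True,
    return_dict=True,
    return_tensors="pt",
).to(model.device)
output = model.generate(**inputs, max_new_tokens=512)
# Decode only the newly generated tokens, skipping the prompt.
print(processor.decode(output[0][inputs["input_ids"].shape[-1]:], skip_special_tokens=True))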

vLLM

from vllm import LLM, SamplingParams

llm = LLM(model="OpenMed/Qwen2.5-3B-MedVL", max_model_len=4096, limit_mm_per_prompt={"image": 1})

messages = [{"role": "user", "content": [
    {"type": "image_url", "image_url": {"url": "https://example.com/xray.jpg"}},
    {"type": "text", "text": "What are the key findings in this chest X-ray?"},
]}]

output = llm.chat(messages, SamplingParams(temperature=0, max_tokens=512))
print(output[0].outputs[0].text)

SGLang

# Launch the server (shell)
python -m sglang.launch_server --model-path OpenMed/Qwen2.5-3B-MedVL --chat-template qwen2-vl --port 8000

# Query it through the OpenAI-compatible API (Python)
from openai import OpenAI

client = OpenAI(base_url="http://localhost:8000/v1", api_key="EMPTY")
response = client.chat.completions.create(
    model="OpenMed/Qwen2.5-3B-MedVL",
    messages=[{"role": "user", "content": [
        {"type": "image_url", "image_url": {"url": "https://example.com/xray.jpg"}},
        {"type": "text", "text": "What are the key findings in this chest X-ray?"},
    ]}],
    max_tokens=512,
)
print(response.choices[0].message.content)
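
The examples above pass a public image URL. For a local file, one common approach with OpenAI-style chat APIs is to embed the image as a base64 data URL in the `image_url` field. The sketch below assumes the serving endpoint accepts data URLs in that field; `image_to_data_url` and `build_messages` are hypothetical helper names, not part of any library.

```python
import base64
import mimetypes
from pathlib import Path


def image_to_data_url(path: str) -> str:
    """Encode a local image file as a data URL (data:<mime>;base64,<payload>)."""
    mime, _ = mimetypes.guess_type(path)
    if mime is None or not mime.startswith("image/"):
        raise ValueError(f"Cannot infer an image MIME type from: {path}")
    encoded = base64.b64encode(Path(path).read_bytes()).decode("ascii")
    return f"data:{mime};base64,{encoded}"


def build_messages(image_path: str, question: str) -> list:
    """Build a single-turn message list in the same shape as the examples above."""
    return [
        {
            "role": "user",
            "content": [
                {"type": "image_url", "image_url": {"url": image_to_data_url(image_path)}},
                {"type": "text", "text": question},
            ],
        }
    ]
```

The returned list can be passed as `messages` to `client.chat.completions.create(...)` in place of the URL-based list.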

Training Details

Base model: Qwen/Qwen2.5-VL-3B-Instruct
Data: ~200K medical VQA records from the SynthVision pipeline
Method: LoRA (rank=32, alpha=32)
Target modules: q_proj, v_proj, k_proj, o_proj
Learning rate: 7e-5, cosine schedule
Steps: 700
Weight decay: 0.03
Hardware: 4x NVIDIA A100 80GB (48 vCPU, 568 GB RAM) via Hugging Face Jobs
Training time: 1h 14m
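
As a rough illustration of what these hyperparameters imply, the sketch below works out the standard LoRA scaling factor (alpha / rank), the number of parameters LoRA adds to one linear layer, and the learning rate at a few steps of a cosine schedule. Several things here are assumptions, not details from the training run: no warmup, decay to zero at step 700, and the 2048 x 2048 layer shape used for the parameter count. The dictionary keys follow the field names of PEFT's `LoraConfig`, but the card does not say which training library was used.

```python
import math

# Hyperparameters from the list above, keyed like PEFT's LoraConfig fields.
lora = {
    "r": 32,
    "lora_alpha": 32,
    "target_modules": ["q_proj", "v_proj", "k_proj", "o_proj"],
}

# Standard LoRA scales the low-rank update by alpha / r.
scaling = lora["lora_alpha"] / lora["r"]  # 1.0 for rank=32, alpha=32


def lora_added_params(d_in: int, d_out: int, r: int) -> int:
    """Parameters added by one LoRA adapter: A is (r x d_in), B is (d_out x r)."""
    return r * (d_in + d_out)


PEAK_LR = 7e-5
TOTAL_STEPS = 700


def cosine_lr(step: int, peak: float = PEAK_LR, total: int = TOTAL_STEPS) -> float:
    """Cosine decay from `peak` to 0 over `total` steps (assumes no warmup)."""
    progress = min(max(step, 0), total) / total
    return 0.5 * peak * (1.0 + math.cos(math.pi * progress))


print(scaling)                             # 1.0
print(lora_added_params(2048, 2048, 32))   # 131072 (illustrative layer shape)
for step in (0, 350, 700):
    print(step, f"{cosine_lr(step):.2e}")
```

With alpha equal to the rank, the adapter update is applied at unit scale, so the effective step size is governed by the learning rate alone.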
