---
license: apache-2.0
datasets:
- lingshu-medical-mllm/ReasonMed
base_model:
- unsloth/Qwen2.5-0.5B-Instruct
---

## Info

![AKmUU](https://cdn-uploads.huggingface.co/production/uploads/66e00ba55e4fd4bfead4a97c/zxUf9YsQPkICX5-n1BVjc.jpeg)
![Demo Screenshot](https://cdn-uploads.huggingface.co/production/uploads/66e00ba55e4fd4bfead4a97c/ZWQpqF2613W9Ty9NBYUk1.png)

# Qwen2.5-0.5B-Medical-ReasonMed370K

A 0.5-billion-parameter medical reasoning model fine-tuned on the complete ReasonMed 370K dataset. This model is built on top of Qwen2.5-0.5B-Instruct and trained to perform structured clinical reasoning, differential diagnosis, and evidence-based medical question answering.

## Model Details

- **Base Model**: unsloth/Qwen2.5-0.5B-Instruct
- **Model Size**: 0.5B parameters
- **Fine-tuning Method**: LoRA via Unsloth
- **Training Dataset**: ReasonMed 370K (full dataset)
- **Training Hardware**: NVIDIA Tesla T4 (Kaggle free tier)
- **License**: Apache 2.0

## Training Details

The model was fine-tuned in two stages, each covering roughly half of the ReasonMed dataset:

**Stage 1**: Fine-tuned on the first 185,000 samples of ReasonMed using LoRA with the following configuration:

- LoRA rank: 8
- LoRA alpha: 16
- Learning rate: 5e-5
- Batch size: 2 with 16 gradient accumulation steps
- Max sequence length: 4096
- Epochs: 1
- Optimizer: AdamW 8-bit

**Stage 2**: Continued fine-tuning on the remaining 184,983 samples with identical configuration, completing one full pass over the entire 370K dataset.

Both stages used `packing=False`, so every sample was processed as its own sequence rather than being concatenated with other samples and split at chunk boundaries.

## Dataset

This model was trained on [ReasonMed](https://huggingface.co/datasets/lingshu-medical-mllm/ReasonMed), described by its authors as the largest open-source medical reasoning dataset available, comprising 370,000 high-quality examples distilled from 1.75 million initial reasoning paths generated by multiple large language models.

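
As a rough sanity check on how the 370K examples map onto the two training stages described under Training Details, the sample counts and approximate optimizer step counts can be worked out as below. This is only a sketch: it assumes a single GPU, the helper names are illustrative and not taken from the training script, and the step count a trainer actually reports may differ slightly depending on how it handles the last partial batch.

```python
import math

# Figures quoted in the Training Details section.
STAGE_SAMPLES = {"stage_1": 185_000, "stage_2": 184_983}
BATCH_SIZE = 2
GRAD_ACCUM_STEPS = 16
NUM_DEVICES = 1  # assumption: one T4; not stated explicitly in the card


def effective_batch_size(batch_size: int, grad_accum: int, num_devices: int = 1) -> int:
    """Number of samples that contribute to a single optimizer step."""
    return batch_size * grad_accum * num_devices


def approx_optimizer_steps(num_samples: int, batch_size: int, grad_accum: int,
                           num_devices: int = 1) -> int:
    """Approximate optimizer steps for one epoch, counting a final partial batch."""
    return math.ceil(num_samples / effective_batch_size(batch_size, grad_accum, num_devices))


total_samples = sum(STAGE_SAMPLES.values())
print("samples across both stages:", total_samples)
print("effective batch size:", effective_batch_size(BATCH_SIZE, GRAD_ACCUM_STEPS, NUM_DEVICES))
for stage, n in STAGE_SAMPLES.items():
    steps = approx_optimizer_steps(n, BATCH_SIZE, GRAD_ACCUM_STEPS, NUM_DEVICES)
    print(f"{stage}: {n} samples, about {steps} optimizer steps")
```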
ReasonMed is built through a multi-agent verification and refinement pipeline that includes an Error Refiner to correct error-prone reasoning steps. Each example combines detailed chain-of-thought reasoning with a concise answer summary, covering a wide range of medical topics including clinical reasoning, differential diagnosis, pharmacology, and medical question answering.

For more details on the dataset, refer to the official repository: https://github.com/alibaba-damo-academy/ReasonMed

## What the Model Can Do

After training on the full ReasonMed dataset, the model demonstrates the ability to:

- Work through clinical presentations step by step
- Generate differential diagnoses with reasoning for each option
- Rule out unlikely diagnoses with justification
- Provide structured final answers with clinical pearls
- Reason through medical multiple choice questions with explanation

## Demo

The screenshot above shows the model running through a clinical scenario involving hypothyroidism, demonstrating its ability to identify key symptoms, interpret lab values, and produce a structured response with management guidance.

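
A prompt for a scenario like the one in the screenshot might be assembled as in the sketch below. The helper `build_messages`, the vignette text, and the option layout are illustrative assumptions. They are not the prompt format used during training, and the vignette is not the one shown in the screenshot.

```python
from typing import Dict, List, Optional


def build_messages(vignette: str,
                   options: Optional[Dict[str, str]] = None) -> List[Dict[str, str]]:
    """Wrap a clinical vignette (and optional answer choices) in a single user turn."""
    parts = [vignette.strip()]
    if options:
        parts.append("Options:")
        # One line per choice, e.g. "A. Primary hypothyroidism"
        parts.extend(f"{label}. {text}" for label, text in options.items())
    return [{"role": "user", "content": "\n".join(parts)}]


# Hypothetical example vignette, written for illustration only.
messages = build_messages(
    "A 45-year-old woman reports fatigue, weight gain, and cold intolerance. "
    "Her TSH is elevated and her free T4 is low. What is the most likely diagnosis?",
    options={
        "A": "Primary hypothyroidism",
        "B": "Graves' disease",
        "C": "Cushing's syndrome",
        "D": "Iron deficiency anemia",
    },
)
print(messages[0]["content"])
```

The resulting `messages` list has the same shape as the one in the Usage section, so it can be passed to `tokenizer.apply_chat_template` in the same way.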
## Limitations

- This is a 0.5B parameter model and has a hard ceiling on reasoning depth and factual recall
- Small models are prone to inconsistency across similar questions
- The model may occasionally hallucinate clinical details
- This model is intended for research and educational purposes only
- It should not be used for real clinical decision making or as a substitute for a qualified medical professional

## Usage

```python
from unsloth import FastLanguageModel
import torch

# Load the fine-tuned model and tokenizer in 4-bit precision
model, tokenizer = FastLanguageModel.from_pretrained(
    model_name = "Rumiii/Qwen2.5-0.5B-Medical-ReasonMed370K",
    max_seq_length = 4096,
    load_in_4bit = True,
)
FastLanguageModel.for_inference(model)  # switch Unsloth to inference mode

messages = [
    {"role": "user", "content": "Your medical question here"}
]

# Apply the chat template and move the token IDs to the GPU
inputs = tokenizer.apply_chat_template(
    messages,
    tokenize = True,
    add_generation_prompt = True,
    return_tensors = "pt"
).to("cuda")

# Sample a response
outputs = model.generate(
    input_ids = inputs,
    max_new_tokens = 1024,
    temperature = 0.7,
    do_sample = True,
    repetition_penalty = 1.3,
    no_repeat_ngram_size = 3,
    top_p = 0.9,
    top_k = 50,
)

# The decoded text contains the prompt followed by the generated answer
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```

## Citation

If you use this model, please cite the ReasonMed dataset:

```bibtex
@misc{sun2025reasonmed370kmultiagentgenerated,
  title={ReasonMed: A 370K Multi-Agent Generated Dataset for Advancing Medical Reasoning},
  author={Yu Sun and Xingyu Qian and Weiwen Xu and Hao Zhang and Chenghao Xiao and Long Li and Yu Rong and Wenbing Huang and Qifeng Bai and Tingyang Xu},
  year={2025},
  eprint={2506.09513},
  archivePrefix={arXiv},
  primaryClass={cs.CL},
  url={https://arxiv.org/abs/2506.09513},
}
```

## Acknowledgements

Training was conducted on Kaggle free tier infrastructure using Unsloth for efficient fine-tuning. The ReasonMed dataset was created by the team at Alibaba DAMO Academy and Tencent AI Lab.