---
library_name: transformers
license: apache-2.0
license_link: https://huggingface.co/UbiquantAI/Fleming-R1-7B/blob/main/LICENSE
pipeline_tag: text-generation
---

# Fleming-R1-7B
<p align="center" style="margin: 0;">
  <a href="https://github.com/UbiquantAI/Fleming-R1" aria-label="GitHub Repository" style="text-decoration:none;">
    <span style="display:inline-flex;align-items:center;gap:.35em;">
      <svg xmlns="http://www.w3.org/2000/svg" viewBox="0 0 16 16"
           width="16" height="16" aria-hidden="true"
           style="vertical-align:text-bottom;fill:currentColor;">
        <path d="M8 0C3.58 0 0 3.58 0 8c0 3.54 2.29 6.53 5.47 7.59.4.07.55-.17.55-.38 0-.19-.01-.82-.01-1.49-2.01.37-2.53-.49-2.69-.94-.09-.23-.48-.94-.82-1.13-.28-.15-.68-.52-.01-.53.63-.01 1.08.58 1.23.82.72 1.21 1.87.87 2.33.66.07-.52.28-.87.51-1.07-1.78-.2-3.64-.89-3.64-3.95 0-.87.31-1.59.82-2.15-.08-.2-.36-1.02.08-2.12 0 0 .67-.21 2.2.82.64-.18 1.32-.27 2-.27.68 0 1.36.09 2 .27 1.53-1.04 2.2-.82 2.2-.82.44 1.1.16 1.92.08 2.12.51.56.82 1.27.82 2.15 0 3.07-1.87 3.75-3.65 3.95.29.25.54.73.54 1.48 0 1.07-.01 1.93-.01 2.2 0 .21.15.46.55.38A8.013 8.013 0 0016 8c0-4.42-3.58-8-8-8Z"/>
      </svg>
      <span>GitHub</span>
    </span>
  </a>
  <span style="margin:0 .75em;opacity:.6;">•</span>
  <a href="https://arxiv.org/abs/2509.15279" aria-label="Paper">📑 Paper</a>
</p>

## Highlights

## 📖 Model Overview

Fleming-R1 is a reasoning model for medical scenarios that analyzes complex problems step by step and produces reliable answers. It is trained with a “chain-of-thought cold start” followed by large-scale reinforcement learning. On multiple medical benchmarks, the 7B version achieves state-of-the-art results among models of similar size; the 32B version performs close to the much larger GPT-OSS-120B and shows stronger results on Chinese tasks.

**Model Features:**

* **Reasoning-oriented data strategy:** combines public medical datasets with knowledge graphs to improve coverage of rare diseases, medications, and multi-hop reasoning chains.
* **Chain-of-thought cold start:** uses high-quality reasoning traces distilled from teacher models to teach the model basic reasoning patterns.
* **Two-stage reinforcement learning:** employs adaptive hard-negative mining to strengthen the model's reasoning on difficult problems.

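
This card does not describe how the adaptive hard-negative mining step is implemented. As a loose illustration only, one plausible reading is that problems the model still fails after the first reinforcement learning stage are selected for the second stage. The sketch below assumes a per-problem pass rate estimated from sampled rollouts; `pass_rate`, `select_hard_problems`, the `0.5` threshold, and the example problem IDs are all hypothetical and are not taken from the Fleming-R1 code or paper.

```python
from typing import Dict, List


def pass_rate(rollout_correct: List[bool]) -> float:
    """Fraction of sampled rollouts that reached the correct answer."""
    if not rollout_correct:
        return 0.0
    return sum(rollout_correct) / len(rollout_correct)


def select_hard_problems(
    rollouts: Dict[str, List[bool]],
    max_pass_rate: float = 0.5,  # hypothetical threshold, not from the paper
) -> List[str]:
    """Return problem IDs whose pass rate is at or below the threshold, hardest first."""
    rates = {pid: pass_rate(results) for pid, results in rollouts.items()}
    hard = [pid for pid, rate in rates.items() if rate <= max_pass_rate]
    # Sort by pass rate, then by ID, so the order is deterministic.
    return sorted(hard, key=lambda pid: (rates[pid], pid))


if __name__ == "__main__":
    # Hypothetical stage-one results: True means the rollout was correct.
    stage_one = {
        "q1": [True, True, True, True],
        "q2": [False, False, True, False],
        "q3": [False, False, False, False],
    }
    print(select_hard_problems(stage_one))
```

In this toy setup, a problem the model always solves is dropped from the second stage, while problems it rarely or never solves are kept and ordered from hardest to easiest.
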
## 📦 Releases

- **Fleming-R1-7B**: trained on Qwen2.5-7B
  🤗 [`UbiquantAI/Fleming-R1-7B`](https://huggingface.co/UbiquantAI/Fleming-R1-7B)
- **Fleming-R1-32B**: trained on Qwen3-32B
  🤗 [`UbiquantAI/Fleming-R1-32B`](https://huggingface.co/UbiquantAI/Fleming-R1-32B)

## 📊 Performance

### Main Benchmark Results

<div align="center">
  <img src="images/exp_result.png" alt="Benchmark Results" width="60%">
</div>

### Reasoning Ability Comparison

On the MedXpertQA benchmark, which evaluates medical reasoning ability, Fleming-R1 surpasses models of similar and even larger size, and is on par with certain closed-source models.

<div align="center">
  <img src="images/size_compare.png" alt="Size comparison" width="60%">
</div>

## 🔧 Quick Start

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

model_name = "UbiquantAI/Fleming-R1-7B"

# load the tokenizer and the model
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(
    model_name,
    torch_dtype="auto",
    device_map="auto"
)

# prepare the model input
prompt = "What should I do if I suddenly develop a fever?"
messages = [
    {"role": "user", "content": prompt}
]
text = tokenizer.apply_chat_template(
    messages,
    tokenize=False,
    add_generation_prompt=True,
)

model_inputs = tokenizer([text], return_tensors="pt").to(model.device)

# generate the completion, then drop the prompt tokens from the output
generated_ids = model.generate(
    **model_inputs,
    max_new_tokens=32768
)
output_ids = generated_ids[0][len(model_inputs.input_ids[0]):].tolist()

# split the reasoning trace from the final answer
output = tokenizer.decode(output_ids, skip_special_tokens=True).strip("\n")
thinking_content = output.split("<think>")[-1].split("</think>")[0]
content = output.split("</think>")[-1]

print("####thinking content:\n", thinking_content)
print("\n")
print("####answer:\n", content)
```

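
The parsing step in the Quick Start snippet assumes the output contains a `</think>` tag; if the tag is missing (for example, when generation is cut off), both `thinking_content` and `content` end up holding the entire output. A small helper can make that case explicit. This is a minimal sketch, not part of the released code: `split_reasoning` is a hypothetical name, and treating untagged output as the answer is an assumption you may want to change.

```python
from typing import Tuple


def split_reasoning(output: str) -> Tuple[str, str]:
    """Split decoded text into (reasoning, answer) around <think>...</think>."""
    open_tag, close_tag = "<think>", "</think>"
    if close_tag not in output:
        # Assumption: with no closing tag, treat the whole output as the answer.
        return "", output.strip()
    reasoning, answer = output.split(close_tag, 1)
    # The opening tag may be absent if the chat template already emitted it.
    reasoning = reasoning.split(open_tag, 1)[-1]
    return reasoning.strip(), answer.strip()


if __name__ == "__main__":
    reasoning, answer = split_reasoning("<think>check symptoms</think>\nRest and hydrate.")
    print("reasoning:", reasoning)
    print("answer:", answer)
```

With this helper, the last lines of the snippet could become `thinking_content, content = split_reasoning(output)`.
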
## ⚠️ Safety Statement

This project is for research and non-clinical reference only; it must not be used for actual diagnosis or treatment decisions.
The generated reasoning traces are an auditable intermediate process and do not constitute medical advice.
In medical scenarios, results must be reviewed and approved by qualified professionals, and all applicable laws, regulations, and privacy compliance requirements in your region must be followed.

## 📚 Citation

```bibtex
@misc{flemingr1,
  title={Fleming-R1: Toward Expert-Level Medical Reasoning via Reinforcement Learning},
  author={Chi Liu and Derek Li and Yan Shu and Robin Chen and Derek Duan and Teng Fang and Bryan Dai},
  year={2025},
  eprint={2509.15279},
  archivePrefix={arXiv},
  primaryClass={cs.LG},
  url={https://arxiv.org/abs/2509.15279},
}
```