---
library_name: transformers
license: other
base_model: deepseek-ai/DeepSeek-R1-Distill-Qwen-7B
tags:
- llama-factory
- full
- generated_from_trainer
model-index:
- name: ReasonFlux-F1-7B
results: []
---
# ReasonFlux: Hierarchical LLM Reasoning via Scaling Thought Templates
A revolutionary template-augmented reasoning paradigm empowers a 32B model to outperform o1-mini and DeepSeek-R1 distilled models on reasoning tasks.
| Task (pass@1) | **ReasonFlux-F1-32B** | **ReasonFlux-Zero-32B** | **R1-Distill-32B** | **o1-mini** | **LIMO-32B** | **s1-32B** |
| :------------- | :----------------: | :-------------: | :-------------------: | :-----------------: | :--------: | :--------: |
| MATH500 | **96.0** | 91.2 | 94.3 | 90.0 | 90.6 | 93.0 |
| AIME 2024 | **76.7** | 56.7 | 72.6 | 56.7 | 50.0 | 56.7 |
| AIME 2025 | **53.3** | 37.2 | 46.7 | 50.8 | 37.2 | 49.3 |
| GPQA-Diamond | **67.2** | 61.2 | 62.1 | 60.0 | 65.2 | 59.6 |
# ReasonFlux-F1-7B
> ReasonFlux-F1-7B is our fine-tuned SOTA-level reasoning LLM, trained on the template-augmented reasoning trajectories collected from our [ReasonFlux-Zero](https://arxiv.org/abs/2502.06772).
* Github Repository: [Gen-Verse/ReasonFlux](https://github.com/Gen-Verse/ReasonFlux)
* Paper: [ReasonFlux: Hierarchical LLM Reasoning via Scaling Thought Templates](https://arxiv.org/abs/2502.06772)
* Dataset: [Gen-Verse/ReasonFlux-F1-SFT](https://huggingface.co/datasets/Gen-Verse/ReasonFlux-F1-SFT)
## Evaluation
We present the evaluation results of ReasonFlux-F1-32B on challenging reasoning tasks, including AIME 2024, AIME 2025, MATH500, and GPQA-Diamond. For a fair comparison, we report the results of all LLMs obtained with our evaluation scripts in [ReasonFlux-F1](https://github.com/Gen-Verse/ReasonFlux).
| Model | AIME 2024 pass@1 | AIME 2025 pass@1 | MATH500 pass@1 | GPQA pass@1 |
| --------------------------------------- | :--------------: | :--------------: | :-------------: | :----------: |
| QwQ-32B-Preview | 46.7 | 37.2 | 90.6 | 65.2 |
| LIMO-32B | 56.3 | 44.5 | 94.8 | 58.1 |
| s1-32B | 56.7 | 49.3 | 93.0 | 59.6 |
| OpenThinker-32B | 66.0 | 53.3 | 94.8 | 60.1 |
| R1-Distill-32B | 70.0 | 46.7 | 92.0 | 59.6 |
| ReasonFlux-Zero-32B | 56.7 | 37.2 | 91.2 | 61.2 |
| **ReasonFlux-F1-32B** | **76.7** | **53.3** | **96.0** | **67.2** |
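
Pass@1 in these tables is the fraction of problems the model answers correctly in a single sampled completion. The repository's evaluation scripts handle answer extraction and normalization; the minimal sketch below (using a hypothetical exact-match check in place of that logic) only illustrates the metric itself.

```python
def pass_at_1(predictions, references):
    """Percentage of problems whose single sampled answer matches the reference."""
    assert len(predictions) == len(references)
    correct = sum(p.strip() == r.strip() for p, r in zip(predictions, references))
    return 100.0 * correct / len(references)

# Example: solving 23 of the 30 AIME 2024 problems on the first attempt gives 76.7.
print(round(pass_at_1(["42"] * 23 + ["wrong"] * 7, ["42"] * 30), 1))
```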
## Quick start with vLLM
```python
from vllm import LLM, SamplingParams
from transformers import AutoTokenizer

model_id = 'Gen-Verse/ReasonFlux-F1-7B'

# Load the model with vLLM; adjust tensor_parallel_size to the number of available GPUs.
model = LLM(
    model_id,
    tensor_parallel_size=8,
)
tokenizer = AutoTokenizer.from_pretrained(model_id)

# Allow long chains of thought by setting a large generation budget.
sampling_params = SamplingParams(
    max_tokens=32768,
)

# 2022 AIME I Problems/Problem 15
question = """Let \(x, y\), and \(z\) be positive real numbers satisfying the system of equations:
\[
\begin{array}{c}
\sqrt{2 x-x y}+\sqrt{2 y-x y}=1 \\
\sqrt{2 y-y z}+\sqrt{2 z-y z}=\sqrt{2} \\
\sqrt{2 z-z x}+\sqrt{2 x-z x}=\sqrt{3} .
\end{array}
\]
Then \(\left[(1-x)(1-y)(1-z)\right]^{2}\) can be written as \(\frac{m}{n}\), where \(m\) and \(n\) are relatively prime positive integers. Find \(m+n\)."""

# Wrap the question in the DeepSeek-style chat format used by the base model.
ds_prompt = "<User>\n" + question + "<Assistant>\n"
output = model.generate(ds_prompt, sampling_params=sampling_params)
print(output[0].outputs[0].text)
```
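
If vLLM is not available, the model can also be run with plain Hugging Face Transformers. The snippet below is a minimal sketch, assuming a single GPU with enough memory for the 7B model in bfloat16 and the same DeepSeek-style `<User>`/`<Assistant>` prompt format as above; the generation settings are illustrative, not tuned values from the paper.

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = 'Gen-Verse/ReasonFlux-F1-7B'

tokenizer = AutoTokenizer.from_pretrained(model_id)
# Assumption: one GPU with enough memory for the 7B model in bfloat16.
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype=torch.bfloat16,
    device_map="auto",
)

question = "Compute the sum of the first 100 positive integers."
ds_prompt = "<User>\n" + question + "<Assistant>\n"

inputs = tokenizer(ds_prompt, return_tensors="pt").to(model.device)
# Long reasoning traces need a large token budget; reduce max_new_tokens if memory is tight.
outputs = model.generate(**inputs, max_new_tokens=8192)
print(tokenizer.decode(outputs[0][inputs["input_ids"].shape[1]:], skip_special_tokens=True))
```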
## Citation
```bibtex
@article{yang2025reasonflux,
  title={ReasonFlux: Hierarchical LLM Reasoning via Scaling Thought Templates},
  author={Yang, Ling and Yu, Zhaochen and Cui, Bin and Wang, Mengdi},
  journal={arXiv preprint arXiv:2502.06772},
  year={2025}
}
```