ai-modelscope 272a07385a Improve model card: add pipeline tag, library name and license (#1)
- Improve model card: add pipeline tag, library name and license (ebef8dea586b7406630902a74532f548625092d8)

Co-authored-by: Niels Rogge <nielsr@users.noreply.huggingface.co>
2025-06-10 05:48:41 +08:00
2025-06-04 05:28:06 +00:00
2025-06-04 05:28:06 +00:00

base_model, language, tags, pipeline_tag, library_name, license
base_model language tags pipeline_tag library_name license
meta-llama/Llama-3.2-3B-Instruct
en
One-Shot-CFT
text-generation transformers cc-by-4.0

One-Shot-CFT: Unleashing the Reasoning Potential of Pre-trained LLMs by Critique Fine-Tuning on One Problem

💻 Code | 📄 Paper | 📊 Dataset | 🤗 Model | 🌐 Project Page

🧠 Overview

One-Shot Critique Fine-Tuning (CFT) is a simple, robust, and compute-efficient training paradigm for unleashing the reasoning capabilities of pretrained LLMs in both mathematical and logical domains. By leveraging critiques on just one problem, One-Shot CFT enables models like Qwen and LLaMA to match or even outperform reinforcement learning, while using 20× less compute.

Instead of learning from reference answers (as in supervised fine-tuning) or reward signals (as in reinforcement learning), One-Shot CFT enables models to learn from critiques of diverse solutions to a single problem, enhancing their exposure to varied reasoning patterns and mitigating overfitting. This exposes the LLMs to multiple perspectives and error types, thereby more effectively unleashing their reasoning potential.

Key Highlights

  • Unleashes Reasoning with One Example: One-Shot CFT uses critiques of diverse model-generated solutions to a single problem to significantly boost performance across math and logic tasks. For example, with just 5 GPU hours of training on Qwen2.5-Math-7B, One-Shot CFT achieves an average improvement of +15% on six math benchmarks and +16% on three logic reasoning benchmarks.
  • Outperforms RLVR and Full SFT with 20× Less Compute: One-Shot CFT outperforms both one-shot Reinforcement Learning with Verifiable Rewards (RLVR) and full-dataset supervised fine-tuning, while requiring only 5 GPU hours on a 7B model—offering a much more efficient and stable training alternative.
  • Robust Across Seeds and Model Scales: One-Shot CFT remains effective across different seed problem choices and model sizes—from 1.5B to 14B parameters—demonstrating strong generalization and scalability.

This specific model is the One-Shot CFT variant trained based on Llama-3.2-3B-Instruct with DSR-CFT-p0 dataset.

Main Results

CFT Performance Comparison

One-shot CFT consistently improves mathematical and logical reasoning. Left: Average accuracy on six mathematical reasoning benchmarks for Qwen and LLaMA models, comparing base, SFT, RLVR, and CFT with only one training example. Right: In-domain accuracy on three logic reasoning benchmarks (BBEH subtasks) for Qwen2.5-Math-7B. Across both domains, CFT with a single problem significantly outperforms standard SFT and matches or exceeds reinforcement learning with much lower compute.

Citation

If you find our work helpful, please cite it as:

@article{wang2025unleashing,
  title={Unleashing the Reasoning Potential of Pre-trained LLMs by Critique Fine-Tuning on One Problem},
  author={Wang, Yubo and Nie, Ping and Zou, Kai and Wu, Lijun and Chen, Wenhu},
  journal={arXiv preprint arXiv:2506.03295},
  year={2025}
}
Description
Model synced from source: TIGER-Lab/One-Shot-CFT-Math-Llama-3B
Readme 45 KiB