Files
OProver-8B-Round1/README.md
ModelHub XC d3609039e0 初始化项目,由ModelHub XC社区提供模型
Model: m-a-p/OProver-8B-Round1
Source: Original Platform
2026-05-29 01:32:13 +08:00

145 lines
6.5 KiB
Markdown
Raw Permalink Blame History

This file contains ambiguous Unicode characters

This file contains Unicode characters that might be confused with other characters. If you think that this is intentional, you can safely ignore this warning. Use the Escape button to reveal them.

---
license: apache-2.0
language:
- en
library_name: transformers
pipeline_tag: text-generation
tags:
- lean4
- theorem-proving
- formal-mathematics
- agentic
- retrieval-augmented
datasets:
- m-a-p/OProofs
---
# OProver
> A unified framework for **agentic formal theorem proving** in Lean 4.
- 📄 Paper: [OProver: A Unified Framework for Agentic Formal Theorem Proving](https://huggingface.co/papers/2605.17283) ([arXiv](https://arxiv.org/abs/2605.17283))
- 📚 Collection: [m-a-p/OProver](https://huggingface.co/collections/m-a-p/oprover)
- 📦 Dataset: [m-a-p/OProofs](https://huggingface.co/datasets/m-a-p/OProofs)
OProver treats theorem proving as a multi-round refinement loop. Given a target
theorem, the prover retrieves top-k compiler-verified proofs from a memory of
prior proofs, generates a proof attempt, runs the Lean 4 compiler, and—on
failure—revises the attempt using the compiler feedback in the next round.
The same retrieval and feedback signals are baked into training, so the
training-time interface matches the proving-time interaction.
## Highlights
- **State-of-the-art among open-weight whole-proof provers.** OProver-32B
attains the best Pass@32 on MiniF2F (93.3), ProverBench (58.2), and
PutnamBench (11.3), and the second-best on MathOlympiad (22.8) and
ProofNet (33.2). Even OProver-8B beats Goedel-Prover-V2-32B on all five
benchmarks despite having 4× fewer parameters.
- **Agentic proving is in the policy, not bolted on.** Retrieval, multi-turn
compiler feedback, and iterative repair are all part of the trained policy.
Ablations show feedback is the dominant driver: removing it costs OProver-32B
4.97.4 Pass@32 points across benchmarks; removing retrieval on top adds
0.51.7 more.
- **Co-evolving prover and corpus.** Each post-training iteration runs agentic
rollouts; verified proofs are added to OProofs and re-indexed for retrieval,
repair trajectories become SFT data, and unresolved hard cases provide RL
signal. MiniF2F-Test Pass@32 grows monotonically: OProver-8B 79.5 → 86.2 →
87.0 → 91.8 across rounds; OProver-32B 84.7 → 88.1 → 93.3.
## What's in this release
The OProver collection ([m-a-p/OProver](https://huggingface.co/collections/m-a-p/oprover)) bundles the paper,
the corpus, and seven model checkpoints covering both training stages and
both model sizes:
| Repo | Stage | Size | Note |
|---|---|---|---|
| [OProver-8B-Base](https://huggingface.co/m-a-p/OProver-8B-Base) | Continued pretraining only | 8B | Domain-adapted base; before SFT/RL |
| [OProver-32B-Base](https://huggingface.co/m-a-p/OProver-32B-Base) | Continued pretraining only | 32B | Domain-adapted base; before SFT/RL |
| [OProver-8B-Round1](https://huggingface.co/m-a-p/OProver-8B-Round1) | Post-training Round 1 | 8B | After first SFT+RL iteration |
| [OProver-8B-Round2](https://huggingface.co/m-a-p/OProver-8B-Round2) | Post-training Round 2 | 8B | After second SFT+RL iteration |
| [OProver-32B-Round1](https://huggingface.co/m-a-p/OProver-32B-Round1) | Post-training Round 1 | 32B | After first SFT+RL iteration |
| **[OProver-8B](https://huggingface.co/m-a-p/OProver-8B)** | Final | 8B | The 8B prover reported in the paper (Round 3) |
| **[OProver-32B](https://huggingface.co/m-a-p/OProver-32B)** | Final | 32B | The 32B prover reported in the paper (Round 2) |
| [OProofs](https://huggingface.co/datasets/m-a-p/OProofs) | Dataset | 6.86M proofs | Lean 4 corpus used for CPT, SFT, and the retrieval memory |
Use **OProver-8B** or **OProver-32B** for proving. The Base / Round-N
checkpoints are released for reproducibility and ablation studies.
## Dataset: OProofs
OProofs is a large-scale Lean 4 corpus that doubles as the retrieval memory at
proving time. It is built from three sources: public Lean resources
(NuminaMath-LEAN, Lean-Workbook, Leanabell-FormalStmt, Goedel-Pset, …),
large-scale autoformalization + agentic proof synthesis from informal math
mined on Common Crawl and GitHub, and traces from OProver's own agentic
proving runs.
![OProofs Lean 4 corpus statistics](./assets/oproofs.png)
## Training
Two stages, both anchored on OProofs.
**1. Continued pretraining (one-time).** A 65B-token mixture of formal
Lean (≈30%, from OProofs), code (≈20%, OpenCoder), mathematics (≈40%,
Nemotron-Math-4-Plus), and long-CoT (≈10%, ProLong-64K). AdamW, peak LR
5e-5, cosine with 3% warmup, batch 512, sequence length 8192. Output:
`OProver-{8B,32B}-Base`.
**2. Iterative post-training.** Each round runs:
1. Agentic proving with the current prover on a theorem pool, producing
multi-round rollouts conditioned on retrieved proofs and Lean feedback.
2. **SFT** on round-level repair examples `(s, R, p_{t-1}, f_{t-1}) → p_t`,
with cross-entropy loss only on the new attempt.
3. **GSPO RL** on hard cases (groups with non-trivial pass-rate). Per-round
reward `r = 0.8 + 0.2·1[format ok]` if Lean-verified, else 0; advantages
are pooled across the n×R rounds for the same theorem.
4. Newly verified proofs and repair trajectories are folded back into OProofs
and re-indexed into the retrieval memory for the next round.
## Results
Pass@32 (n=64) across five Lean 4 benchmarks. **Bold** is best, _underlined_
is second-best.
![Main results: Pass@32 across five Lean 4 benchmarks](./assets/main_results.png)
OProver-32B reaches three best and two second-best across five benchmarks —
the most top placements of any model in the comparison, despite being a
32B dense model versus a 560B MoE or a 671B dense competitor.
## Loading
```python
from transformers import AutoModelForCausalLM, AutoTokenizer
# Pick any of: OProver-8B, OProver-32B, *-Base, *-Round1, *-Round2
name = "m-a-p/OProver-8B"
tok = AutoTokenizer.from_pretrained(name)
model = AutoModelForCausalLM.from_pretrained(name, torch_dtype="bfloat16", device_map="auto")
```
```python
from datasets import load_dataset
ds = load_dataset("m-a-p/OProofs", split="train")
```
OProver is trained against a multi-round agentic interface: at each round the
input includes the target Lean statement, top-k retrieved verified proofs, the
prior proof attempt, and the Lean compiler feedback. See the paper §2.1 for
the prompt template, and Appendix B for serialization details.
## Citation
```bibtex
@article{ma2026oprover,
title = {OProver: A Unified Framework for Agentic Formal Theorem Proving},
author = {David Ma and Kaijing Ma and Shawn Guo and Yunfeng Shi and Enduo Zhao and Jiajun Shi and Zhaoxiang Zhang and Gavin Cheung and Jiaheng Liu and Zili Wang},
journal = {arXiv preprint arXiv:2605.17283},
year = {2026}
}
```