初始化项目,由ModelHub XC社区提供模型
Model: m-a-p/OProver-8B-Round2 Source: Original Platform
This commit is contained in:
144
README.md
Normal file
144
README.md
Normal file
@@ -0,0 +1,144 @@
|
||||
---
|
||||
license: apache-2.0
|
||||
language:
|
||||
- en
|
||||
library_name: transformers
|
||||
pipeline_tag: text-generation
|
||||
tags:
|
||||
- lean4
|
||||
- theorem-proving
|
||||
- formal-mathematics
|
||||
- agentic
|
||||
- retrieval-augmented
|
||||
datasets:
|
||||
- m-a-p/OProofs
|
||||
---
|
||||
|
||||
# OProver
|
||||
|
||||
> A unified framework for **agentic formal theorem proving** in Lean 4.
|
||||
|
||||
- 📄 Paper: [OProver: A Unified Framework for Agentic Formal Theorem Proving](https://huggingface.co/papers/2605.17283) ([arXiv](https://arxiv.org/abs/2605.17283))
|
||||
- 📚 Collection: [m-a-p/OProver](https://huggingface.co/collections/m-a-p/oprover)
|
||||
- 📦 Dataset: [m-a-p/OProofs](https://huggingface.co/datasets/m-a-p/OProofs)
|
||||
|
||||
OProver treats theorem proving as a multi-round refinement loop. Given a target
|
||||
theorem, the prover retrieves top-k compiler-verified proofs from a memory of
|
||||
prior proofs, generates a proof attempt, runs the Lean 4 compiler, and—on
|
||||
failure—revises the attempt using the compiler feedback in the next round.
|
||||
The same retrieval and feedback signals are baked into training, so the
|
||||
training-time interface matches the proving-time interaction.
|
||||
|
||||
## Highlights
|
||||
|
||||
- **State-of-the-art among open-weight whole-proof provers.** OProver-32B
|
||||
attains the best Pass@32 on MiniF2F (93.3), ProverBench (58.2), and
|
||||
PutnamBench (11.3), and the second-best on MathOlympiad (22.8) and
|
||||
ProofNet (33.2). Even OProver-8B beats Goedel-Prover-V2-32B on all five
|
||||
benchmarks despite having 4× fewer parameters.
|
||||
- **Agentic proving is in the policy, not bolted on.** Retrieval, multi-turn
|
||||
compiler feedback, and iterative repair are all part of the trained policy.
|
||||
Ablations show feedback is the dominant driver: removing it costs OProver-32B
|
||||
4.9–7.4 Pass@32 points across benchmarks; removing retrieval on top adds
|
||||
0.5–1.7 more.
|
||||
- **Co-evolving prover and corpus.** Each post-training iteration runs agentic
|
||||
rollouts; verified proofs are added to OProofs and re-indexed for retrieval,
|
||||
repair trajectories become SFT data, and unresolved hard cases provide RL
|
||||
signal. MiniF2F-Test Pass@32 grows monotonically: OProver-8B 79.5 → 86.2 →
|
||||
87.0 → 91.8 across rounds; OProver-32B 84.7 → 88.1 → 93.3.
|
||||
|
||||
## What's in this release
|
||||
|
||||
The OProver collection ([m-a-p/OProver](https://huggingface.co/collections/m-a-p/oprover)) bundles the paper,
|
||||
the corpus, and seven model checkpoints covering both training stages and
|
||||
both model sizes:
|
||||
|
||||
| Repo | Stage | Size | Note |
|
||||
|---|---|---|---|
|
||||
| [OProver-8B-Base](https://huggingface.co/m-a-p/OProver-8B-Base) | Continued pretraining only | 8B | Domain-adapted base; before SFT/RL |
|
||||
| [OProver-32B-Base](https://huggingface.co/m-a-p/OProver-32B-Base) | Continued pretraining only | 32B | Domain-adapted base; before SFT/RL |
|
||||
| [OProver-8B-Round1](https://huggingface.co/m-a-p/OProver-8B-Round1) | Post-training Round 1 | 8B | After first SFT+RL iteration |
|
||||
| [OProver-8B-Round2](https://huggingface.co/m-a-p/OProver-8B-Round2) | Post-training Round 2 | 8B | After second SFT+RL iteration |
|
||||
| [OProver-32B-Round1](https://huggingface.co/m-a-p/OProver-32B-Round1) | Post-training Round 1 | 32B | After first SFT+RL iteration |
|
||||
| **[OProver-8B](https://huggingface.co/m-a-p/OProver-8B)** | Final | 8B | The 8B prover reported in the paper (Round 3) |
|
||||
| **[OProver-32B](https://huggingface.co/m-a-p/OProver-32B)** | Final | 32B | The 32B prover reported in the paper (Round 2) |
|
||||
| [OProofs](https://huggingface.co/datasets/m-a-p/OProofs) | Dataset | 6.86M proofs | Lean 4 corpus used for CPT, SFT, and the retrieval memory |
|
||||
|
||||
Use **OProver-8B** or **OProver-32B** for proving. The Base / Round-N
|
||||
checkpoints are released for reproducibility and ablation studies.
|
||||
|
||||
## Dataset: OProofs
|
||||
|
||||
OProofs is a large-scale Lean 4 corpus that doubles as the retrieval memory at
|
||||
proving time. It is built from three sources: public Lean resources
|
||||
(NuminaMath-LEAN, Lean-Workbook, Leanabell-FormalStmt, Goedel-Pset, …),
|
||||
large-scale autoformalization + agentic proof synthesis from informal math
|
||||
mined on Common Crawl and GitHub, and traces from OProver's own agentic
|
||||
proving runs.
|
||||
|
||||

|
||||
|
||||
## Training
|
||||
|
||||
Two stages, both anchored on OProofs.
|
||||
|
||||
**1. Continued pretraining (one-time).** A 65B-token mixture of formal
|
||||
Lean (≈30%, from OProofs), code (≈20%, OpenCoder), mathematics (≈40%,
|
||||
Nemotron-Math-4-Plus), and long-CoT (≈10%, ProLong-64K). AdamW, peak LR
|
||||
5e-5, cosine with 3% warmup, batch 512, sequence length 8192. Output:
|
||||
`OProver-{8B,32B}-Base`.
|
||||
|
||||
**2. Iterative post-training.** Each round runs:
|
||||
|
||||
1. Agentic proving with the current prover on a theorem pool, producing
|
||||
multi-round rollouts conditioned on retrieved proofs and Lean feedback.
|
||||
2. **SFT** on round-level repair examples `(s, R, p_{t-1}, f_{t-1}) → p_t`,
|
||||
with cross-entropy loss only on the new attempt.
|
||||
3. **GSPO RL** on hard cases (groups with non-trivial pass-rate). Per-round
|
||||
reward `r = 0.8 + 0.2·1[format ok]` if Lean-verified, else 0; advantages
|
||||
are pooled across the n×R rounds for the same theorem.
|
||||
4. Newly verified proofs and repair trajectories are folded back into OProofs
|
||||
and re-indexed into the retrieval memory for the next round.
|
||||
|
||||
## Results
|
||||
|
||||
Pass@32 (n=64) across five Lean 4 benchmarks. **Bold** is best, _underlined_
|
||||
is second-best.
|
||||
|
||||

|
||||
|
||||
OProver-32B reaches three best and two second-best across five benchmarks —
|
||||
the most top placements of any model in the comparison, despite being a
|
||||
32B dense model versus a 560B MoE or a 671B dense competitor.
|
||||
|
||||
## Loading
|
||||
|
||||
```python
|
||||
from transformers import AutoModelForCausalLM, AutoTokenizer
|
||||
|
||||
# Pick any of: OProver-8B, OProver-32B, *-Base, *-Round1, *-Round2
|
||||
name = "m-a-p/OProver-8B"
|
||||
tok = AutoTokenizer.from_pretrained(name)
|
||||
model = AutoModelForCausalLM.from_pretrained(name, torch_dtype="bfloat16", device_map="auto")
|
||||
```
|
||||
|
||||
```python
|
||||
from datasets import load_dataset
|
||||
ds = load_dataset("m-a-p/OProofs", split="train")
|
||||
```
|
||||
|
||||
OProver is trained against a multi-round agentic interface: at each round the
|
||||
input includes the target Lean statement, top-k retrieved verified proofs, the
|
||||
prior proof attempt, and the Lean compiler feedback. See the paper §2.1 for
|
||||
the prompt template, and Appendix B for serialization details.
|
||||
|
||||
## Citation
|
||||
|
||||
```bibtex
|
||||
@article{ma2026oprover,
|
||||
title = {OProver: A Unified Framework for Agentic Formal Theorem Proving},
|
||||
author = {David Ma and Kaijing Ma and Shawn Guo and Yunfeng Shi and Enduo Zhao and Jiajun Shi and Zhaoxiang Zhang and Gavin Cheung and Jiaheng Liu and Zili Wang},
|
||||
journal = {arXiv preprint arXiv:2605.17283},
|
||||
year = {2026}
|
||||
}
|
||||
```
|
||||
Reference in New Issue
Block a user