初始化项目,由ModelHub XC社区提供模型
Model: uw-math-ai/gAPRIL-wo-exp Source: Original Platform
This commit is contained in:
190
README.md
Normal file
190
README.md
Normal file
@@ -0,0 +1,190 @@
|
||||
---
|
||||
license: apache-2.0
|
||||
datasets:
|
||||
- uw-math-ai/APRIL
|
||||
tags:
|
||||
- lean4
|
||||
base_model:
|
||||
- Goedel-LM/Goedel-Prover-V2-8B
|
||||
pipeline_tag: text-generation
|
||||
library_name: transformers
|
||||
---
|
||||
|
||||
# APRIL-Goedel-8B: Lean Proof Repair (Repair Only)
|
||||
|
||||
This model is a LoRA finetune of [Goedel-Prover-V2-8B](https://huggingface.co/Goedel-LM/Goedel-Prover-V2-8B) on the [APRIL](https://huggingface.co/datasets/uw-math-ai/APRIL) dataset for **Lean 4 proof repair without explanation supervision**. Given an erroneous Lean proof and compiler feedback, the model directly produces a corrected proof. This variant maximizes single-shot repair accuracy by training exclusively on the repair objective.
|
||||
|
||||
## Model Details
|
||||
|
||||
- **Base model:** Goedel-Prover-V2-8B
|
||||
- **Method:** Supervised finetuning with LoRA (rank 32, α = 64)
|
||||
- **Training data:** APRIL — 260K paired erroneous/correct Lean proofs with compiler diagnostics (explanations excluded from supervision)
|
||||
- **Task:** Proof repair only
|
||||
- **Lean version:** 4.22.0-rc4
|
||||
|
||||
## Results
|
||||
|
||||
Single-shot proof repair accuracy (pass@1) on the APRIL test set (1,835 examples), evaluated by Lean compilation:
|
||||
|
||||
| Error Type | This Model (w/o exp) | With Explanations | Base Goedel-8B | Goedel-32B |
|
||||
|---|---|---|---|---|
|
||||
| **Full** | **36.7%** | 34.6% | 15.5% | 26.8% |
|
||||
| Tactic | **48.5%** | 41.7% | 19.6% | 34.2% |
|
||||
| Line | **25.5%** | 18.5% | 20.0% | 28.5% |
|
||||
| Theorem | **37.5%** | 36.8% | 12.7% | 23.0% |
|
||||
| Multi-Line | **24.3%** | 20.8% | 19.4% | 32.6% |
|
||||
|
||||
Training exclusively for repair (without explanation supervision) yields the highest pass@1 accuracy, gaining ~2% over the joint variant. However, this model does not produce human-interpretable diagnostics. See the [with-explanation variant](https://huggingface.co/uw-math-ai/gAPRIL-w-exp) for the trade-off discussion.
|
||||
|
||||
## Model & Dataset Download
|
||||
|
||||
| Resource | Description | Link |
|
||||
|---|---|---|
|
||||
| **APRIL Dataset** | 260K Lean proof-repair tuples with compiler diagnostics and explanations | [uw-math-ai/APRIL](https://huggingface.co/datasets/uw-math-ai/APRIL) |
|
||||
| **gAPRIL-w-exp** | Goedel-8B finetuned on APRIL with joint explanation supervision | [uw-math-ai/gAPRIL-w-exp](https://huggingface.co/uw-math-ai/gAPRIL-w-exp) |
|
||||
| **gAPRIL-wo-exp** | Goedel-8B finetuned on APRIL for repair only (no explanations) | [uw-math-ai/gAPRIL-wo-exp](https://huggingface.co/uw-math-ai/gAPRIL-wo-exp) |
|
||||
|
||||
## Usage
|
||||
|
||||
The model expects a chat-formatted prompt with the erroneous proof, goal state, error line, and compiler error message. The assistant response contains the corrected proof in a `lean` code block.
|
||||
|
||||
**System:** `You are diagnosing a single failing proof`
|
||||
|
||||
**User:**
|
||||
```
|
||||
Explain the error, suggest a fix, and provide the corrected proof based on the context:
|
||||
|
||||
Incorrect Proof: <erroneous proof>
|
||||
State: <goal state before error from InfoView>
|
||||
Line at error: <error-occurred line of code>
|
||||
Lean error: <error messages from InfoView>
|
||||
```
|
||||
|
||||
**Assistant** (model output):
|
||||
```
|
||||
Explanation: <explanation of error cause>
|
||||
Fix: <code manipulation fix suggestion>
|
||||
Corrected Proof: <corrected proof>
|
||||
```
|
||||
|
||||
### Example Inference
|
||||
|
||||
````python
|
||||
import re
|
||||
|
||||
from transformers import AutoModelForCausalLM, AutoTokenizer
|
||||
import torch
|
||||
|
||||
torch.manual_seed(42)
|
||||
|
||||
|
||||
def extract_proof_from_text(output):
|
||||
lean_codes = re.findall(r"```lean\s*(.*?)\s*```", output, re.DOTALL)
|
||||
if not lean_codes or len(lean_codes) == 0:
|
||||
lean_codes = re.findall(r"```lean4\s*(.*?)\s*```", output, re.DOTALL)
|
||||
words = ["by", ":="]
|
||||
|
||||
for i in range(len(lean_codes)):
|
||||
lean_code = lean_codes[-i - 1]
|
||||
if all(word in lean_code for word in words):
|
||||
return lean_code
|
||||
return None
|
||||
|
||||
|
||||
model_id = "uw-math-ai/gAPRIL-wo-exp"
|
||||
tokenizer = AutoTokenizer.from_pretrained(model_id)
|
||||
model = AutoModelForCausalLM.from_pretrained(
|
||||
model_id, device_map="auto", torch_dtype=torch.bfloat16, trust_remote_code=True
|
||||
)
|
||||
|
||||
system_prompt = (
|
||||
"You are a Lean 4 programmer diagnosing a single failing proof. "
|
||||
"Assume you only see the incorrect proof text, the infoview state"
|
||||
" near the failure, and Lean's error message."
|
||||
)
|
||||
|
||||
# Context information for the incorrect proof
|
||||
incorrect_proof = """
|
||||
theorem lean_problem : IsLeast {x : ℕ | x > 0 ∧ (7 * x) % 100 = 29} 47 := by
|
||||
constructor
|
||||
· constructor
|
||||
· norm_num
|
||||
· norm_num
|
||||
· intro x ⟨hx_pos, hx_cong⟩
|
||||
by_contra h
|
||||
push_neg at h
|
||||
obtain ⟨h_le, h_ne⟩ := lt_iff_le_and_ne.mp h
|
||||
have h_lt := h_le
|
||||
revert x hx_pos hx_cong h_lt
|
||||
refine' Nat.le_induction _ _ 47 _ <;> intros x hx_lt hx_pos hx_cong
|
||||
· rfl
|
||||
· have : x < 47 := by omega
|
||||
interval_cases x
|
||||
all_goals try { norm_num at hx_cong; norm_num }
|
||||
""".strip()
|
||||
|
||||
infoview_state = (
|
||||
"case right.intro.refine'_4 ⊢ ∀ (n : ℕ), sorry ≤ n → "
|
||||
"(∀ ⦃x : ℕ⦄, x > 0 → 7 * x % 100 = 29 → x < n → x ≤ n → x ≠ n → x ≤ n → False) → "
|
||||
"∀ ⦃x : ℕ⦄, x > 0 → 7 * x % 100 = 29 → x < n + 1 → x ≤ n + 1 → x ≠ n + 1 → x ≤ n + 1 → False"
|
||||
)
|
||||
|
||||
line_at_error = "refine' Nat.le_induction _ _ 47 _ <;> intros x hx_lt hx_pos hx_cong"
|
||||
|
||||
error_message = (
|
||||
"tactic 'introN' failed, insufficient number of binders\n"
|
||||
"case right.intro.refine'_1\n⊢ ℕ"
|
||||
)
|
||||
|
||||
user_prompt = f"""
|
||||
**Instruction:** Provide the full corrected Lean 4 theorem/proof in a single ```lean``` code block.
|
||||
|
||||
**Context:**
|
||||
|
||||
Incorrect proof:
|
||||
```lean
|
||||
{incorrect_proof}
|
||||
```
|
||||
|
||||
Infoview state:
|
||||
{infoview_state}
|
||||
|
||||
Line at error:
|
||||
{line_at_error}
|
||||
|
||||
Lean error:
|
||||
{error_message}
|
||||
""".strip()
|
||||
|
||||
|
||||
chat = [
|
||||
{"role": "system", "content": system_prompt},
|
||||
{"role": "user", "content": user_prompt},
|
||||
]
|
||||
|
||||
inputs = tokenizer.apply_chat_template(
|
||||
chat, tokenize=True, add_generation_prompt=True, return_tensors="pt"
|
||||
).to(model.device)
|
||||
|
||||
outputs = model.generate(inputs, max_new_tokens=8192)
|
||||
decoded_outputs = tokenizer.batch_decode(outputs)
|
||||
proof = extract_proof_from_text(decoded_outputs[0])
|
||||
print(proof)
|
||||
````
|
||||
|
||||
## Citation
|
||||
|
||||
```bibtex
|
||||
@article{wang2026repairlean,
|
||||
title = {Learning to Repair Lean Proofs from Compiler Feedback},
|
||||
author = {Wang, Evan and Chess, Simon and Lee, Daniel and Ge, Siyuan and Mallavarapu, Ajit and Ilin, Vasily},
|
||||
journal= {arXiv preprint arXiv:2602.02990},
|
||||
year = {2026},
|
||||
doi = {10.48550/arXiv.2602.02990},
|
||||
url = {https://arxiv.org/abs/2602.02990}
|
||||
}
|
||||
```
|
||||
|
||||
## Acknowledgements
|
||||
|
||||
Work by the Math AI Lab, University of Washington. Supported by UW eScience School, UW IT (AWS credits), UW Department of Applied Mathematics (GPU access), and Nebius (LLM cloud credits).
|
||||
Reference in New Issue
Block a user