gAPRIL-wo-exp/README.md

---
license: apache-2.0
datasets:
- uw-math-ai/APRIL
tags:
  - lean4
base_model:
- Goedel-LM/Goedel-Prover-V2-8B
pipeline_tag: text-generation
library_name: transformers
---

# APRIL-Goedel-8B: Lean Proof Repair (Repair Only)

This model is a LoRA finetune of [Goedel-Prover-V2-8B](https://huggingface.co/Goedel-LM/Goedel-Prover-V2-8B) on the [APRIL](https://huggingface.co/datasets/uw-math-ai/APRIL) dataset for **Lean 4 proof repair without explanation supervision**. Given an erroneous Lean proof and compiler feedback, the model directly produces a corrected proof. This variant maximizes single-shot repair accuracy by training exclusively on the repair objective.

## Model Details

- **Base model:** Goedel-Prover-V2-8B
- **Method:** Supervised finetuning with LoRA (rank 32, α = 64)
- **Training data:** APRIL — 260K paired erroneous/correct Lean proofs with compiler diagnostics (explanations excluded from supervision)
- **Task:** Proof repair only
- **Lean version:** 4.22.0-rc4

## Results

Single-shot proof repair accuracy (pass@1) on the APRIL test set (1,835 examples), evaluated by Lean compilation:

| Error Type | This Model (w/o exp) | With Explanations | Base Goedel-8B | Goedel-32B |
|---|---|---|---|---|
| **Full** | **36.7%** | 34.6% | 15.5% | 26.8% |
| Tactic | **48.5%** | 41.7% | 19.6% | 34.2% |
| Line | **25.5%** | 18.5% | 20.0% | 28.5% |
| Theorem | **37.5%** | 36.8% | 12.7% | 23.0% |
| Multi-Line | **24.3%** | 20.8% | 19.4% | 32.6% |

Training exclusively for repair (without explanation supervision) yields the highest pass@1 accuracy, gaining ~2% over the joint variant. However, this model does not produce human-interpretable diagnostics. See the [with-explanation variant](https://huggingface.co/uw-math-ai/gAPRIL-w-exp) for the trade-off discussion.

## Model & Dataset Download

| Resource | Description | Link |
|---|---|---|
| **APRIL Dataset** | 260K Lean proof-repair tuples with compiler diagnostics and explanations | [uw-math-ai/APRIL](https://huggingface.co/datasets/uw-math-ai/APRIL) |
| **gAPRIL-w-exp** | Goedel-8B finetuned on APRIL with joint explanation supervision | [uw-math-ai/gAPRIL-w-exp](https://huggingface.co/uw-math-ai/gAPRIL-w-exp) |
| **gAPRIL-wo-exp** | Goedel-8B finetuned on APRIL for repair only (no explanations) | [uw-math-ai/gAPRIL-wo-exp](https://huggingface.co/uw-math-ai/gAPRIL-wo-exp) |

## Usage

The model expects a chat-formatted prompt with the erroneous proof, goal state, error line, and compiler error message. The assistant response contains the corrected proof in a `lean` code block.

**System:** `You are diagnosing a single failing proof`

**User:**
```
Explain the error, suggest a fix, and provide the corrected proof based on the context:

Incorrect Proof: <erroneous proof>
State: <goal state before error from InfoView>
Line at error: <error-occurred line of code>
Lean error: <error messages from InfoView>
```

**Assistant** (model output):
```
Explanation: <explanation of error cause>
Fix: <code manipulation fix suggestion>
Corrected Proof: <corrected proof>
```

### Example Inference

````python
import re

from transformers import AutoModelForCausalLM, AutoTokenizer
import torch

torch.manual_seed(42)


def extract_proof_from_text(output):
    lean_codes = re.findall(r"```lean\s*(.*?)\s*```", output, re.DOTALL)
    if not lean_codes or len(lean_codes) == 0:
        lean_codes = re.findall(r"```lean4\s*(.*?)\s*```", output, re.DOTALL)
    words = ["by", ":="]

    for i in range(len(lean_codes)):
        lean_code = lean_codes[-i - 1]
        if all(word in lean_code for word in words):
            return lean_code
    return None


model_id = "uw-math-ai/gAPRIL-wo-exp"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id, device_map="auto", torch_dtype=torch.bfloat16, trust_remote_code=True
)

system_prompt = (
    "You are a Lean 4 programmer diagnosing a single failing proof. "
    "Assume you only see the incorrect proof text, the infoview state"
    " near the failure, and Lean's error message."
)

# Context information for the incorrect proof
incorrect_proof = """
theorem lean_problem : IsLeast {x : ℕ | x > 0 ∧ (7 * x) % 100 = 29} 47 := by
  constructor
  · constructor
    · norm_num
    · norm_num
  · intro x ⟨hx_pos, hx_cong⟩
    by_contra h
    push_neg at h
    obtain ⟨h_le, h_ne⟩ := lt_iff_le_and_ne.mp h
    have h_lt := h_le
    revert x hx_pos hx_cong h_lt
    refine' Nat.le_induction _ _ 47 _ <;> intros x hx_lt hx_pos hx_cong
    · rfl
    · have : x < 47 := by omega
      interval_cases x
      all_goals try { norm_num at hx_cong; norm_num }
""".strip()

infoview_state = (
    "case right.intro.refine'_4 ⊢ ∀ (n : ℕ), sorry ≤ n → "
    "(∀ ⦃x : ℕ⦄, x > 0 → 7 * x % 100 = 29 → x < n → x ≤ n → x ≠ n → x ≤ n → False) → "
    "∀ ⦃x : ℕ⦄, x > 0 → 7 * x % 100 = 29 → x < n + 1 → x ≤ n + 1 → x ≠ n + 1 → x ≤ n + 1 → False"
)

line_at_error = "refine' Nat.le_induction _ _ 47 _ <;> intros x hx_lt hx_pos hx_cong"

error_message = (
    "tactic 'introN' failed, insufficient number of binders\n"
    "case right.intro.refine'_1\n⊢ ℕ"
)

user_prompt = f"""
**Instruction:** Provide the full corrected Lean 4 theorem/proof in a single ```lean``` code block.

**Context:**

Incorrect proof:
```lean
{incorrect_proof}
```

Infoview state:
{infoview_state}

Line at error:
{line_at_error}

Lean error:
{error_message}
""".strip()


chat = [
    {"role": "system", "content": system_prompt},
    {"role": "user", "content": user_prompt},
]

inputs = tokenizer.apply_chat_template(
    chat, tokenize=True, add_generation_prompt=True, return_tensors="pt"
).to(model.device)

outputs = model.generate(inputs, max_new_tokens=8192)
decoded_outputs = tokenizer.batch_decode(outputs)
proof = extract_proof_from_text(decoded_outputs[0])
print(proof)
````

## Citation

```bibtex
@article{wang2026repairlean,
  title  = {Learning to Repair Lean Proofs from Compiler Feedback},
  author = {Wang, Evan and Chess, Simon and Lee, Daniel and Ge, Siyuan and Mallavarapu, Ajit and Ilin, Vasily},
  journal= {arXiv preprint arXiv:2602.02990},
  year   = {2026},
  doi    = {10.48550/arXiv.2602.02990},
  url    = {https://arxiv.org/abs/2602.02990}
}
```

## Acknowledgements

Work by the Math AI Lab, University of Washington. Supported by UW eScience School, UW IT (AWS credits), UW Department of Applied Mathematics (GPU access), and Nebius (LLM cloud credits).