初始化项目，由ModelHub XC社区提供模型

Model: athanor-ai/DeepSeek-Prover-V2-7B-GPTQ-4bit Source: Original Platform
2026-05-25 22:46:23 +08:00
commit 3704730749
8 changed files with 499704 additions and 0 deletions
--- a/README.md
+++ b/README.md
@@ -0,0 +1,72 @@
+---
+license: other
+license_name: deepseek-license
+license_link: https://github.com/deepseek-ai/DeepSeek-Prover-V2/blob/main/LICENSE-MODEL
+base_model: deepseek-ai/DeepSeek-Prover-V2-7B
+tags:
+  - lean4
+  - theorem-proving
+  - gptq
+  - quantization
+  - int4
+library_name: transformers
+pipeline_tag: text-generation
+---
+
+# DeepSeek-Prover-V2-7B - GPTQ 4-bit
+
+Community GPTQ int4 quantization of `deepseek-ai/DeepSeek-Prover-V2-7B` for Lean-4 theorem proving on consumer GPUs (tested on RTX 3080, 10 GB VRAM).
+
+**Weights**: 4.6 GB (vs 14 GB FP16) - fits inference on 8-10 GB cards with room for generation context.
+
+## Quantization config
+
+- Bits: 4
+- Group size: 128
+- Symmetry: asymmetric (`sym=False`)
+- Order: true-sequential
+- Damp: 0.01
+- Calibration: pileval, 128 samples x 512 tokens
+- Tool: auto-gptq 0.7.1
+
+## Why GPTQ instead of AWQ on this model
+
+AutoAWQ (0.2.9) crashed at layer 17 of 30 on this card with `NotImplementedError: Cannot copy out of meta tensor` - the per-layer scale search collides with `accelerate`'s CPU-offload hooks on sub-VRAM quantizes. `auto-gptq`'s per-layer lift-quantize-drop pattern avoids that path. Details in the recipe blog post (link TBD once published).
+
+## Load example
+
+```python
+from auto_gptq import AutoGPTQForCausalLM
+from transformers import AutoTokenizer
+
+model_id = "athanor-ai/DeepSeek-Prover-V2-7B-GPTQ-4bit"  # or the fallback user namespace
+tokenizer = AutoTokenizer.from_pretrained(model_id)
+model = AutoGPTQForCausalLM.from_quantized(
+    model_id, device="cuda:0", use_safetensors=True, trust_remote_code=False,
+)
+```
+
+Observed greedy throughput on RTX 3080 (triton kernels, no compiled CUDA ext): ~3.9 tok/s at 32 new tokens.
+
+## Known caveats
+
+- `auto-gptq 0.7.1` requires `peft<0.11` - newer peft removed `PEFT_TYPE_TO_MODEL_MAPPING` which breaks the import.
+- `rope_scaling` warnings on load are cosmetic (HF transformers 4.51.3 nits), safe to ignore.
+- Capability-preservation vs FP16 is workload-dependent. Benchmark your own suite before deploying. Calibration was pileval (generic); a Lean-specific calibration corpus may improve close-rate for theorem proving.
+
+## Dependencies used to produce this upload
+
+- torch==2.6.0+cu124
+- transformers==4.51.3
+- auto-gptq==0.7.1
+- peft<0.11
+- accelerate==1.13.0
+- safetensors==0.7.0
+
+## License + credit
+
+Inherits from upstream DeepSeek-Prover-V2 Model License (DeepSeek License Agreement v1.0) - see `license_link` in the card metadata. Original model and training: DeepSeek. This quantization: athanor-ai.
+
+## Companion benchmark
+
+Evaluated against [athanor-ai/formal-anytime-valid-stats-48](https://huggingface.co/datasets/athanor-ai/formal-anytime-valid-stats-48) — a 48-theorem Lean-4 benchmark of anytime-valid confidence-sequence lemmas. See the dataset README for per-theorem close rates of this model alongside Claude Opus 4.6, Sonnet 4.6, Kimi K2.5, and Gemini 3 Pro.