初始化项目,由ModelHub XC社区提供模型
Model: athanor-ai/DeepSeek-Prover-V2-7B-GPTQ-4bit Source: Original Platform
This commit is contained in:
72
README.md
Normal file
72
README.md
Normal file
@@ -0,0 +1,72 @@
|
||||
---
|
||||
license: other
|
||||
license_name: deepseek-license
|
||||
license_link: https://github.com/deepseek-ai/DeepSeek-Prover-V2/blob/main/LICENSE-MODEL
|
||||
base_model: deepseek-ai/DeepSeek-Prover-V2-7B
|
||||
tags:
|
||||
- lean4
|
||||
- theorem-proving
|
||||
- gptq
|
||||
- quantization
|
||||
- int4
|
||||
library_name: transformers
|
||||
pipeline_tag: text-generation
|
||||
---
|
||||
|
||||
# DeepSeek-Prover-V2-7B - GPTQ 4-bit
|
||||
|
||||
Community GPTQ int4 quantization of `deepseek-ai/DeepSeek-Prover-V2-7B` for Lean-4 theorem proving on consumer GPUs (tested on RTX 3080, 10 GB VRAM).
|
||||
|
||||
**Weights**: 4.6 GB (vs 14 GB FP16) - fits inference on 8-10 GB cards with room for generation context.
|
||||
|
||||
## Quantization config
|
||||
|
||||
- Bits: 4
|
||||
- Group size: 128
|
||||
- Symmetry: asymmetric (`sym=False`)
|
||||
- Order: true-sequential
|
||||
- Damp: 0.01
|
||||
- Calibration: pileval, 128 samples x 512 tokens
|
||||
- Tool: auto-gptq 0.7.1
|
||||
|
||||
## Why GPTQ instead of AWQ on this model
|
||||
|
||||
AutoAWQ (0.2.9) crashed at layer 17 of 30 on this card with `NotImplementedError: Cannot copy out of meta tensor` - the per-layer scale search collides with `accelerate`'s CPU-offload hooks on sub-VRAM quantizes. `auto-gptq`'s per-layer lift-quantize-drop pattern avoids that path. Details in the recipe blog post (link TBD once published).
|
||||
|
||||
## Load example
|
||||
|
||||
```python
|
||||
from auto_gptq import AutoGPTQForCausalLM
|
||||
from transformers import AutoTokenizer
|
||||
|
||||
model_id = "athanor-ai/DeepSeek-Prover-V2-7B-GPTQ-4bit" # or the fallback user namespace
|
||||
tokenizer = AutoTokenizer.from_pretrained(model_id)
|
||||
model = AutoGPTQForCausalLM.from_quantized(
|
||||
model_id, device="cuda:0", use_safetensors=True, trust_remote_code=False,
|
||||
)
|
||||
```
|
||||
|
||||
Observed greedy throughput on RTX 3080 (triton kernels, no compiled CUDA ext): ~3.9 tok/s at 32 new tokens.
|
||||
|
||||
## Known caveats
|
||||
|
||||
- `auto-gptq 0.7.1` requires `peft<0.11` - newer peft removed `PEFT_TYPE_TO_MODEL_MAPPING` which breaks the import.
|
||||
- `rope_scaling` warnings on load are cosmetic (HF transformers 4.51.3 nits), safe to ignore.
|
||||
- Capability-preservation vs FP16 is workload-dependent. Benchmark your own suite before deploying. Calibration was pileval (generic); a Lean-specific calibration corpus may improve close-rate for theorem proving.
|
||||
|
||||
## Dependencies used to produce this upload
|
||||
|
||||
- torch==2.6.0+cu124
|
||||
- transformers==4.51.3
|
||||
- auto-gptq==0.7.1
|
||||
- peft<0.11
|
||||
- accelerate==1.13.0
|
||||
- safetensors==0.7.0
|
||||
|
||||
## License + credit
|
||||
|
||||
Inherits from upstream DeepSeek-Prover-V2 Model License (DeepSeek License Agreement v1.0) - see `license_link` in the card metadata. Original model and training: DeepSeek. This quantization: athanor-ai.
|
||||
|
||||
## Companion benchmark
|
||||
|
||||
Evaluated against [athanor-ai/formal-anytime-valid-stats-48](https://huggingface.co/datasets/athanor-ai/formal-anytime-valid-stats-48) — a 48-theorem Lean-4 benchmark of anytime-valid confidence-sequence lemmas. See the dataset README for per-theorem close rates of this model alongside Claude Opus 4.6, Sonnet 4.6, Kimi K2.5, and Gemini 3 Pro.
|
||||
Reference in New Issue
Block a user