---
license: llama3.1
language:
- en
pipeline_tag: text-generation
tags:
- llama
- llama-3
- heretic
- abliterated
- uncensored
- decensored
- conversational
- alignment
base_model: meta-llama/Llama-3.1-8B-Instruct
---

# 🧠 Llama-3.1-8B-Instruct-Heretic

A behavior-modified version of Llama 3.1 8B Instruct, created with the Heretic framework for residual-based abliteration.

---

## 🚀 Overview

This model applies **post-training behavioral modification** to reduce refusal responses while preserving core model capabilities.

Instead of fine-tuning, it uses:

- Residual-stream manipulation
- Directional vector subtraction (abliteration)
- KL-divergence-constrained optimization

---
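
The KL-divergence constraint compares the modified model's next-token distribution against the original's, keeping them close. A minimal sketch of that measurement in NumPy, using made-up toy distributions (not Heretic's actual implementation):

```python
import numpy as np

def kl_divergence(p: np.ndarray, q: np.ndarray) -> float:
    """KL(p || q) for two discrete probability distributions."""
    p = p / p.sum()
    q = q / q.sum()
    return float(np.sum(p * np.log(p / q)))

# Toy next-token distributions: identical distributions give KL = 0;
# abliteration is tuned so the divergence stays below a small target (e.g. 0.01).
base = np.array([0.7, 0.2, 0.1])
modified = np.array([0.68, 0.21, 0.11])
print(kl_divergence(base, modified))
```

In practice this is computed over the full vocabulary at each position; the toy three-token distributions above only illustrate the arithmetic.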

## ⚙️ Methodology

The model was processed with **Heretic** using the following approach:

1. Collect residual-stream activations from prompts
2. Identify the directional difference between:
   - compliant outputs
   - refusal outputs
3. Subtract the refusal-associated component from the model's behavior
4. Optimize via trial-based search under KL constraints

---
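
The steps above can be sketched in NumPy with toy dimensions and random data (illustrative only, not Heretic's code): the refusal direction is estimated as the difference of mean activations, then projected out of each activation.

```python
import numpy as np

rng = np.random.default_rng(0)
d_model = 16  # toy hidden size

# Step 1: residual activations for compliant vs. refusal prompts (toy data)
compliant = rng.normal(size=(100, d_model))
refusal = rng.normal(loc=0.5, size=(100, d_model))

# Step 2: refusal direction = normalized difference of mean activations
direction = refusal.mean(axis=0) - compliant.mean(axis=0)
direction /= np.linalg.norm(direction)

# Step 3: project the refusal-associated component out of an activation
def ablate(activation: np.ndarray, direction: np.ndarray) -> np.ndarray:
    return activation - (activation @ direction) * direction

a = rng.normal(size=d_model)
print(abs(ablate(a, direction) @ direction))  # ~0: refusal component removed
```

Step 4 (the KL-constrained trial search over which layers and scaling factors to ablate) is what the trial counts in the next section refer to.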

## 🧪 Training Configuration

Key parameters:

- Trials: **200**
- Startup trials: **60**
- KL-divergence target: **0.01**
- Batch size: **8 (auto)**
- Max response length: **100 tokens**
- Quantization: **none**
- Device map: **auto**

---
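
For reference, the same parameters expressed as a plain Python dict (the field names are illustrative, not Heretic's actual configuration schema):

```python
# Hypothetical field names; values taken from the list above.
heretic_run_config = {
    "n_trials": 200,
    "n_startup_trials": 60,
    "kl_divergence_target": 0.01,
    "batch_size": "auto",          # resolved to 8 on this hardware
    "max_response_length": 100,    # tokens
    "quantization": None,
    "device_map": "auto",
}
print(heretic_run_config["n_trials"])
```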

## 📊 Datasets

### Training

- `mlabonne/harmless_alpaca` (non-refusal baseline)
- `mlabonne/harmful_behaviors` (refusal-inducing prompts)

### Evaluation

- Test splits of the same datasets

---

## 🧠 Behavioral Characteristics

Compared to the base model:

### Changes

- Reduced refusal frequency
- More permissive responses
- Increased directness

### Trade-offs

- Potential increase in unsafe or unfiltered outputs
- Reduced alignment safeguards
- Behavior depends strongly on prompt phrasing

---

## ⚠️ Limitations

- Refusal detection is heuristic (string-based)
- No semantic safety guarantees
- No quantization (higher VRAM usage)
- No row normalization applied

---
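
The string-based refusal heuristic noted above can be as simple as substring matching. A toy illustration with made-up marker phrases (not Heretic's actual detector):

```python
# Toy marker phrases; a real detector's list would differ.
REFUSAL_MARKERS = ("i can't", "i cannot", "i'm sorry", "as an ai")

def looks_like_refusal(response: str) -> bool:
    """Heuristic: flag a response that starts with a known refusal phrase."""
    text = response.strip().lower()
    return any(text.startswith(marker) for marker in REFUSAL_MARKERS)

print(looks_like_refusal("I cannot help with that."))          # True
print(looks_like_refusal("Neural networks are layers of..."))  # False
```

Because matching is purely lexical, paraphrased refusals (or compliant answers that happen to open with a marker phrase) are misclassified, which is why the model card claims no semantic safety guarantees.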

## 📦 Usage

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "Vaxispraxis/Llama-3.1-8B-Instruct-heretic"

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id, device_map="auto")

prompt = "Explain how neural networks work"
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)

outputs = model.generate(**inputs, max_new_tokens=100)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```