Model: Vaxispraxis/Llama-3.1-8B-Instruct-heretic
| license | language | pipeline_tag | tags | base_model |
|---|---|---|---|---|
| llama3.1 | | text-generation | | meta-llama/Llama-3.1-8B-Instruct |
🧠 Llama-3.1-8B-Instruct-Heretic
A behavior-modified version of Llama 3.1 8B Instruct created using the Heretic framework for residual-based abliteration.
🚀 Overview
This model applies post-training behavioral modification to reduce refusal responses while preserving core model capabilities.
Instead of fine-tuning, it uses:
- Residual stream manipulation
- Directional vector subtraction (abliteration)
- KL-divergence constrained optimization
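Directional vector subtraction can be sketched in a few lines: given a unit "refusal direction" in the residual stream, the component of each activation along that direction is projected out. This is an illustrative sketch, not Heretic's actual implementation; the function name and shapes are assumptions.

```python
import torch

def ablate_direction(hidden: torch.Tensor, direction: torch.Tensor) -> torch.Tensor:
    """Remove the component of `hidden` along `direction`.

    hidden:    (..., d_model) residual-stream activations
    direction: (d_model,) refusal direction (need not be normalized)
    """
    direction = direction / direction.norm()
    coeff = hidden @ direction                      # projection coefficients, shape (...)
    return hidden - coeff.unsqueeze(-1) * direction

# Toy check: ablated activations are orthogonal to the refusal direction
torch.manual_seed(0)
direction = torch.randn(16)
hidden = torch.randn(4, 16)
ablated = ablate_direction(hidden, direction)
print(torch.allclose(ablated @ (direction / direction.norm()), torch.zeros(4), atol=1e-5))  # → True
```

In practice this projection is either applied as a hook on the residual stream or baked into the weight matrices that write to it.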
⚙️ Methodology
The model was processed using Heretic with the following approach:
- Collect residual activations from prompts
- Identify directional differences between:
- compliant outputs
- refusal outputs
- Subtract refusal-associated components from model behavior
- Optimize via trial-based search with KL constraints
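Step 2 above (identifying the directional difference) is commonly done with a difference-of-means: average the activations over refusal-inducing and compliant prompts separately, subtract, and normalize. A minimal sketch, assuming activations have already been collected at one layer and token position:

```python
import torch

def refusal_direction(acts_refusal: torch.Tensor, acts_compliant: torch.Tensor) -> torch.Tensor:
    """Unit difference-of-means direction between two activation sets.

    acts_*: (n_prompts, d_model) residual activations at one layer/position.
    """
    diff = acts_refusal.mean(dim=0) - acts_compliant.mean(dim=0)
    return diff / diff.norm()

# Toy check with a known offset injected along the first basis vector
torch.manual_seed(0)
compliant = torch.randn(100, 32)
offset = torch.zeros(32)
offset[0] = 5.0
r = refusal_direction(compliant + offset, compliant)
print(round(r[0].item(), 4))  # → 1.0
```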
🧪 Training Configuration
Key parameters:
- Trials: 200
- Startup trials: 60
- KL divergence target: 0.01
- Batch size: 8 (auto)
- Max response length: 100 tokens
- Quantization: none
- Device map: auto
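The KL-divergence target of 0.01 bounds how far the modified model's next-token distributions may drift from the base model on harmless prompts. A sketch of how such a constraint can be measured with `torch.nn.functional.kl_div` (the function name and reduction choice here are assumptions, not Heretic's exact code):

```python
import torch
import torch.nn.functional as F

def kl_to_base(base_logits: torch.Tensor, mod_logits: torch.Tensor) -> torch.Tensor:
    """KL(base || modified) over next-token distributions, averaged per row."""
    base_logp = F.log_softmax(base_logits, dim=-1)
    mod_logp = F.log_softmax(mod_logits, dim=-1)
    # F.kl_div(input, target) computes KL(target || input) pointwise
    return F.kl_div(mod_logp, base_logp, log_target=True, reduction="batchmean")

# Identical logits give zero divergence; perturbed logits give a positive one
logits = torch.randn(4, 100)
print(kl_to_base(logits, logits).item() < 1e-8)                            # → True
print(kl_to_base(logits, logits + 0.1 * torch.randn(4, 100)).item() > 0)   # → True
```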
📊 Datasets
**Training**
- `mlabonne/harmless_alpaca` (non-refusal baseline)
- `mlabonne/harmful_behaviors` (refusal-inducing prompts)
**Evaluation**
- Same datasets using test splits
🧠 Behavioral Characteristics
Compared to the base model:
**Changes**
- Reduced refusal frequency
- More permissive responses
- Increased directness
**Trade-offs**
- Potential increase in unsafe or unfiltered outputs
- Reduced alignment safeguards
- Behavior depends strongly on prompt phrasing
⚠️ Limitations
- Refusal detection is heuristic (string-based)
- No semantic safety guarantees
- No quantization (higher VRAM usage)
- No row normalization applied
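"Heuristic (string-based)" refusal detection means responses are classified by surface phrases, not meaning. A minimal sketch of such a check; the marker list below is an illustrative assumption, not Heretic's actual list:

```python
# Illustrative marker list (assumption) -- a real list would be longer
REFUSAL_MARKERS = (
    "i cannot", "i can't", "i'm sorry", "i am sorry",
    "as an ai", "i won't", "i will not",
)

def looks_like_refusal(response: str) -> bool:
    """String-based refusal check: matches surface phrases, not semantics."""
    text = response.lower()
    return any(marker in text for marker in REFUSAL_MARKERS)

print(looks_like_refusal("I'm sorry, but I can't help with that."))   # → True
print(looks_like_refusal("Here is an overview of neural networks."))  # → False
# Paraphrased refusals ("That request is not something I'll do.") slip through,
# which is exactly why no semantic safety guarantees can be made.
```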
📦 Usage
```python
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "Vaxispraxis/Llama-3.1-8B-Instruct-heretic"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id)

# Instruct models expect the chat template rather than a raw prompt
messages = [{"role": "user", "content": "Explain how neural networks work"}]
inputs = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
)

outputs = model.generate(inputs, max_new_tokens=100)
# Decode only the newly generated tokens
print(tokenizer.decode(outputs[0][inputs.shape[-1]:], skip_special_tokens=True))
```